
dataPreparation (version 0.1)

aggregateByKey: Automatic dataSet aggregation by key

Description

Automatic aggregation of a data set according to a key.

Usage

aggregateByKey(dataSet, key, verbose = TRUE, thresh = 53, ...)

Arguments

dataSet

Matrix, data.frame or data.table (with only numeric, integer, factor, logical, character columns).

key

The name of a column of dataSet according to which the set should be aggregated (character)

verbose

Should the function print messages while it runs? (logical, defaults to TRUE)

thresh

Maximum number of distinct values for which a frequency count is performed (numeric, defaults to 53)

...

Optional argument functions: aggregation functions applied to numeric columns (vector or list of functions; if not set, c(mean, min, max, sd) is used)

Value

A data.table with one row per key value and multiple new columns.

Details

Aggregation is performed depending on the column type:

  • If the column is numeric, each of the functions is applied to it, so one numeric column gives length(functions) new columns.

  • If the column is character or factor and has fewer than thresh distinct values, a frequency count of the values is performed.

  • If the column is character or factor and has more than thresh distinct values, the number of distinct values for each key is computed.

  • If the column is logical, the count and the rate of TRUE values are computed.
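The rules above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data frame, its column names, and the use of tapply and table are assumptions made only to show what each kind of aggregation produces.

```r
# Toy data; every column name here is made up for illustration.
toy <- data.frame(
  id     = c("a", "a", "b", "b", "b"),
  amount = c(10, 20, 5, 15, 25),
  colour = c("red", "blue", "red", "red", "blue"),
  flag   = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Numeric column: one new column per aggregation function.
functions <- list(mean = mean, min = min, max = max, sd = sd)
numeric_agg <- sapply(functions, function(f) tapply(toy$amount, toy$id, f))

# Character column with few distinct values: frequency count per key.
freq_agg <- table(toy$id, toy$colour)

# Character column with many distinct values: number of distinct values per key.
distinct_agg <- tapply(toy$colour, toy$id, function(x) length(unique(x)))

# Logical column: count and rate of TRUE values per key.
flag_count <- tapply(toy$flag, toy$id, sum)
flag_rate  <- tapply(toy$flag, toy$id, mean)
```

Here numeric_agg has one row per key and length(functions) columns, which mirrors the "one numeric column gives length(functions) new columns" rule.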

Be careful when using the functions argument: each function given should be an aggregation function, meaning that for multiple input values it should return a single value.
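One way to check a candidate before passing it in is to call it on a few values and see whether a single value comes back. The helper is_aggregation below is hypothetical and is not part of dataPreparation; how the package itself validates functions is not shown here.

```r
# Hypothetical helper (not part of dataPreparation): TRUE if f returns
# exactly one value when given several input values.
is_aggregation <- function(f, probe = c(1, 4, 9)) {
  result <- tryCatch(f(probe), error = function(e) NULL)
  !is.null(result) && length(result) == 1
}

power <- function(x) sum(x^2)

is_aggregation(power)  # TRUE: sum() collapses the values to one number
is_aggregation(sqrt)   # FALSE: sqrt() returns one value per input
is_aggregation(mean)   # TRUE
```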

Examples

# NOT RUN {
# Load the package that provides aggregateByKey
library(dataPreparation)

# Get generic dataset from R
data("adult")

# Aggregate it using aggregateByKey, in order to extract characteristics for each country
adult_aggregated <- aggregateByKey(adult, key = 'country')

# Example with other functions
power <- function(x){sum(x^2)}
adult_aggregated <- aggregateByKey(adult, key = 'country', functions = c(power, sqrt))

# sqrt is not an aggregation function, so it wasn't used.
# }
