aggregateByKey

Matrix, data.frame or data.table (with only numeric, integer, factor, logical, character columns).

dataSet

The name of the column of dataSet by which the set should be aggregated (character).

key

Should the algorithm talk, i.e. print progress messages? (logical, defaults to TRUE)

verbose

Maximum number of distinct values for which a frequency count is computed (numeric, defaults to 53)

thresh

Optional argument <code>functions</code>: aggregation functions for numeric columns (list or vector of functions; if not set, c(mean, min, max, sd) is used).

Automatic aggregation of a dataSet according to a <code>key</code>.

Do most of the painful data preparation for a data science project with a minimum amount of code. Take advantage of data.table efficiency and use some algorithmic tricks to perform data preparation in a time- and RAM-efficient way.

aggregateByKey: Automatic dataSet aggregation by key

Description

Usage

Arguments

Value

Details

Examples
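
The sketch below is not the package's implementation. It only illustrates, in base R, what aggregating the numeric columns of a data set by a key with the default functions c(mean, min, max, sd) could look like. The toy data, the helper name aggregate_numeric_by_key and the output column naming (function name, a dot, column name) are assumptions made for this illustration.

```r
# Toy data: one key column and two numeric columns (all names are illustrative).
dataSet <- data.frame(
  id     = c("a", "a", "b", "b", "b"),
  amount = c(10, 20, 1, 2, 3),
  score  = c(0.5, 1.5, 2, 4, 6),
  stringsAsFactors = FALSE
)

# Default aggregation functions named in the Arguments section.
functions <- list(mean = mean, min = min, max = max, sd = sd)

# Hypothetical helper (not part of any package): apply every function to
# every numeric column and return one row per distinct value of `key`.
aggregate_numeric_by_key <- function(dataSet, key, functions) {
  is_num <- vapply(dataSet, is.numeric, logical(1))
  numeric_cols <- setdiff(names(dataSet)[is_num], key)
  groups <- split(dataSet, dataSet[[key]])
  rows <- lapply(names(groups), function(k) {
    g <- groups[[k]]
    out <- list()
    out[[key]] <- k
    for (col in numeric_cols) {
      for (f in names(functions)) {
        # Column naming "<function>.<column>" is an assumption of this sketch.
        out[[paste(f, col, sep = ".")]] <- functions[[f]](g[[col]])
      }
    }
    as.data.frame(out, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

result <- aggregate_numeric_by_key(dataSet, key = "id", functions = functions)
print(result)
```

With the arguments documented on this page, the corresponding package call would take the same dataSet and key = "id"; the exact shape and column names of its result are not shown here and may differ from this sketch.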