datadr (version 0.8.4)

mr-summary-stats: Functions to Compute Summary Statistics in MapReduce

Description

Functions used to tabulate categorical variables and compute moments for numeric variables within the MapReduce framework. Used in updateAttributes.

Usage

tabulateMap(formula, data)

tabulateReduce(result, reduce.values, maxUnique = NULL)

calculateMoments(y, order = 1, na.rm = TRUE)

combineMoments(m1, m2)

combineMultipleMoments(...)

moments2statistics(m)

Arguments

formula
a formula to be used in xtabs
data
a subset of a 'ddf' object
result, reduce.values
inconsequential tabulateReduce parameters
maxUnique
the maximum number of unique combinations of variables to obtain tabulations for. This is meant to help against cases where a variable in the formula has a very large number of levels, to the point that it is not meaningful to tabulate and is too computationally expensive.
y, order, na.rm
inconsequential calculateMoments parameters
m1, m2
inconsequential combineMoments parameters
m
inconsequential moments2statistics parameters
...
inconsequential parameters
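
The tabulate-then-combine pattern described above can be sketched in base R. This is a minimal illustration and not datadr's implementation: the helpers tab_map and tab_reduce are hypothetical names, and the subsets are produced with split() rather than by a MapReduce job. Each subset is tabulated with xtabs, and the partial tables are then merged by summing counts over matching levels.

```r
# Hypothetical helpers (not part of datadr): tabulate one subset,
# then merge two partial tabulations by summing their counts.
tab_map <- function(formula, data) {
  as.data.frame(xtabs(formula, data = data), stringsAsFactors = FALSE)
}

tab_reduce <- function(a, b) {
  both <- rbind(a, b)
  # Sum Freq over every combination of the remaining columns
  aggregate(Freq ~ ., data = both, FUN = sum)
}

# Stand-in for the subsets of a divided data set
subsets <- split(mtcars, mtcars$gear)

# "Map" step: one partial table per subset
partials <- lapply(subsets, tab_map, formula = ~ cyl)

# "Reduce" step: fold the partial tables into a single table
combined <- Reduce(tab_reduce, partials)

# Reference result computed on the undivided data
direct <- as.data.frame(xtabs(~ cyl, data = mtcars), stringsAsFactors = FALSE)

print(combined)
```

Because the merge only sums counts, the partial tables can be combined in any order, which is what makes the operation suitable for a reduce step.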

Examples

library(datadr)
# Divide iris by Species and update the attributes of the result
d <- divide(iris, by = "Species", update = TRUE)
summary(d)
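
To illustrate why moments suit a MapReduce setting, here is a hedged base R sketch of combining per-subset summaries into an overall mean and variance. It does not call datadr's calculateMoments or combineMoments, whose return structure is not documented on this page. The helpers calc_moments and combine_moments are hypothetical, and the pairwise update used is the standard parallel formula for the count, mean, and sum of squared deviations.

```r
# Hypothetical helpers (not datadr's functions). Each summary holds
# n, the mean, and M2 (the sum of squared deviations from the mean).
calc_moments <- function(y) {
  y <- y[!is.na(y)]
  mu <- mean(y)
  list(n = length(y), mean = mu, M2 = sum((y - mu)^2))
}

# Merge two summaries without revisiting the raw data
combine_moments <- function(a, b) {
  n <- a$n + b$n
  delta <- b$mean - a$mean
  list(
    n = n,
    mean = a$mean + delta * b$n / n,
    M2 = a$M2 + b$M2 + delta^2 * a$n * b$n / n
  )
}

# Stand-in for the subsets of a divided data set
parts <- split(iris$Sepal.Length, iris$Species)

# Summarize each subset, then fold the summaries together
m <- Reduce(combine_moments, lapply(parts, calc_moments))

# Convert the combined moments into statistics
stats <- c(mean = m$mean, var = m$M2 / (m$n - 1))
print(stats)
```

The combined values should agree with mean() and var() applied to the full column, up to floating-point error.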
