datadr (version 0.8.6.1)

mr-summary-stats: Functions to Compute Summary Statistics in MapReduce

Description

Functions used to tabulate categorical variables and compute moments for numeric variables within the MapReduce framework. Used in updateAttributes.

Usage

tabulateMap(formula, data)

tabulateReduce(result, reduce.values, maxUnique = NULL)

calculateMoments(y, order = 1, na.rm = TRUE)

combineMoments(m1, m2)

combineMultipleMoments(...)

moments2statistics(m)

Arguments

formula

a formula to be used in xtabs

data

a subset of a 'ddf' object

result, reduce.values

inconsequential tabulateReduce parameters

maxUnique

the maximum number of unique combinations of variables to obtain tabulations for. This guards against cases where a variable in the formula has so many levels that tabulating it is not meaningful and is too computationally burdensome. If NULL, it is ignored. If a positive number, only the top and bottom maxUnique tabulations by frequency are kept.
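
The tabulation and truncation described above can be illustrated with a minimal base-R sketch. It does not call datadr and is not its implementation: the subsets come from split() rather than a 'ddf', and keep_extremes() is a hypothetical helper written only to show one reading of "top and bottom maxUnique tabulations by frequency".

```r
# Base-R sketch of map-then-reduce tabulation; not datadr's internals.
subsets <- split(mtcars, mtcars$cyl)  # stand-in for the subsets of a 'ddf'

# Map step: tabulate each subset with xtabs and keep it as a data frame.
map_tab <- lapply(subsets, function(d) {
  as.data.frame(xtabs(~ carb, data = d), stringsAsFactors = FALSE)
})

# Reduce step: stack the per-subset tables and sum frequencies per level.
stacked <- do.call(rbind, map_tab)
combined <- aggregate(Freq ~ carb, data = stacked, FUN = sum)

# Hypothetical helper: keep only the maxUnique most and least frequent rows.
keep_extremes <- function(tab, maxUnique = NULL) {
  if (is.null(maxUnique) || nrow(tab) <= 2 * maxUnique) return(tab)
  tab <- tab[order(tab$Freq, decreasing = TRUE), ]
  rbind(head(tab, maxUnique), tail(tab, maxUnique))
}

truncated <- keep_extremes(combined, maxUnique = 2)
```

With maxUnique = NULL the combined table is returned unchanged, which matches the "If NULL, it is ignored" behavior described for this argument.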

y, order, na.rm

inconsequential calculateMoments parameters

m1, m2

inconsequential combineMoments parameters

m

inconsequential moments2statistics parameters

...

inconsequential parameters
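
To show why per-subset moments are useful in MapReduce, here is a small base-R sketch of computing moments on each subset and merging them pairwise. The helpers moments_of() and merge_moments() are hypothetical names chosen for illustration; they are not datadr's calculateMoments or combineMoments, their return structure is an assumption, and they track only the count, mean, and sum of squared deviations using the standard pairwise update formula.

```r
# Hypothetical helpers for illustration; not part of datadr.

# Per-subset moments: count, mean, and sum of squared deviations (M2).
moments_of <- function(y, na.rm = TRUE) {
  if (na.rm) y <- y[!is.na(y)]
  mu <- mean(y)
  list(n = length(y), mean = mu, M2 = sum((y - mu)^2))
}

# Merge the moments of two disjoint subsets without revisiting the data.
merge_moments <- function(a, b) {
  n <- a$n + b$n
  delta <- b$mean - a$mean
  list(
    n = n,
    mean = a$mean + delta * b$n / n,
    M2 = a$M2 + b$M2 + delta^2 * a$n * b$n / n
  )
}

# "Map" over subsets, then "reduce" the per-subset moments into one result.
parts <- split(iris$Sepal.Length, iris$Species)
total <- Reduce(merge_moments, lapply(parts, moments_of))

# Convert the merged moments into summary statistics.
stats <- c(mean = total$mean, var = total$M2 / (total$n - 1))
```

Because each merge needs only the moments and not the raw values, the subsets can be processed independently and combined in any order.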

Examples

# NOT RUN {
d <- divide(iris, by = "Species", update = TRUE)
summary(d)
# }
