datadr (version 0.8.4)

drAggregate: Division-Agnostic Aggregation

Description

Aggregates data by cross-classifying factors, with a formula interface similar to xtabs.
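"Division-agnostic" means the tabulation should not depend on how the data happen to be split into subsets. The base-R sketch below illustrates that idea only. It is not datadr's implementation, and the round-robin split into 4 subsets and the derived `widthClass` factor are arbitrary, hypothetical choices. Each subset is cross-classified with `xtabs`, the per-subset tables are added cell by cell, and the result matches `xtabs` on the undivided data.

```r
# Base-R illustration of division-agnostic tabulation (not datadr's implementation).
data(iris)

# Hypothetical derived factor, created once on the full data so that every
# subset shares the same factor levels.
iris$widthClass <- factor(ifelse(iris$Sepal.Width > 3, "wide", "narrow"))

# An arbitrary division of the rows into 4 subsets (round-robin assignment).
chunks <- split(iris, rep(1:4, length.out = nrow(iris)))

# Cross-classify Species by widthClass within each subset. Unused factor
# levels are kept, so every per-subset table has the same shape.
partial <- lapply(chunks, function(d) xtabs(~ Species + widthClass, data = d))

# Combine the subsets by adding their tables cell by cell.
combined <- Reduce(`+`, partial)

# Tabulating the undivided data gives the same counts.
full <- xtabs(~ Species + widthClass, data = iris)
all(combined == full)
```

Additive combination works here because counts (and sums) of disjoint subsets simply add, which is what makes this kind of aggregation independent of the division.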

Usage

drAggregate(data, formula, by = NULL, output = NULL, preTransFn = NULL,
  maxUnique = NULL, params = NULL, packages = NULL, control = NULL)

Arguments

data
a "ddf" containing the variables used in formula
formula
a formula object with the cross-classifying variables (separated by +) on the right-hand side (or an object which can be coerced to a formula). Interactions are not allowed. On the left-hand side, one may o…
by
an optional variable name or vector of variable names by which to split up tabulations (i.e., tabulate independently inside of each unique "by" variable value). The only difference between specifying "by" and placing the variable(s) in the right-hand side …
output
"kvConnection" object indicating where the output data should reside in the case of by being specified (see localDiskConn, hdfsConn). If …
preTransFn
an optional function to apply to each subset prior to performing tabulation. The output from this function should be a data frame containing variables with names that match those in the formula provided. Note: this is deprecated; instead use …
maxUnique
the maximum number of unique combinations of variables to obtain tabulations for. This is meant to help against cases where a variable in the formula has a very large number of levels, to the point that it is not meaningful to tabulate and is too computa…
params
a named list of objects external to the input data that are needed in the distributed computing (most should be taken care of automatically such that this is rarely necessary to specify)
packages
a vector of R package names that contain functions used in the computation, for example in preTransFn (most should be taken care of automatically such that this is rarely necessary to specify)
control
parameters specifying how the backend should handle things (most likely parameters to rhwatch in RHIPE); see rhipeControl and localDiskControl

Value

  • A data frame of the tabulations. When "by" is specified, the result is instead a "ddf" in which each key-value pair corresponds to a unique "by" value and contains a data frame of tabulations.
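The two result shapes can be mimicked in base R to see how "by" differs from adding a variable to the right-hand side. In this sketch a named list keyed by the "by" value stands in for the returned "ddf", and `widthClass` is a hypothetical derived variable; neither is part of datadr. The counts agree in both layouts; only the organisation of the result changes.

```r
# Base-R analogue of the "by" argument; a named list stands in for the "ddf".
data(iris)
iris$widthClass <- factor(ifelse(iris$Sepal.Width > 3, "wide", "narrow"))

# Like by = "Species": tabulate widthClass independently inside each Species
# value, giving one data frame of tabulations per key.
byResult <- lapply(split(iris, iris$Species), function(d) {
  as.data.frame(xtabs(~ widthClass, data = d))
})
names(byResult)  # "setosa" "versicolor" "virginica"

# Like placing Species on the right-hand side instead: a single data frame
# with Species as an extra column.
rhsResult <- as.data.frame(xtabs(~ Species + widthClass, data = iris))

# Same counts for a given Species, different layout.
identical(byResult$setosa$Freq,
          rhsResult$Freq[rhsResult$Species == "setosa"])
```

Keeping one table per key is convenient when the "by" variable has many values, since each key-value pair can be stored and processed on its own.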

See Also

xtabs, updateAttributes

Examples

library(datadr)
drAggregate(Sepal.Length ~ Species, data = ddf(iris))
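The example places Sepal.Length on the left-hand side, so, assuming drAggregate follows the xtabs convention for the left-hand side, each Species cell holds the sum of Sepal.Length rather than a row count. On an in-memory data frame, base R gives reference values to compare against:

```r
data(iris)

# As in xtabs, the left-hand-side variable is summed within each cell.
check <- xtabs(Sepal.Length ~ Species, data = iris)
check  # setosa 250.3, versicolor 296.8, virginica 329.4

# The same sums via tapply, as a cross-check.
tapply(iris$Sepal.Length, iris$Species, sum)
```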
