Usage
divide(data, by = NULL, spill = 1000000, filterFn = NULL, bsvFn = NULL,
output = NULL, overwrite = FALSE, preTransFn = NULL,
postTransFn = NULL, params = NULL, packages = NULL, control = NULL,
update = FALSE, verbose = TRUE)
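A minimal call might look like the following sketch. It assumes the datadr package is installed and uses the built-in iris data; the object names irisDdf, bySpecies, and byRandom are illustrative only.

```r
# Assumes the datadr package is installed; iris ships with base R.
library(datadr)

# Wrap a data frame as a distributed data frame ("ddf").
irisDdf <- ddf(iris)

# Conditioning-variable division: one subset per level of Species.
bySpecies <- divide(irisDdf, by = "Species", update = TRUE)

# Random-replicate division: subsets of roughly 10 rows each.
byRandom <- divide(irisDdf, by = rrDiv(nrows = 10), update = TRUE)
```

With output left as NULL, no connection object is needed for this small example.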
Arguments
data
an object of class "ddf" or "ddo"; in the latter case, preTransFn must be specified to coerce each subset into a data frame
by
specification of how to divide the data: conditional (factor-level or shingles), random replicate, or near-exact replicate (to come); see details
spill
integer telling the division method how many lines of data should be collected until spilling over into a new key-value pair
filterFn
a function applied to each candidate output key-value pair to determine whether it should be part of the resulting division (the pair is kept if the function returns TRUE)
bsvFn
a function applied to each subset that returns a list of between-subset variables (BSVs)
output
a "kvConnection" object indicating where the output data should reside (see localDiskConn, hdfsConn). If NULL (default), output will be
overwrite
logical; should an existing output location be overwritten? (can also specify overwrite = "backup" to move the existing output to _bak)
preTransFn
a transformation function (if desired) to apply to each subset prior to division. Note: this is deprecated; instead use addTransform prior to calling divide
postTransFn
a transformation function (if desired) to apply to each post-division subset
params
a named list of objects external to the input data that are needed in the distributed computing (most should be taken care of automatically such that this is rarely necessary to specify)
packages
a vector of R package names that contain functions used in fn (most should be taken care of automatically such that this is rarely necessary to specify)
control
parameters specifying how the backend should handle things (most likely parameters to rhwatch in RHIPE); see rhipeControl and localDiskContr
update
should a MapReduce job be run to obtain additional attributes for the result data prior to returning?
verbose
logical - print messages about what is being done
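To make the roles of by, filterFn, and bsvFn concrete, here is a base-R sketch of a conditioning division. It is an illustration of the argument descriptions above, not datadr's implementation: divide_sketch is a hypothetical helper, and representing each key-value pair as a list with key and value elements is an assumption of this sketch.

```r
# Base-R illustration only; datadr's internal implementation may differ.
# divide_sketch() is a hypothetical helper, not part of datadr.
divide_sketch <- function(df, by, filterFn = NULL, bsvFn = NULL) {
  # Conditioning division: group rows by the levels of the `by` column.
  groups <- split(df, df[[by]], drop = TRUE)
  out <- list()
  for (key in names(groups)) {
    # Each group is a candidate key-value pair.
    pair <- list(key = key, value = groups[[key]])
    # Keep the candidate pair only when filterFn returns TRUE.
    if (!is.null(filterFn) && !isTRUE(filterFn(pair))) next
    # Attach per-subset summaries (between-subset variables), if requested.
    if (!is.null(bsvFn)) pair$bsv <- bsvFn(pair$value)
    out[[key]] <- pair
  }
  out
}

res <- divide_sketch(
  iris, by = "Species",
  filterFn = function(kv) mean(kv$value$Sepal.Length) > 5.5,
  bsvFn = function(x) list(meanSL = mean(x$Sepal.Length))
)
names(res)
```

Here the filter drops any species whose mean sepal length is 5.5 or less, and each surviving subset carries its own meanSL summary alongside the data.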