datadr (version 0.8.4)

drJoin: Join Data Sources by Key

Description

Outer join of two or more distributed data object (DDO) sources by key

Usage

drJoin(..., output = NULL, overwrite = FALSE, postTransFn = NULL,
  params = NULL, packages = NULL, control = NULL)

Arguments

output
a "kvConnection" object indicating where the output data should reside (see localDiskConn, hdfsConn). If NULL (default), output will be
overwrite
logical; should existing output location be overwritten? (also can specify overwrite = "backup" to move the existing output to _bak)
postTransFn
an optional function to be applied to each final key-value pair after joining
params
a named list of objects external to the input data that are needed in the distributed computing (most should be taken care of automatically such that this is rarely necessary to specify)
packages
a vector of R package names that contain functions used in fn (most should be taken care of automatically such that this is rarely necessary to specify)
control
parameters specifying how the backend should handle things (most likely parameters to rhwatch in RHIPE) - see rhipeControl and localDiskContr
...
Input data sources: two or more named DDO objects that will be joined, separated by commas (see Examples for syntax). Specifically, each input object should inherit from the 'ddo' class. It is assumed that all input sources are of the same type (all HDFS, all

Value

  • a 'ddo' object stored in the output connection, where the values are named lists with names according to the names given to the input data objects, and values are the corresponding data. The 'ddo' object contains the union of all the keys contained in the input 'ddo' objects specified in ....
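The shape of the joined result can be illustrated with a minimal base-R sketch that does not call datadr at all. Here each input source is modelled as a plain named list (key -> value), and outer_join_by_key is a hypothetical helper written only for this illustration. The sample numbers are made up. How drJoin itself represents a key that is absent from one source is not stated above; this sketch assumes the missing source is simply omitted from that key's value.

```r
# Base-R illustration only: nothing here is part of the datadr API.
# Each "source" is a named list mapping key -> value (made-up sample data).
sw <- list(setosa = c(3.5, 3.0), versicolor = c(3.2, 3.2))
sl <- list(setosa = c(5.1, 4.9), virginica = c(6.3, 5.8))

# Hypothetical helper: outer join of named lists by key.
outer_join_by_key <- function(...) {
  sources <- list(...)
  # Union of all keys across every input source
  keys <- unique(unlist(lapply(sources, names)))
  joined <- lapply(keys, function(k) {
    # Keep only the sources that contain this key (assumption: absent
    # sources are dropped rather than filled with NULL)
    present <- Filter(function(src) k %in% names(src), sources)
    # Value for key k: a named list, one element per contributing source
    lapply(present, function(src) src[[k]])
  })
  names(joined) <- keys
  joined
}

res <- outer_join_by_key(Sepal.Width = sw, Sepal.Length = sl)
names(res)             # keys in the joined result
names(res$setosa)      # sources that contributed to "setosa"
names(res$versicolor)  # sources that contributed to "versicolor"
```

Every key that appears in any source ends up in the result, and each value is a named list whose names are the argument names given to the inputs.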

See Also

drFilter, drLapply

Examples

bySpecies <- divide(iris, by = "Species")
# get independent lists of just SW and SL
sw <- drLapply(bySpecies, function(x) x$Sepal.Width)
sl <- drLapply(bySpecies, function(x) x$Sepal.Length)
drJoin(Sepal.Width = sw, Sepal.Length = sl, postTransFn = as.data.frame)