Filter a 'ddo' or 'ddf' object by selecting key-value pairs that satisfy a logical condition
drFilter(x, filterFn, output = NULL, overwrite = FALSE, params = NULL,
packages = NULL, control = NULL)
x: an object of class 'ddo' or 'ddf'
filterFn: function that takes either a key-value pair (as two arguments) or just a value (as a single argument) and returns either TRUE or FALSE. If it returns TRUE, that key-value pair will be present in the result. See examples for details.
output: a "kvConnection" object indicating where the output data should reside (see localDiskConn, hdfsConn). If NULL (default), output will be an in-memory "ddo" object.
overwrite: logical; should an existing output location be overwritten? Can also be set to overwrite = "backup" to move the existing output to _bak.
params: a named list of objects external to the input data that are needed in the distributed computation (most should be picked up automatically, so this rarely needs to be specified)
packages: a vector of R package names that contain functions used in filterFn (most should be picked up automatically, so this rarely needs to be specified)
control: parameters specifying how the backend should handle things (most likely parameters to rhwatch in RHIPE); see rhipeControl and localDiskControl
Value: a 'ddo' or 'ddf' object containing only the key-value pairs for which filterFn returned TRUE
# NOT RUN {
# Create a ddf using the iris data
bySpecies <- divide(iris, by = "Species")
# Filter using only the 'value' of the key/value pair
drFilter(bySpecies, function(v) mean(v$Sepal.Width) < 3)
# Filter using both the key and value
drFilter(bySpecies, function(k, v) k != "Species=virginica" && mean(v$Sepal.Width) < 3)
# }
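The filtering semantics described above can be illustrated without datadr installed. The following is a minimal base-R sketch, not datadr's implementation: `kvPairs` and `filterPairs` are hypothetical names introduced here, a plain list of key-value pairs stands in for a 'ddo' object, and the choice to inspect the number of formal arguments of `filterFn` to decide between the one-argument and two-argument forms is an assumption made for illustration.

```r
# Hypothetical stand-in for divide(iris, by = "Species"):
# a plain list of key-value pairs, one per species.
groups <- split(iris[, names(iris) != "Species"], iris$Species)
kvPairs <- lapply(names(groups), function(s) {
  list(key = paste0("Species=", s), value = groups[[s]])
})

# Hypothetical helper (not part of datadr): keep the pairs for which
# filterFn returns TRUE. A one-argument filterFn receives only the value;
# a two-argument filterFn receives the key and the value.
filterPairs <- function(pairs, filterFn) {
  nArgs <- length(formals(filterFn))
  keep <- vapply(pairs, function(kv) {
    res <- if (nArgs >= 2) filterFn(kv$key, kv$value) else filterFn(kv$value)
    isTRUE(res)
  }, logical(1))
  pairs[keep]
}

# Filter using only the value
byValue <- filterPairs(kvPairs, function(v) mean(v$Sepal.Width) < 3)

# Filter using both the key and the value
byBoth <- filterPairs(kvPairs, function(k, v) {
  k != "Species=virginica" && mean(v$Sepal.Width) < 3
})

# Keys of the pairs that survived each filter
vapply(byValue, function(kv) kv$key, character(1))
vapply(byBoth, function(kv) kv$key, character(1))
```

In this sketch the value-only filter keeps the versicolor and virginica subsets (mean sepal width below 3), and the key-and-value filter keeps only versicolor. With drFilter itself the same two filter functions are passed as filterFn, and the result is a 'ddo' or 'ddf' object rather than a plain list.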