datadr (version 0.8.4)

drSubset: Subsetting Distributed Data Frames

Description

Return a subset of a "ddf" object as an in-memory data frame
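To make the description concrete, here is a minimal base-R sketch of the idea. It is not datadr's implementation. A divided data frame is modeled as a named list of ordinary data frames, as produced by split(). Each piece is filtered and column-selected independently, and the results are row-bound into one in-memory data frame capped at maxRows. The helper drSubsetSketch is hypothetical. It takes a function (keepFn) rather than an unevaluated expression to avoid non-standard evaluation, and it assumes that rows beyond maxRows are simply dropped.

```r
# Hypothetical helper: a conceptual stand-in for drSubset, NOT the datadr code.
# 'pieces' models a divided data frame as a named list of data frames.
drSubsetSketch <- function(pieces, keepFn, cols = NULL, maxRows = 500000) {
  # filter and select columns in each piece independently (per-subset step)
  filtered <- lapply(pieces, function(piece) {
    keep <- keepFn(piece)
    keep <- keep & !is.na(keep)          # missing values are taken as FALSE
    out <- piece[keep, , drop = FALSE]
    if (!is.null(cols)) out <- out[, cols, drop = FALSE]
    out
  })
  # combine all pieces into a single in-memory data frame
  res <- do.call(rbind, unname(filtered))
  # assumption: rows beyond maxRows are dropped
  if (nrow(res) > maxRows) res <- res[seq_len(maxRows), , drop = FALSE]
  res
}

# mimic divide(iris, by = "Species") with one data frame per species
pieces <- split(iris, iris$Species)
small <- drSubsetSketch(pieces, function(piece) piece$Sepal.Length < 5,
                        cols = c("Sepal.Length", "Species"))
first10 <- drSubsetSketch(pieces, function(piece) rep(TRUE, nrow(piece)),
                          maxRows = 10)
```

In datadr the per-subset step would run on whichever backend holds the data, and only the combined result is returned to memory. This is why a cap such as maxRows matters.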

Usage

drSubset(data, subset = NULL, select = NULL, drop = FALSE,
  preTransFn = NULL, maxRows = 500000, params = NULL, packages = NULL,
  control = NULL, verbose = TRUE)

Arguments

data
object to be subsetted: an object of class "ddf" or "ddo". In the latter case, preTransFn must be specified to coerce each subset into a data frame
subset
logical expression indicating the elements or rows to keep; missing values are taken as FALSE
select
expression indicating the columns to select from a data frame
drop
passed on to the [ indexing operator
preTransFn
a transformation function (if desired) to be applied to each subset prior to subsetting. Note: this is deprecated; instead, use addTransform on data before calling drSubset
maxRows
the maximum number of rows to return
params
a named list of objects external to the input data that are needed in the distributed computing (most are detected automatically, so this rarely needs to be specified)
packages
a vector of R package names that contain functions used in the subset, select, or preTransFn arguments (most are detected automatically, so this rarely needs to be specified)
control
parameters specifying how the backend should handle the computation (most likely parameters to rhwatch in RHIPE); see rhipeControl and localDiskContr
verbose
logical; whether to print messages about what is being done
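The descriptions of subset, select, and drop match those of base R's subset() for data frames. Assuming drSubset evaluates these arguments against each subset in the same way (an assumption, not confirmed here), the base-R calls below on a small made-up data frame (toy) illustrate the semantics: missing values in the condition drop the row, select accepts bare names, ranges, and negative selection, and drop = TRUE collapses a single selected column to a vector.

```r
# Illustration with base R's subset(); assumption: drSubset applies the
# same semantics to each subset of a ddf.
toy <- data.frame(x = c(1, NA, 3), y = c("a", "b", "c"), z = c(TRUE, FALSE, TRUE),
                  stringsAsFactors = FALSE)

# rows where x > 1; the NA comparison in row 2 is treated as FALSE
r1 <- subset(toy, x > 1)

# select takes bare column names, ranges, or negative selection
r2 <- subset(toy, select = c(x, z))
r3 <- subset(toy, select = x:y)
r4 <- subset(toy, select = -y)

# drop = TRUE with a single selected column returns a vector, not a data frame
r5 <- subset(toy, x > 0, select = y, drop = TRUE)
```

Here r1 keeps only the third row, r2 and r4 both contain columns x and z, r3 contains x and y, and r5 is the character vector of y values for rows 1 and 3.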

Value

  • data frame

Examples

library(datadr)
d <- divide(iris, by = "Species")  # one subset per species
drSubset(d, Sepal.Length < 5)      # matching rows from all subsets, as one data frame