Multiplyr-class: Parallel processing data frame

Description

Apart from calling Multiplyr() to create a new data frame, the methods and fields listed here are not intended for general use; it is generally best to stick to the manipulation functions. For a better overview, run: vignette("basics")

Arguments

...

Either a data frame or a list of name=value pairs

cl

Cluster object, number of nodes or NULL (default)

alloc

Allocate additional columns

auto_compact

Automatically compact data after filter operations

auto_partition

Automatically re-partition after group_by

profiling

Enable internal profiling code
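
The cluster argument accepts three kinds of value, and the cluster_start() entry under Methods describes how each is interpreted. The following is a minimal sketch of that rule using only the parallel package. The helper resolve_nodes() is hypothetical and not part of multiplyr, and the fallback values for an undetectable or single-core machine are assumptions.

```r
library(parallel)

# Hypothetical helper (not part of multiplyr): work out how many nodes a
# given cluster argument implies, following the documented rules.
resolve_nodes <- function(cl = NULL) {
  if (is.null(cl)) {
    # NULL: detected cores minus one
    n <- detectCores()
    if (is.na(n)) n <- 2L      # assumption: fall back when detection fails
    return(max(1L, n - 1L))    # assumption: never go below one node
  }
  if (is.numeric(cl)) {
    # Numeric: use exactly that many nodes
    return(as.integer(cl))
  }
  # Otherwise treat it as an existing cluster object (a list of nodes)
  length(cl)
}

nodes_for_two <- resolve_nodes(2)
nodes_default <- resolve_nodes(NULL)
```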

Value

Object of class Multiplyr

Fields

auto_compact: Compact data after each filtering etc. operation
auto_partition: Re-partition after group_by
bindenv: Environment for within_group etc. operations
bm: big.matrix (internal representation of data)
bm.master: big.matrix for certain operations that need non-subsetted data
cls: SOCKcluster created by parallel package
col.names: Name of each column; names starting "." are special and NA is a free column
desc.master: big.matrix.descriptor for setting up shared memory access
empty: Flag indicating that this data frame is empty
factor.cols: Which columns are factors/character
factor.levels: List (same length as factor.cols) containing corresponding factor levels
filtercol: Which column in bm indicates filtering (1=included, 0=excluded)
filtered: Flag indicating that this data frame has had filtering applied
first: Subsetting: first row
group.cols: Which columns are involved in grouping
groupcol: Which column in bm contains the group ID
grouped: Flag indicating whether grouped
groupenv: List of environments corresponding to group IDs in group
group_max: Number of groups
group_partition: Flag indicating that partition_group() has been used
group_sizes_stale: Flag indicating that group sizes need to be re-calculated
group: Which group IDs are assigned to this data frame
last: Subsetting: last row
nsamode: Flag indicating whether data frame is in no-strings-attached mode
order.cols: Display order of columns
pad: Number of spaces to pad each column or 0 for dynamic
profile_names: Profile names
profile_real: Total elapsed time for each profile
profile_rreal: Reference time for total elapsed
profile_rsys: Reference time for system
profile_ruser: Reference time for user
profile_sys: Total system time for each profile
profile_user: Total user time for each profile
profiling: Flag indicating that profiling is to be used
slave: Flag indicating whether cluster_* operations are valid
tmpcol: Which column may be used for temporary calculations
type.cols: Column type (0=numeric, 1=character, 2=factor)
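
Because the data are held in a numeric big.matrix, character and factor columns have to be stored as numbers alongside a separate record of their levels (factor.cols and factor.levels above). The following is a minimal base R sketch of that idea, not the package's own code; the variable names are illustrative only.

```r
# Stand-in for one character column of a data frame
G <- rep(c("A", "B"), each = 3)

# Levels kept separately, analogous in spirit to one entry of factor.levels
G_levels <- sort(unique(G))

# Numeric codes that could be stored in a numeric matrix column
G_codes <- match(G, G_levels)

# Decoding with the stored levels recovers the original strings
G_decoded <- G_levels[G_codes]
```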

Methods

alloc_col(name = ".tmp", update = FALSE): Allocate a new column and optionally update cluster nodes to do the same. Returns the column number
build_grouped(): Build group environments
calc_group_sizes(delay = TRUE): Calculate group sizes (if delay=TRUE then this will just mark group sizes as being stale)
cluster_eval(...): Executes specified expression on cluster
cluster_export(var, var.as = NULL, envir = parent.frame()): Exports a variable from current environment to the cluster, optionally with a different name
cluster_export_each(var, var.as = var, envir = parent.frame()): Like cluster_export, but exports only one element of each variable to each node
cluster_export_self(): Exports this data frame to the cluster (naming it .local)
cluster_profile(): Update profile totals to include all nodes' totals (also resets nodes' totals to 0)
cluster_running(): Checks whether cluster is running
cluster_start(cl = NULL): Starts a cluster with cl cores if cl is numeric, detectCores()-1 if cl is NULL, or uses specified existing cluster
cluster_stop(only.if.started = FALSE): Stops cluster
compact(): Re-sorts data so all rows included after filtering are contiguous (and calls sub.big.matrix in the process)
describe(): Describes data frame (for later use by reattach_slave)
destroy_grouped(): Removes grouped data on remote nodes
envir(nsa = NULL): Returns an environment with active bindings to columns (may also temporarily set no strings attached mode)
factor_map(var, vals): For a given set of values (numeric or character), map it to be numeric: this is used to store data in big.matrix
filter_range(start, end): Only include specified rows. Note that start and end are relative to all rows in the big.matrix, filtered or otherwise
filter_rows(rows): Only include specified numeric rows. Note that rows refer to all rows in the big.matrix, filtered or otherwise
filter_vector(rows): Only include these rows (given as a vector of TRUE/FALSE values). Note that this applies to all rows in the big.matrix, filtered or otherwise
finalize(): Destructor
free_col(cols, update = FALSE): Free specified (numeric) column and optionally update cluster
get_data(i = NULL, j = NULL, nsa = NULL, drop = TRUE): Retrieve given rows (i), columns (j). drop=TRUE with 1 column will return a vector, otherwise a standard data.frame. If no strings attached mode is enabled, this will only return a vector or a matrix
group_cache_attach(descres): Attach data frame to group_cache
group_restrict(grpid = NULL): Restricts data to only specified group ID. If NULL, returns to non-restricted.
initialize(..., alloc = 0, cl = NULL, auto_compact = TRUE, auto_partition = TRUE, profiling = TRUE): Constructor
local_subset(first, last): Applies sub.big.matrix to bm
partition_even(extend = FALSE): Partitions data evenly across cluster, irrespective of grouping boundaries
profile(action = NULL, name = NULL): Profiling function: action may be start or stop. If no parameters, this returns a data.frame of profiling timings
profile_import(prof): Adds totals from provided profile to this data frame's profiling data
reattach_slave(descres): Used for nodes to reattach to a specified shared memory object
rebuild_grouped(): Executes destroy_grouped(), followed by build_grouped()
row_names(): Returns some entirely arbitrary row names
set_data(i = NULL, j = NULL, value, nsa = NULL): Set data in given rows (i) and columns (j). If in no strings attached mode, then value must be entirely numeric
sort(decreasing = FALSE, dots = NULL, cols = NULL, with.group = TRUE): Sorts data by specified (numeric) columns or by translating from a lazy_dots object. with.group is used to ensure that the sort is by grouping columns first to ensure contiguity
submatrix(a, b): Returns a sub.big.matrix between specified rows (a:b)
update_fields(fieldnames): Update specified cluster data frames' field names to be the same as this one's
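
The filtering methods above mark rows as included or excluded in a dedicated column rather than deleting them, and compact() then makes the included rows contiguous. The following is a minimal sketch of that scheme on an ordinary matrix, assuming a column named ".filter" plays the role of filtercol; it illustrates the idea and is not the package's implementation.

```r
# Plain numeric matrix standing in for the big.matrix
m <- cbind(1:6, 1)
colnames(m) <- c("x", ".filter")   # ".filter" is an assumed name for the filter column

# In the spirit of filter_vector(): TRUE/FALSE per row, stored as 1/0
keep <- m[, "x"] %% 2 == 0
m[, ".filter"] <- as.numeric(keep)

# In the spirit of compact(): reorder so that included rows come first
ord <- c(which(keep), which(!keep))
m_compact <- m[ord, , drop = FALSE]

# The included rows now occupy a contiguous range starting at row 1
n_included <- sum(m_compact[, ".filter"])
included <- m_compact[seq_len(n_included), "x"]
```

Once the included rows are contiguous, a first/last row pair is enough to describe the subset, which is what the first and last fields record.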

Examples

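library(multiplyr)

# Create from name=value pairs on a 2-node cluster, then shut down
dat <- Multiplyr(x = 1:100, G = rep(c("A", "B"), each = 50), cl = 2)
dat %>% shutdown()

# Create from an existing data frame
dat.df <- data.frame(x = 1:100, G = rep(c("A", "B"), each = 50))
dat <- Multiplyr(dat.df, cl = 2)
dat %>% shutdown()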
