multiplyr (version 0.1.1)

Multiplyr-class: Parallel processing data frame

Description

Apart from calling Multiplyr() to create a new data frame, the methods and fields documented here are not intended for general use; it is best to stick to the manipulation functions. For a better overview, run: vignette("basics")

Arguments

...
Either a data frame or a list of name=value pairs
cl
Cluster object, number of nodes or NULL (default)
alloc
Allocate additional columns
auto_compact
Automatically compact data after filter operations
auto_partition
Automatically re-partition after group_by
profiling
Enable internal profiling code

Value

Object of class Multiplyr

Fields

auto_compact
Compact data after each filtering (or similar) operation
auto_partition
Re-partition after group_by
bindenv
Environment for within_group etc. operations
bm
big.matrix (internal representation of data)
bm.master
big.matrix for certain operations that need non-subsetted data
cls
SOCKcluster created by parallel package
col.names
Name of each column; names starting with "." are special, and NA marks a free column
desc.master
big.matrix.descriptor for setting up shared memory access
empty
Flag indicating that this data frame is empty
factor.cols
Which columns are factors/character
factor.levels
List (same length as factor.cols) containing corresponding factor levels
filtercol
Which column in bm indicates filtering (1=included, 0=excluded)
filtered
Flag indicating that this data frame has had filtering applied
first
Subsetting: first row
group.cols
Which columns are involved in grouping
groupcol
Which column in bm contains the group ID
grouped
Flag indicating whether grouped
groupenv
List of environments corresponding to group IDs in group
group_max
Number of groups
group_partition
Flag indicating that partition_group() has been used
group_sizes_stale
Flag indicating that group sizes need to be re-calculated
group
Which group IDs are assigned to this data frame
last
Subsetting: last row
nsamode
Flag indicating whether data frame is in no-strings-attached mode
order.cols
Display order of columns
pad
Number of spaces to pad each column or 0 for dynamic
profile_names
Profile names
profile_real
Total elapsed time for each profile
profile_rreal
Reference time for total elapsed
profile_rsys
Reference time for system
profile_ruser
Reference time for user
profile_sys
Total system time for each profile
profile_user
Total user time for each profile
profiling
Flag indicating that profiling is to be used
slave
Flag indicating whether cluster_* operations are valid
tmpcol
Which column may be used for temporary calculations
type.cols
Column type (0=numeric, 1=character, 2=factor)
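The fields above describe a numeric-only store in which character/factor columns are held as level indices (factor.cols, factor.levels) and filtering is recorded in a flag column (filtercol) rather than by dropping rows. The following is a minimal sketch of that idea, assuming a plain base-R matrix in place of a big.matrix; the column name `.filter` and the variables `g_levels`, `m`, `keep` and `visible` are hypothetical and are not part of the package.

```r
# Hypothetical illustration: a plain matrix stands in for the big.matrix.
df <- data.frame(x = c(10, 20, 30, 40), G = c("A", "B", "A", "B"),
                 stringsAsFactors = FALSE)

# Levels for the character column (plays the role of factor.levels)
g_levels <- sort(unique(df$G))

# Numeric storage: x as-is, G as a level index,
# and a filter flag column (1 = included, 0 = excluded)
m <- cbind(x = df$x, G = match(df$G, g_levels), .filter = 1)

# "Filter" to rows where x > 15 by clearing the flag, not by removing rows
m[, ".filter"] <- as.numeric(m[, "x"] > 15)

# Rebuild the visible rows, decoding G back to its labels
keep <- m[, ".filter"] == 1
visible <- data.frame(x = m[keep, "x"], G = g_levels[m[keep, "G"]],
                      stringsAsFactors = FALSE)
# visible has 3 rows: x = 20, 30, 40 and G = "B", "A", "B"
```

Because excluded rows stay in the matrix, row positions keep referring to the full store, which is consistent with the notes on filter_range, filter_rows and filter_vector in the Methods section.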

Methods

alloc_col(name = ".tmp", update = FALSE)
Allocate a new column and optionally update cluster nodes to do the same. Returns the column number
build_grouped()
Build group environments
calc_group_sizes(delay = TRUE)
Calculate group sizes (if delay=TRUE then this will just mark group sizes as being stale)
cluster_eval(...)
Executes specified expression on cluster
cluster_export(var, var.as = NULL, envir = parent.frame())
Exports a variable from current environment to the cluster, optionally with a different name
cluster_export_each(var, var.as = var, envir = parent.frame())
Like cluster_export, but exports only one element of each variable to each node
cluster_export_self()
Exports this data frame to the cluster (naming it .local)
cluster_profile()
Update profile totals to include all nodes' totals (also resets nodes' totals to 0)
cluster_running()
Checks whether cluster is running
cluster_start(cl = NULL)
Starts a cluster with cl cores if cl is numeric or with detectCores()-1 cores if cl is NULL; otherwise uses the existing cluster supplied as cl
cluster_stop(only.if.started = FALSE)
Stops cluster
compact()
Re-sorts data so all rows included after filtering are contiguous (and calls sub.big.matrix in the process)
describe()
Describes data frame (for later use by reattach_slave)
destroy_grouped()
Removes grouped data on remote nodes
envir(nsa = NULL)
Returns an environment with active bindings to columns (may also temporarily set no strings attached mode)
factor_map(var, vals)
For a given set of values (numeric or character), map it to be numeric: this is used to store data in big.matrix
filter_range(start, end)
Only include specified rows. Note that start and end are relative to all rows in the big.matrix, filtered or otherwise
filter_rows(rows)
Only include specified numeric rows. Note that rows refer to all rows in the big.matrix, filtered or otherwise
filter_vector(rows)
Only include these rows (given as a vector of TRUE/FALSE values). Note that this applies to all rows in the big.matrix, filtered or otherwise
finalize()
Destructor
free_col(cols, update = FALSE)
Free specified (numeric) column and optionally update cluster
get_data(i = NULL, j = NULL, nsa = NULL, drop = TRUE)
Retrieve the given rows (i) and columns (j). With drop=TRUE and a single column, this returns a vector; otherwise it returns a standard data.frame. If no strings attached mode is enabled, this will only return a vector or a matrix
group_cache_attach(descres)
Attach data frame to group_cache
group_restrict(grpid = NULL)
Restricts data to only specified group ID. If NULL, returns to non-restricted.
initialize(..., alloc = 0, cl = NULL, auto_compact = TRUE, auto_partition = TRUE, profiling = TRUE)
Constructor
local_subset(first, last)
Applies sub.big.matrix to bm
partition_even(extend = FALSE)
Partitions data evenly across cluster, irrespective of grouping boundaries
profile(action = NULL, name = NULL)
Profiling function: action may be start or stop. If no parameters, this returns a data.frame of profiling timings
profile_import(prof)
Adds totals from provided profile to this data frame's profiling data
reattach_slave(descres)
Used for nodes to reattach to a specified shared memory object
rebuild_grouped()
Executes destroy_grouped(), followed by build_grouped()
row_names()
Returns some entirely arbitrary row names
set_data(i = NULL, j = NULL, value, nsa = NULL)
Set data in given rows (i) and columns (j). If in no strings attached mode, then value must be entirely numeric
sort(decreasing = FALSE, dots = NULL, cols = NULL, with.group = TRUE)
Sorts data by specified (numeric) columns or by translating from a lazy_dots object. with.group is used to ensure that the sort is by grouping columns first to ensure contiguity
submatrix(a, b)
Returns a sub.big.matrix between specified rows (a:b)
update_fields(fieldnames)
Update specified cluster data frames' field names to be the same as this one's
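The three-way handling of cl described for cluster_start (a number, NULL, or an existing cluster) can be sketched with the parallel package alone. This is only an illustration of the documented rule, not the package's own code: the helper name `resolve_node_count`, the fallback when detectCores() returns NA, and the lower bound of one node are all assumptions.

```r
library(parallel)

# Hypothetical helper: decide how many nodes to start for a given `cl`.
# Returns NA when `cl` is already a cluster object that should be reused.
resolve_node_count <- function(cl = NULL) {
  if (is.null(cl)) {
    cores <- detectCores()
    # Assumption: fall back to 2 if the core count cannot be determined
    if (is.na(cores)) cores <- 2L
    # Assumption: never go below one node on single-core machines
    return(max(1L, cores - 1L))
  }
  if (is.numeric(cl)) {
    return(as.integer(cl))
  }
  if (inherits(cl, "cluster")) {
    return(NA_integer_)
  }
  stop("cl must be NULL, a number of nodes, or a cluster object")
}

n_default  <- resolve_node_count()   # detectCores() - 1, at least 1
n_explicit <- resolve_node_count(2)  # exactly 2
```

A caller could then pass a non-NA result to parallel::makeCluster() and reuse the supplied object otherwise.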

Examples

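library(multiplyr)

# Create from name=value pairs on a 2-node cluster, then shut the cluster down
dat <- Multiplyr(x = 1:100, G = rep(c("A", "B"), each = 50), cl = 2)
dat %>% shutdown()

# Create from an existing data frame instead
dat.df <- data.frame(x = 1:100, G = rep(c("A", "B"), each = 50))
dat <- Multiplyr(dat.df, cl = 2)
dat %>% shutdown()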