Overview

multiplyr provides a simple interface for manipulating data, combined with easy parallel processing capabilities. It is intended to work very similarly to (and eventually almost interchangeably with) the dplyr package, since many people are already familiar with it.

# Load the package
library (multiplyr)

# Create a new data frame with 2 columns (x & G) and space for 2 new columns
dat <- Multiplyr (x=1:100, G=rep(c("A", "B", "C", "D"), each=25), alloc=2)

# Group data (A, B, C, D)
dat %>% group_by (G)

# Create a new variable (y) with random data, the same length as x
dat %>% mutate (y=rnorm(length(x)))

# Remove any rows where y < 0 (filter keeps the rows that match the condition)
dat %>% filter (y>=0)

# Summarise to give 4 rows (A, B, C, D), with number of rows in each group
dat %>% summarise (N=length(x))

Run the following code once multiplyr is installed for more details:

vignette ("basics")
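
For readers coming from dplyr, the overview steps can be compared with the rough dplyr equivalent below. This is only a sketch for comparison, assuming dplyr is installed; it does not use multiplyr at all, and the seed and the name res are illustrative choices that do not come from the package. It follows the intent stated in the overview comments (rows where y < 0 are removed).

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Same starting data as the overview: x = 1..100, G = four groups of 25
dat <- data.frame(x = 1:100, G = rep(c("A", "B", "C", "D"), each = 25))

res <- dat %>%
  group_by(G) %>%                    # group by A, B, C, D
  mutate(y = rnorm(length(x))) %>%   # random values, one per row in each group
  filter(y >= 0) %>%                 # drop rows where y < 0
  summarise(N = length(x))           # rows remaining in each group

print(res)
```

In dplyr each verb returns a new object, so the steps are chained and the result is assigned to res. The overview instead applies each verb to dat without reassigning, which suggests the Multiplyr object carries its state between calls; see the vignette for the details.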

Installation

Install the latest version from CRAN:

install.packages ("multiplyr")

Development

Install the latest stable development version:

# install.packages("devtools")
devtools::install_github("jeblundell/multiplyr", ref="stable", build_vignettes = TRUE)

Branches

  • master: represents the version currently in CRAN
  • stable: the latest commit from develop that passes all tests
  • develop: current state of development

Monthly Downloads

8

Version

0.1.1

License

GPL-3

Maintainer

Jim Blundell

Last Published

May 31st, 2016

Functions in multiplyr (0.1.1)

  • arrange: Sort data
  • bm_mpermute: Extension of bigmemory::mpermute to allow decreasing parameter to be a vector
  • define: Define new columns
  • cumall: Cumulative all
  • between: Tests whether elements of a vector lie between two values (inclusively)
  • cumany: Cumulative any
  • add_rownames: Add a new column with row names
  • bm_morder: Extension of bigmemory::morder to allow decreasing parameter to be a vector
  • desc: Arrange specified column in descending order
  • cummean: Cumulative mean
  • distribute: Calculations for how to distribute x items over N nodes
  • first: Returns first value in vector
  • distinct: Select unique rows or unique combinations of variables
  • dotseval: Evaluate previously captured dots
  • group_by: Group data
  • dotscapture: Capture ... for later evaluation
  • dotsname1: Name an expression (called by dotsname)
  • mutate: Change values of existing variables (and create new ones)
  • Multiplyr-class: Parallel processing data frame
  • group_sizes: Return size of groups
  • last: Returns last value in vector
  • Multiplyr-methods: Data access methods for Multiplyr
  • dotscombine: Combine explicit and implicit dots
  • dotsname: Ensure captured dots are all named
  • filter: Filter data
  • nth: Return the nth item from a vector
  • reduce: Summarise data (with local reduction)
  • .p: Concatenate (internal)
  • lag: Offset x backwards by n
  • n_groups: Return number of groups
  • lead: Offset x forwards by n
  • multiplyr: Data Manipulation with Parallelism and Shared Memory Matrices
  • rename: Rename variables
  • select: Retain only specified variables
  • summarise: Summarise data
  • sm_desc_update: Update description of a big.matrix after a row subset (internal)
  • sm_desc_group: Returns a big.matrix descriptor for a particular group ID
  • test_transition: Test for grouping transition (internal)
  • within_node: Execute code within a node
  • ungroup: Return data to non-grouped
  • within_group: Execute code within a group
  • undefine: Delete variables
  • n_distinct: Return the number of unique values
  • n: Number of items in current group
  • NA_class_: Returns NA of a particular class
  • partition_even: Partition data evenly amongst cluster nodes
  • nonunique: Returns values of x that are non-unique
  • regroup: Return to grouped data
  • partition_group: Partition data so that each group is wholly on a node
  • nsa: No strings attached mode
  • slice: Select rows by position
  • sm_desc_comp: Returns big.matrix descriptor offset by 1 (for row by row comparisons)
  • shutdown: Shutdown running cluster
  • sm_desc_subset: Returns big.matrix descriptor limited to particular start/end row
  • transmute: Change variables and drop all others
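
The one-line descriptions of the vector helpers in the function list (between, lag, lead, cumall, cumany, cummean) are terse, so the base R sketch below shows one plausible reading of them. It assumes the helpers behave like their dplyr namesakes, which the page does not confirm. Every function ending in _sketch is a hypothetical stand-in written for illustration; none of them is multiplyr's own implementation.

```r
# Hypothetical base R stand-ins (assumed dplyr-like semantics), not multiplyr code.
# lag_sketch and lead_sketch assume 0 <= n <= length(x).

# TRUE where left <= x <= right (inclusive at both ends)
between_sketch <- function(x, left, right) x >= left & x <= right

# Shift values towards the end of the vector, padding the start with NA
lag_sketch <- function(x, n = 1) c(rep(NA, n), x[seq_len(length(x) - n)])

# Shift values towards the start of the vector, padding the end with NA
lead_sketch <- function(x, n = 1) c(x[seq_len(length(x) - n) + n], rep(NA, n))

# TRUE until the first FALSE has been seen
cumall_sketch <- function(x) cumsum(!x) == 0

# TRUE from the first TRUE onwards
cumany_sketch <- function(x) cumsum(x) > 0

# Running mean of the values seen so far
cummean_sketch <- function(x) cumsum(x) / seq_along(x)

print(between_sketch(1:5, 2, 4))               # FALSE TRUE TRUE TRUE FALSE
print(lag_sketch(1:5))                         # NA 1 2 3 4
print(lead_sketch(1:5))                        # 2 3 4 5 NA
print(cumall_sketch(c(TRUE, TRUE, FALSE, TRUE)))  # TRUE TRUE FALSE FALSE
print(cumany_sketch(c(FALSE, TRUE, FALSE)))    # FALSE TRUE TRUE
print(cummean_sketch(c(2, 4, 6)))              # 2 3 4
```

The package help pages (for example ?between or ?lag once multiplyr is loaded) remain the authority on argument names and edge cases such as NA handling.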