multiplyr (version 0.1.1)

multiplyr: Data Manipulation with Parallelism and Shared Memory Matrices

Description

Provides a new form of data frame backed by shared memory matrices and a way to manipulate them. Upon creation these data frames are shared across multiple local nodes to allow for simple parallel processing. Run the following command for a more thorough explanation: vignette("basics")

Major differences from dplyr

In dplyr, summarise returns a single number. Here it returns N values, one from each of the N nodes that hold part of the data. Typically you should follow summarise with reduce, which runs locally and combines the per-node values into a single result.
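
The difference can be illustrated with a conceptual sketch in base R. It does not call multiplyr itself: the node count, the chunking rule and all variable names are assumptions made for illustration. Each chunk stands in for the rows held by one node, the per-chunk step plays the role of summarise, and the final combination plays the role of reduce.

```r
# Conceptual sketch in base R; this does not use the multiplyr API.
x <- as.numeric(1:100)
n_nodes <- 4  # hypothetical number of cluster nodes

# Assign rows to nodes in contiguous, equal-sized chunks (assumed layout).
node_id <- ceiling(seq_along(x) * n_nodes / length(x))
chunks <- split(x, node_id)

# "summarise" step: each node computes a partial result on its own chunk,
# so there are n_nodes values rather than one.
partial_sums <- vapply(chunks, sum, numeric(1))
partial_counts <- vapply(chunks, length, integer(1))

# "reduce" step: combine the per-node values locally into a single answer.
total <- sum(partial_sums)
overall_mean <- total / sum(partial_counts)
```

Note that a mean has to be reduced from per-node sums and counts; averaging the per-node means is only correct when every node holds the same number of rows.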

Standard dplyr-like functions

arrange
Sort data
distinct
Select unique rows or unique combinations of variables
filter
Filter data
group_by
Group data
group_sizes
Return size of groups
groupwise
Use grouped data (also known as ungroup)
mutate
Change values of existing variables (and create new ones)
n_groups
Return number of groups
rename
Rename variables
rowwise
Use data as individual rows
select
Retain only specified variables
slice
Select rows by position
summarise
Summarise data
transmute
Change variables and drop all others

Parallel functions

partition_even
Partition data evenly amongst cluster nodes
partition_group
Partition data so that each group is wholly on a node
within_group
Execute code within a group
within_node
Execute code within a node
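
The two partitioning strategies listed above can be contrasted with a conceptual sketch in base R. It does not call multiplyr itself: the grouping column, the node count and the round-robin assignment of groups to nodes are all assumptions made for illustration, and the package may distribute groups differently.

```r
# Conceptual sketch in base R; this does not use the multiplyr API.
g <- rep(c("A", "B", "C"), times = c(5, 3, 4))  # hypothetical grouping column
n_nodes <- 2                                     # hypothetical node count
rows <- seq_along(g)

# Even partitioning: contiguous, equal-sized chunks of rows,
# ignoring group boundaries.
even_node <- ceiling(rows * n_nodes / length(rows))

# Group partitioning: assign each whole group to one node
# (round-robin here, as an assumption), so no group is split.
groups <- unique(g)
group_node <- setNames((seq_along(groups) - 1) %% n_nodes + 1, groups)
grouped_node <- unname(group_node[g])

# Count how many distinct nodes hold rows of each group.
nodes_per_group_even <- tapply(even_node, g, function(v) length(unique(v)))
nodes_per_group_grouped <- tapply(grouped_node, g, function(v) length(unique(v)))
```

With even partitioning, group "B" ends up on both nodes, so per-group code could not run on a single node without moving data. With group partitioning, every group sits on exactly one node, at the cost of possibly uneven node sizes.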

Additional data frame functions

Multiplyr
Create new parallel data frame
define
Define new variables
nsa
No strings attached mode
reduce
Summarise locally only
regroup
Return to grouped data
undefine
Delete variables

Data manipulation adjuncts

between
Tests whether elements of a vector lie between two values (inclusively)
cumall
Cumulative all
cumany
Cumulative any
cummean
Cumulative mean
first
Returns first value in vector
last
Returns last value in vector
lag
Offset x backwards by n
lead
Offset x forwards by n
n
Number of items in current group
nth
Return the nth item from a vector