
vignette("basics")
With dplyr, summarise returns a single number, but here it returns N values, one for each of the N cluster nodes. Typically you should follow summarise with reduce, which is run locally.
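The two-step pattern described above can be illustrated with a minimal base R sketch. It does not use multiplyr or a real cluster: the three "nodes" are simulated with split(), and the names x, node_id, partitions, partial and total are hypothetical, chosen only for this illustration.

```r
# Base R sketch only: no multiplyr calls, no real cluster.
x <- as.numeric(1:10)

# Hypothetical stand-in for data partitioned across 3 nodes.
node_id <- rep_len(1:3, length(x))
partitions <- split(x, node_id)

# Step 1 (analogue of summarise): each "node" computes its own partial sum,
# so the result has one value per node rather than a single number.
partial <- vapply(partitions, sum, numeric(1))
n_partial <- length(partial)

# Step 2 (analogue of reduce): combine the per-node results locally.
total <- sum(partial)
```

A sum can be reduced by summing the partial sums. A mean generally cannot be reduced by averaging per-node means unless every partition has the same size; in that case it is safer to carry per-node sums and counts through to the local step.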
| Function | Description |
| --- | --- |
| arrange | Sort data |
| distinct | Select unique rows or unique combinations of variables |
| filter | Filter data |
| group_by | Group data |
| group_sizes | Return size of groups |
| groupwise | Use grouped data (also known as ungroup) |
| mutate | Change values of existing variables (and create new ones) |
| n_groups | Return number of groups |
| rename | Rename variables |
| rowwise | Use data as individual rows |
| select | Retain only specified variables |
| slice | Select rows by position |
| summarise | Summarise data |
| transmute | Change variables and drop all others |
| partition_even | Partition data evenly amongst cluster nodes |
| partition_group | Partition data so that each group is wholly on a node |
| within_group | Execute code within a group |
| within_node | Execute code within a node |
| Multiplyr | Create new parallel data frame |
| define | Define new variables |
| nsa | No strings attached mode |
| reduce | Summarise locally only |
| regroup | Return to grouped data |
| undefine | Delete variables |
| between | Tests whether elements of a vector lie between two values (inclusively) |
| cumall | Cumulative all |
| cumany | Cumulative any |
| cummean | Cumulative mean |
| first | Returns first value in vector |
| last | Returns last value in vector |
| lag | Offset x backwards by n |
| lead | Offset x forwards by n |
| n | Number of items in current group |
| nth | Return the nth item from a vector |
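The one-line descriptions of the vector helpers can be made concrete with a rough base R sketch. These are not the package's implementations: the `_sketch` function names are hypothetical, and padding shifted vectors with NA is an assumption (it matches how dplyr's lag and lead behave by default).

```r
# Base R sketches of the described behaviour, not the package's own code.

# Inclusive range test: TRUE where left <= x <= right.
between_sketch <- function(x, left, right) x >= left & x <= right

# Running mean: cumulative sum divided by the number of items so far.
cummean_sketch <- function(x) cumsum(x) / seq_along(x)

# Shift values back by n positions, padding the start with NA (assumed).
# Assumes 1 <= n <= length(x).
lag_sketch <- function(x, n = 1) c(rep(NA, n), head(x, -n))

# Shift values forward by n positions, padding the end with NA (assumed).
# Assumes 1 <= n <= length(x).
lead_sketch <- function(x, n = 1) c(tail(x, -n), rep(NA, n))

x <- c(2, 4, 6, 8)
between_sketch(x, 4, 6)  # FALSE TRUE TRUE FALSE
cummean_sketch(x)        # 2 3 4 5
lag_sketch(x)            # NA 2 4 6
lead_sketch(x)           # 4 6 8 NA
```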