vignette("basics")
In dplyr, summarise returns a single value, but here it returns one value
per cluster node, so N values for N nodes. You should typically follow
summarise with reduce, which runs locally.
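
The summarise-then-reduce pattern can be sketched in plain base R without a cluster. This is only an illustration of the idea, not the package's implementation: the four "nodes" are simulated with split(), and the names n_nodes, node_id, chunks, partial_sums and partial_n are hypothetical.

```r
# Simulate 4 cluster nodes, each holding a slice of the data (hypothetical setup).
x <- 1:100
n_nodes <- 4
node_id <- rep(seq_len(n_nodes), length.out = length(x))
chunks <- split(x, node_id)

# "summarise" step: every node computes its own partial result, giving N values.
partial_sums <- vapply(chunks, sum, numeric(1))
partial_n <- vapply(chunks, length, integer(1))

# "reduce" step: combine the N partial results locally into a single answer.
total <- sum(partial_sums)
overall_mean <- total / sum(partial_n)

print(length(partial_sums))  # one partial result per simulated node
print(total)
print(overall_mean)
```

Note that a mean has to be rebuilt from partial sums and counts: averaging the per-node means directly would only be correct when every node holds the same number of rows.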
| arrange | Sort data |
| distinct | Select unique rows or unique combinations of variables |
| filter | Filter data |
| group_by | Group data |
| group_sizes | Return size of groups |
| groupwise | Use grouped data (also known as regroup) |
| mutate | Change values of existing variables (and create new ones) |
| n_groups | Return number of groups |
| rename | Rename variables |
| rowwise | Use data as individual rows |
| select | Retain only specified variables |
| slice | Select rows by position |
| summarise | Summarise data |
| transmute | Change variables and drop all others |
| partition_even | Partition data evenly amongst cluster nodes |
| partition_group | Partition data so that each group is wholly on a node |
| within_group | Execute code within a group |
| within_node | Execute code within a node |
| Multiplyr | Create new parallel data frame |
| define | Define new variables |
| nsa | No strings attached mode |
| reduce | Summarise locally only |
| regroup | Return to grouped data |
| undefine | Delete variables |
| between | Tests whether elements of a vector lie between two values (inclusively) |
| cumall | Cumulative all |
| cumany | Cumulative any |
| cummean | Cumulative mean |
| first | Returns first value in vector |
| last | Returns last value in vector |
| lag | Offset x backwards by n |
| lead | Offset x forwards by n |
| n | Number of items in current group |
| nth | Return the nth item from a vector |
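
The vector helpers listed above can be approximated in base R, which may help when reading their one-line descriptions. This is a sketch under assumptions: every function ending in _sketch is a hypothetical stand-in rather than the package's code, and the NA padding used for lag and lead is assumed to match the dplyr helpers of the same name.

```r
# Hypothetical base R stand-ins; none of these names belong to a package.
between_sketch <- function(x, left, right) x >= left & x <= right  # inclusive at both ends
cummean_sketch <- function(x) cumsum(x) / seq_along(x)
cumall_sketch <- function(x) cumsum(!x) == 0  # TRUE until the first FALSE
cumany_sketch <- function(x) cumsum(x) > 0    # TRUE from the first TRUE onwards
nth_sketch <- function(x, n) x[n]

# Assumes 0 <= n <= length(x) and NA padding for the vacated positions.
lag_sketch <- function(x, n = 1) c(rep(NA, n), x[seq_len(length(x) - n)])
lead_sketch <- function(x, n = 1) c(x[seq_len(length(x) - n) + n], rep(NA, n))

x <- c(2, 4, 6, 8)
print(between_sketch(x, 4, 8))  # 4 and 8 count as inside the range
print(cummean_sketch(x))
print(lag_sketch(x))
print(lead_sketch(x))
print(cumall_sketch(c(TRUE, TRUE, FALSE, TRUE)))
print(cumany_sketch(c(FALSE, TRUE, FALSE)))
```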