mutate() with per-group optimisations
A faster mutate() with per-group optimisations
f_mutate(
.data,
...,
.by = NULL,
.order = group_by_order_default(.data),
.keep = "all"
)
A data frame with added columns.
.data: A data frame.
...: Name-value pairs of summary functions. Expressions with
across() are also accepted.
.by: (Optional). A selection of columns to group by for this operation. Columns are specified using tidy-select.
.order: Should the groups be returned in sorted order?
If FALSE, this will return the groups in order of first appearance,
and in many cases is faster.
.keep: Which columns to keep. Options are 'all', 'used', 'unused' and 'none'.
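A minimal usage sketch of the arguments above, assuming the fastplyr package is installed. The data frame df and its columns g, x and y are made up for illustration and do not come from the package.

```r
library(fastplyr)

# Hypothetical toy data: two groups ("a" and "b") and one unused column y
df <- data.frame(
  g = c("b", "a", "b", "a"),
  x = c(1, 2, 3, 4),
  y = c(10, 20, 30, 40)
)

# Add a per-group total in one step, grouping only for this call via .by
out <- f_mutate(df, total = sum(x), .by = g)

# Same computation, but ask for only the columns used in the
# expressions (here g and x) alongside the new column
out_used <- f_mutate(df, total = sum(x), .by = g, .keep = "used")
```

Because sum() is one of the optimised functions, this call may print the informational message described in this page.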
fastplyr data-masking functions like f_mutate and f_summarise operate
very similarly to their dplyr counterparts but with some crucial
differences.
Optimisations for by-group operations kick in for the
common statistical functions listed below.
When an expression is optimised, a message is printed; disable it
by running options(fastplyr.inform = FALSE).
Optimised expressions no longer obey the data-masking rules
for sequential and dependent expression execution.
For example, the pseudocode
f_summarise(data, mean = mean(x), mean2 = round(mean), .by = g)
will not work when optimised, because the new column mean is not visible
to later expressions.
One can disable fastplyr optimisations
globally by running options(fastplyr.optimise = FALSE).
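The following sketch shows two possible ways around the dependent-expression limitation, assuming the fastplyr package is installed. The data frame df is hypothetical, and the second option relies on the behaviour described in this page (that disabling optimisations restores sequential evaluation) rather than on anything verified independently.

```r
library(fastplyr)

# Hypothetical toy data
df <- data.frame(g = c("a", "a", "b"), x = c(1.2, 2.2, 3.6))

# Option 1: split the dependent expression into a second call, so that
# `mean` already exists as a real column when round() is evaluated
means <- f_summarise(df, mean = mean(x), .by = g)
means <- f_mutate(means, mean2 = round(mean))

# Option 2: switch optimisations off around the call, then restore the
# previous option value (options() returns the old settings)
old <- options(fastplyr.optimise = FALSE)
res <- f_summarise(df, mean = mean(x), mean2 = round(mean), .by = g)
options(old)
```

Option 1 keeps the optimised grouped mean and only moves the cheap follow-up step, so it is likely the better choice when there are many groups.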
Some functions are internally optimised using 'collapse' fast statistical functions. This makes execution on many groups very fast.
For fast quantiles (percentiles) by group, see tidy_quantiles.
List of currently optimised functions
dplyr::n -> <custom_expression>
dplyr::row_number -> <custom_expression> (only for f_mutate)
dplyr::cur_group -> <custom_expression>
dplyr::cur_group_id -> <custom_expression>
dplyr::cur_group_rows -> <custom_expression> (only for f_mutate)
dplyr::lag -> <custom_expression> (only for f_mutate)
dplyr::lead -> <custom_expression> (only for f_mutate)
base::sum -> collapse::fsum
base::prod -> collapse::fprod
base::min -> collapse::fmin
base::max -> collapse::fmax
base::mean -> collapse::fmean
stats::median -> collapse::fmedian
stats::sd -> collapse::fsd
stats::var -> collapse::fvar
dplyr::first -> collapse::ffirst
dplyr::last -> collapse::flast
dplyr::n_distinct -> collapse::fndistinct
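To illustrate what one of these substitutions amounts to, the sketch below compares a base R per-group mean with a single grouped call to collapse::fmean. It assumes the collapse package is installed; the vectors x and g are made up, and this is not a reproduction of fastplyr's internal code.

```r
# Hypothetical values and a grouping vector of the same length
x <- c(1, 2, 3, 4, 5, 6)
g <- c("a", "b", "a", "b", "a", "b")

# Base R: split-apply-combine, calling mean() once per group
base_means <- tapply(x, g, mean)

# collapse: one vectorised call that computes all group means at once
fast_means <- collapse::fmean(x, g = g)

# Both results are named by group, so compare them group by group
grp <- sort(unique(g))
all.equal(as.vector(base_means[grp]), unname(fast_means[grp]))
```

The gain from the single grouped call grows with the number of groups, since the per-group function-call overhead disappears.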