fastplyr (version 0.2.0)

f_summarise: Summarise each group down to one row

Description

Like dplyr::summarise() but with some internal optimisations for common statistical functions.

Usage

f_summarise(data, ..., .by = NULL, .optimise = TRUE)

f_summarize(data, ..., .by = NULL, .optimise = TRUE)

Value

An un-grouped data frame of summaries by group.

Arguments

data

A data frame.

...

Name-value pairs of summary functions. Expressions with across() are also accepted.

.by

(Optional) A selection of columns to group by for this operation only. Columns are specified using tidy-select.

.optimise

Set to FALSE to turn off the optimisations for common statistical functions. The default, TRUE, uses them.
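
As a minimal sketch of the `.by` and `.optimise` arguments, the snippet below summarises the built-in `mtcars` data twice, once with the default optimisations and once with them switched off. The choice of `mtcars` and the column names are assumptions made for illustration; the data contain no missing values, so the two results are expected to agree up to floating-point tolerance.

```r
library(fastplyr)

# Group by `cyl` for this call only, using the default optimised path
optimised <- f_summarise(mtcars, mean_mpg = mean(mpg), .by = cyl)

# Same summary with the internal optimisations turned off
plain <- f_summarise(mtcars, mean_mpg = mean(mpg), .by = cyl, .optimise = FALSE)

# Compare the two results (illustrative check, not part of the package API)
all.equal(as.data.frame(optimised), as.data.frame(plain))
```

Switching `.optimise` off can be a quick way to check whether a surprising result comes from the optimised code path or from the summary expression itself.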

Details

f_summarise behaves mostly like dplyr::summarise except that expressions supplied to ... are evaluated independently.

Optimised statistical functions

Some functions are internally optimised using 'collapse' fast statistical functions. This makes execution on many groups very fast.

For fast quantiles (percentiles) by group, see tidy_quantiles.

Currently optimised functions and their 'collapse' equivalents:

base::sum -> collapse::fsum
base::prod -> collapse::fprod
base::min -> collapse::fmin
base::max -> collapse::fmax
base::mean -> collapse::fmean
stats::median -> collapse::fmedian
stats::sd -> collapse::fsd
stats::var -> collapse::fvar
dplyr::first -> collapse::ffirst
dplyr::last -> collapse::flast
dplyr::n_distinct -> collapse::fndistinct
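
To make the mapping above concrete, the following sketch computes a grouped sum through `f_summarise()` and then calls `collapse::fsum()` directly with a grouping vector. It uses the built-in `mtcars` data as an assumed example, and it only illustrates the correspondence between the two functions; it is not a description of how fastplyr dispatches internally.

```r
library(fastplyr)

# Grouped sum via f_summarise(); sum() is one of the optimised functions
via_fastplyr <- f_summarise(mtcars, total_hp = sum(hp), .by = cyl)

# The listed 'collapse' equivalent, called directly with a grouping vector
via_collapse <- collapse::fsum(mtcars$hp, g = mtcars$cyl)

via_fastplyr   # data frame with one row per value of cyl
via_collapse   # named numeric vector, one element per value of cyl
```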

See Also

tidy_quantiles

Examples

library(fastplyr)
library(dplyr) # attached for %>%, first(), last(), n(), across() and where()
library(nycflights13)

# Number of flights per month, including first and last day
flights %>%
  f_group_by(year, month) %>%
  f_summarise(first_day = first(day),
              last_day = last(day),
              num_flights = n())

# Fast mean summary using `across()`
flights %>%
  f_summarise(
    across(where(is.double), mean),
    .by = tailnum
  )

# NA handling in the optimised functions follows the global 'collapse' option.
# Keep NAs (na.rm = FALSE) for this summary
collapse::set_collapse(na.rm = FALSE)
flights %>%
  f_summarise(
    across(where(is.double), mean),
    .by = origin
  )
# Go back to ignoring NAs
collapse::set_collapse(na.rm = TRUE)
