Most data operations are done on groups defined by variables.
group_by() takes an existing tbl and converts it into a grouped tbl
where operations are performed "by group". ungroup() removes grouping.
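As a rough illustration of the description above, the following sketch groups a small tibble, summarises per group, and then removes the grouping. It uses plain dplyr on a made-up data frame (`df`, `g`, and `x` are hypothetical names, not part of this package), on the assumption that the Seurat method behaves analogously on cell metadata.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3))

# group_by() returns a grouped tbl; later verbs operate "by group"
grouped <- df |> group_by(g)
group_vars(grouped)

# One summary row per group
totals <- grouped |> summarise(total = sum(x))

# ungroup() removes the grouping again
flat <- grouped |> ungroup()
group_vars(flat)
```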
# S3 method for Seurat
group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))
A grouped data frame with class grouped_df,
unless the combination of ... and .add yields an empty set of
grouping columns, in which case a tibble will be returned.
.data: A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
...: <data-masking> In group_by(),
variables or computations to group by. Computations are always done on the
ungrouped data frame. To perform computations on the grouped data, you need
to use a separate mutate() step before the group_by().
Computations are not allowed in nest_by().
In ungroup(), variables to remove from the grouping.
.add: When FALSE, the default, group_by() will
override existing groups. To add to the existing groups, use
.add = TRUE.
.drop: Drop groups formed by factor levels that don't appear in the
data? The default is TRUE except when .data has been previously
grouped with .drop = FALSE. See group_by_drop_default() for details.
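The argument descriptions above can be sketched on a small example. This is a minimal illustration with plain dplyr on a made-up tibble (`df`, `f`, `h`, `x`, and `big` are hypothetical names). It is not taken from the package, and it assumes the Seurat method forwards these arguments in the same way.

```r
library(dplyr)

# Hypothetical toy data; the factor f has an unused level "c"
df <- tibble(
  f = factor(c("a", "a", "b"), levels = c("a", "b", "c")),
  h = c("u", "v", "u"),
  x = c(1, 5, 10)
)

# A computation inside group_by() is evaluated on the ungrouped data
# and becomes a new grouping column
by_big <- df |> group_by(big = x > mean(x))
group_vars(by_big)

# Default .add = FALSE: the second group_by() overrides the first
replaced <- df |> group_by(f) |> group_by(h)
group_vars(replaced)

# .add = TRUE: the new variable is appended to the existing groups
added <- df |> group_by(f) |> group_by(h, .add = TRUE)
group_vars(added)

# .drop controls whether unused factor levels form (empty) groups
n_dropped <- n_groups(df |> group_by(f))
n_kept <- n_groups(df |> group_by(f, .drop = FALSE))
```

Here `n_kept` counts one more group than `n_dropped` because the unused level "c" is retained as an empty group.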
These functions are generics, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
group_by(): dplyr:::methods_rd("group_by").
ungroup(): dplyr:::methods_rd("ungroup").
Currently, group_by() internally orders the groups in ascending order. This
results in ordered output from functions that aggregate groups, such as
summarise().
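A small sketch of this ordering behaviour, using plain dplyr on a made-up tibble (`df`, `g`, and `x` are hypothetical): the group keys appear in ascending order in the summary even though the input rows are unsorted.

```r
library(dplyr)

# Hypothetical toy data with unsorted group keys
df <- tibble(g = c(3, 1, 2, 1), x = c(10, 20, 30, 40))

out <- df |>
  group_by(g) |>
  summarise(avg = mean(x))

# Group keys come back sorted in ascending order
out$g
```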
When used as grouping columns, character vectors are ordered in the C locale
for performance and reproducibility across R sessions. If the resulting
ordering of your grouped operation matters and is dependent on the locale,
you should follow up the grouped operation with an explicit call to
arrange() and set the .locale argument. For example:
data |>
  group_by(chr) |>
  summarise(avg = mean(x)) |>
  arrange(chr, .locale = "en")
This is often useful as a preliminary step before generating content intended for humans, such as an HTML table.
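To make the difference concrete, here is a hedged sketch on made-up data (`data`, `chr`, and `x` are hypothetical values chosen for illustration). It assumes dplyr 1.1.0 or later with the dplyr.legacy_locale option unset. The locale-aware arrange() step also needs the stringi package, so it is guarded.

```r
library(dplyr)

# Hypothetical toy data mixing upper- and lower-case keys
data <- tibble(chr = c("b", "A", "a", "B"), x = c(1, 2, 3, 4))

# Grouping orders character keys in the C locale:
# uppercase letters sort before lowercase ones
by_c <- data |>
  group_by(chr) |>
  summarise(avg = mean(x))
by_c$chr

# Re-order for human readers with an explicit locale (requires stringi)
if (requireNamespace("stringi", quietly = TRUE)) {
  by_en <- by_c |> arrange(chr, .locale = "en")
  print(by_en$chr)
}
```

In the C-locale result, "A" and "B" precede "a" and "b", whereas the "en" ordering keeps upper- and lower-case variants of the same letter adjacent.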
Prior to dplyr 1.1.0, character vector grouping columns were ordered in the
system locale. Setting the global option dplyr.legacy_locale to TRUE
retains this legacy behavior, but this has been deprecated. Update existing
code to explicitly call arrange(.locale = ) instead. Run
Sys.getlocale("LC_COLLATE") to determine your system locale, and compare
that against the list in stringi::stri_locale_list() to find an appropriate
value for .locale, e.g. for American English, "en_US".
Other grouping functions:
group_map(),
group_nest(),
group_split(),
group_trim()
# Attach tidyseurat so the Seurat method and the example data are available
library(tidyseurat)
data("pbmc_small")
pbmc_small |> group_by(groups)