Last chance! 50% off unlimited learning
Sale ends in
Most data operations are done on groups defined by variables.
group_by()
takes an existing tbl and converts it into a grouped tbl
where operations are performed "by group". ungroup()
removes grouping.
group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))ungroup(x, ...)
A grouped data frame with class grouped_df
,
unless the combination of ...
and add
yields a empty set of
grouping columns, in which case a tibble will be returned.
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
In group_by()
, variables or computations to group by.
Computations are always done on the ungrouped data frame.
To perform computations on the grouped data, you need to use
a separate mutate()
step before the group_by()
.
Computations are not allowed in nest_by()
.
In ungroup()
, variables to remove from the grouping.
When FALSE
, the default, group_by()
will
override existing groups. To add to the existing groups, use
.add = TRUE
.
This argument was previously called add
, but that prevented
creating a new grouping variable called add
, and conflicts with
our naming conventions.
Drop groups formed by factor levels that don't appear in the
data? The default is TRUE
except when .data
has been previously
grouped with .drop = FALSE
. See group_by_drop_default()
for details.
A tbl()
These function are generics, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
group_by()
: dplyr:::methods_rd("group_by").
ungroup()
: dplyr:::methods_rd("ungroup").
Other grouping functions:
group_map()
,
group_nest()
,
group_split()
,
group_trim()
by_cyl <- mtcars %>% group_by(cyl)
# grouping doesn't change how the data looks (apart from listing
# how it's grouped):
by_cyl
# It changes how it acts with the other dplyr verbs:
by_cyl %>% summarise(
disp = mean(disp),
hp = mean(hp)
)
by_cyl %>% filter(disp == max(disp))
# Each call to summarise() removes a layer of grouping
by_vs_am <- mtcars %>% group_by(vs, am)
by_vs <- by_vs_am %>% summarise(n = n())
by_vs
by_vs %>% summarise(n = sum(n))
# To removing grouping, use ungroup
by_vs %>%
ungroup() %>%
summarise(n = sum(n))
# By default, group_by() overrides existing grouping
by_cyl %>%
group_by(vs, am) %>%
group_vars()
# Use add = TRUE to instead append
by_cyl %>%
group_by(vs, am, .add = TRUE) %>%
group_vars()
# You can group by expressions: this is a short-hand
# for a mutate() followed by a group_by()
mtcars %>%
group_by(vsam = vs + am)
# The implicit mutate() step is always performed on the
# ungrouped data. Here we get 3 groups:
mtcars %>%
group_by(vs) %>%
group_by(hp_cut = cut(hp, 3))
# If you want it to be performed by groups,
# you have to use an explicit mutate() call.
# Here we get 3 groups per value of vs
mtcars %>%
group_by(vs) %>%
mutate(hp_cut = cut(hp, 3)) %>%
group_by(hp_cut)
# when factors are involved and .drop = FALSE, groups can be empty
tbl <- tibble(
x = 1:10,
y = factor(rep(c("a", "c"), each = 5), levels = c("a", "b", "c"))
)
tbl %>%
group_by(y, .drop = FALSE) %>%
group_rows()
Run the code above in your browser using DataLab