Grouping a data set acts in the normal way. When tracking a dataframe, a
group_by() operation will sometimes create a large number of groups. This
happens, for example, if a group_by(), summarise() step aggregates data on a
fine scale, e.g. by day in a time series. This is generally a bad idea when
tracking a dataframe, as the resulting flowchart will have very many branches
and be illegible. dtrackr will detect this and pause tracking of the dataframe
with a warning. It is up to the user to resume() tracking once the large
number of groups has been resolved, e.g. using dplyr::ungroup(). The limit is
configurable with options("dtrackr.max_supported_groupings"=XX); the default
is 16. See dplyr::group_by().
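The pause-and-resume behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than a definitive recipe: the limit of 2 is an arbitrary value chosen so that the three iris species exceed it, and it is passed through the .maxgroups argument shown in the usage below instead of the global option.

```r
library(dplyr)
library(dtrackr)

# Assumption: an artificially low limit of 2 groups, so that grouping
# iris by Species (3 groups) exceeds it.
paused <- iris %>%
  track() %>%
  group_by(Species, .maxgroups = 2)  # expected to warn and pause tracking

# Resolve the large grouping first, then resume tracking explicitly.
resumed <- paused %>%
  ungroup() %>%
  resume()
```

Setting options("dtrackr.max_supported_groupings"=XX) should have a similar effect for every subsequent group_by() call, whereas .maxgroups applies only to the single step it is passed to.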
p_group_by(
.data,
...,
.messages = "stratify by {.cols}",
.headline = NULL,
.tag = NULL,
.maxgroups = .defaultMaxSupportedGroupings()
)
Returns the .data, but grouped.
.data
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
...
In group_by(), variables or computations to group by.
Computations are always done on the ungrouped data frame.
To perform computations on the grouped data, you need to use
a separate mutate() step before the group_by().
Computations are not allowed in nest_by().
In ungroup(), variables to remove from the grouping.
Named arguments passed on to dplyr::group_by:
.add
When FALSE, the default, group_by() will override existing groups.
To add to the existing groups, use .add = TRUE.
This argument was previously called add, but that prevented
creating a new grouping variable called add, and conflicts with
our naming conventions.
.drop
Drop groups formed by factor levels that don't appear in the
data? The default is TRUE except when .data has been previously
grouped with .drop = FALSE. See group_by_drop_default() for details.
x
A tbl()
.messages
a set of glue specs. The glue code can use any global variable, or {.cols}, which is the columns that are being grouped by.
.headline
a headline glue spec. The glue code can use any global variable, or {.cols}.
.tag
if you want the summary data from this step in the future then give it a name with .tag.
.maxgroups
the maximum number of subgroups allowed before the tracking is paused.
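The difference between overriding and adding to existing groups with .add can be illustrated with a small sketch. It uses plain dplyr on the built-in mtcars data set (an assumption made for illustration, not an example from this page) and no tracking, so it only shows the grouping behaviour that is passed on to dplyr::group_by.

```r
library(dplyr)

# Start from a data frame that is already grouped by cyl.
by_cyl <- mtcars %>% group_by(cyl)

# Default (.add = FALSE): the new grouping replaces the existing one.
replaced <- by_cyl %>% group_by(gear) %>% group_vars()
print(replaced)  # "gear"

# .add = TRUE: the new grouping variable is appended to the existing ones.
added <- by_cyl %>% group_by(gear, .add = TRUE) %>% group_vars()
print(added)     # "cyl" "gear"
```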
See also dplyr::group_by().
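library(dplyr)
library(dtrackr)
# track iris and stratify by Species; {.cols} expands to the grouping columns
tmp = iris %>% track() %>% group_by(Species, .messages="stratify by {.cols}")
# add a comment for each stratum, then print the tracked history
tmp %>% comment("{.strata}") %>% history()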