
expand() generates all combinations of variables found in a dataset. It is paired with the nesting() and crossing() helpers. crossing() is a wrapper around expand_grid() that de-duplicates and sorts its inputs; nesting() is a helper that only finds combinations already present in the data.
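As a minimal sketch of how these helpers differ, the snippet below compares expand_grid(), crossing() and nesting() on small made-up inputs. The vectors x and y and the tibble df are hypothetical and exist only for illustration.

```r
library(tidyr)

# Hypothetical inputs: x contains a duplicate and neither vector is sorted
x <- c(2, 1, 2)
y <- c("b", "a")

# expand_grid() uses the inputs exactly as given: 3 * 2 = 6 rows
grid <- expand_grid(x = x, y = y)

# crossing() de-duplicates and sorts each input first: 2 * 2 = 4 rows
crossed <- crossing(x = x, y = y)

# Hypothetical data frame in which the pair (2, "a") never occurs
df <- tibble(x = c(1, 1, 2), y = c("a", "b", "b"))

# nesting() keeps only the combinations present in the data: 3 rows
nested <- expand(df, nesting(x, y))

# Separate arguments give every combination of the unique values: 4 rows
all_combos <- expand(df, x, y)
```

Comparing nrow(nested) with nrow(all_combos) is a quick way to see how many combinations are absent from the data.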
expand() is often useful in conjunction with joins:
Use it with right_join() to convert implicit missing values to explicit missing values (e.g., fill in gaps in your data frame).
Use it with anti_join() to figure out which combinations are missing (e.g., identify gaps in your data frame).
expand(data, ..., .name_repair = "check_unique")
crossing(..., .name_repair = "check_unique")
nesting(..., .name_repair = "check_unique")
data: A data frame.
...: Specification of columns to expand. Columns can be atomic vectors or lists.
To find all unique combinations of x, y and z, including those not present in the data, supply each variable as a separate argument: expand(df, x, y, z).
To find only the combinations that occur in the data, use nesting: expand(df, nesting(x, y, z)).
You can combine the two forms. For example, expand(df, nesting(school_id, student_id), date) would produce a row for each present school-student combination for all possible dates.
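The combined form can be sketched as follows. The attendance tibble and its values are hypothetical; only the column names school_id, student_id and date come from the text above.

```r
library(tidyr)

# Hypothetical attendance records, for illustration only
attendance <- tibble(
  school_id  = c(1, 1, 2),
  student_id = c(10, 11, 10),
  date       = as.Date(c("2024-01-08", "2024-01-09", "2024-01-08"))
)

# Each observed (school_id, student_id) pair is combined with every
# distinct date in the data: 3 pairs * 2 dates = 6 rows
roster <- expand(attendance, nesting(school_id, student_id), date)
```

The pair (2, 11) never appears in roster because nesting() restricts the school-student combinations to those already observed.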
When used with factors, expand() uses the full set of levels, not just those that appear in the data. If you want to use only the values seen in the data, use forcats::fct_drop().
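A small sketch of the factor behaviour, assuming the forcats package is installed. The tibble df is hypothetical.

```r
library(tidyr)

# Hypothetical factor with four defined levels, two of which are unused
df <- tibble(
  size = factor(c("S", "M", "S"), levels = c("XS", "S", "M", "L"))
)

# All defined levels are used, including the unused "XS" and "L": 4 rows
all_levels <- expand(df, size)

# Dropping unused levels first keeps only the values seen in the data: 2 rows
seen_only <- expand(df, size = forcats::fct_drop(size))
```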
When used with continuous variables, you may need to fill in values that do not appear in the data: to do so use expressions like year = 2010:2020 or year = full_seq(year, 1).
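The two ways of filling in a continuous variable might look like this in practice. The tibble df and its years are made up for illustration.

```r
library(tidyr)

# Hypothetical data with gaps in the year sequence
df <- tibble(year = c(2010, 2012, 2015))

# Only the years that occur in the data: 3 rows
observed <- expand(df, year)

# full_seq() fills the gaps between the smallest and largest year: 6 rows
filled <- expand(df, year = full_seq(year, 1))

# An explicit range can extend beyond the observed values: 11 rows
fixed <- expand(df, year = 2010:2020)
```

full_seq() stays within the observed range, so an explicit sequence is the option to reach for when the range itself needs to be wider than the data.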
.name_repair: Treatment of problematic column names:
"minimal": No name repair or checks, beyond basic existence.
"unique": Make sure names are unique and not empty.
"check_unique": (default value) No name repair, but check that they are unique.
"universal": Make the names unique and syntactic.
A function: Apply custom name repair (e.g., .name_repair = make.names for names in the style of base R).
A purrr-style anonymous function, see rlang::as_function().
This argument is passed on as repair to vctrs::vec_as_names(). See there for more details on these terms and the strategies used to enforce them.
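Because the argument is passed on to vctrs::vec_as_names(), the repair strategies can be tried on plain character vectors. This is a minimal sketch that calls vec_as_names() directly rather than going through expand(); the input names are made up, and quiet = TRUE only suppresses the renaming messages.

```r
library(vctrs)

# Hypothetical duplicated names
dup <- c("x", "x")

# "minimal" leaves the names as they are
minimal_names <- vec_as_names(dup, repair = "minimal")

# "unique" adds positional suffixes to duplicates
unique_names <- vec_as_names(dup, repair = "unique", quiet = TRUE)

# "universal" also makes names syntactic
universal_names <- vec_as_names(c("a b", "x"), repair = "universal", quiet = TRUE)

# "check_unique" does not repair; duplicated names raise an error
check_failed <- inherits(
  try(vec_as_names(dup, repair = "check_unique"), silent = TRUE),
  "try-error"
)

# A function, or a purrr-style formula, applies custom repair
base_names  <- vec_as_names(c("a b", "x"), repair = make.names)
upper_names <- vec_as_names(c("a", "b"), repair = ~ toupper(.x))
```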
With grouped data frames, expand() operates within each group. Because of this, you cannot expand on a grouping column.
See also complete() to expand list objects, and expand_grid() to expand input vectors rather than a data frame.
library(tidyr)

fruits <- tibble(
  type = c("apple", "orange", "apple", "orange", "orange", "orange"),
  year = c(2010, 2010, 2012, 2010, 2011, 2012),
  size = factor(
    c("XS", "S", "M", "S", "S", "M"),
    levels = c("XS", "S", "M", "L")
  ),
  weights = rnorm(6, as.numeric(size) + 2)
)
# All possible combinations ---------------------------------------
# Note that all defined, but not necessarily present, levels of the
# factor variable `size` are retained.
fruits %>% expand(type)
fruits %>% expand(type, size)
fruits %>% expand(type, size, year)
# Only combinations that already appear in the data ---------------
fruits %>% expand(nesting(type))
fruits %>% expand(nesting(type, size))
fruits %>% expand(nesting(type, size, year))
# Other uses -------------------------------------------------------
# Use with `full_seq()` to fill in values of continuous variables
fruits %>% expand(type, size, full_seq(year, 1))
fruits %>% expand(type, size, 2010:2013)
# Use `anti_join()` to determine which observations are missing
all <- fruits %>% expand(type, size, year)
all
all %>% dplyr::anti_join(fruits)
# Use with `right_join()` to fill in missing rows
fruits %>% dplyr::right_join(all)
# Use with `group_by()` to expand within each group
fruits %>% dplyr::group_by(type) %>% expand(year, size)