pivot_wider() "widens" data, increasing the number of columns and
decreasing the number of rows. The inverse transformation is
pivot_longer().
Learn more in vignette("pivot", "tidyr").
Note that pivot_wider() is not and cannot be lazy because we need to look
at the data to figure out what the new column names will be.
# S3 method for tbl_lazy
pivot_wider(
data,
...,
id_cols = NULL,
id_expand = FALSE,
names_from = name,
names_prefix = "",
names_sep = "_",
names_glue = NULL,
names_sort = FALSE,
names_vary = "fastest",
names_expand = FALSE,
names_repair = "check_unique",
values_from = value,
values_fill = NULL,
values_fn = ~max(.x, na.rm = TRUE),
unused_fn = NULL
)A lazy data frame backed by a database query.
Unused; included for compatibility with generic.
A set of columns that uniquely identifies each observation.
Unused; included for compatibility with the generic.
A pair of
arguments describing which column (or columns) to get the name of the
output column (names_from), and which column (or columns) to get the
cell values from (values_from).
If values_from contains multiple values, the value will be added to the
front of the output column.
String added to the start of every variable name.
If names_from or values_from contains multiple
variables, this will be used to join their values together into a single
string to use as a column name.
Instead of names_sep and names_prefix, you can supply
a glue specification that uses the names_from columns (and special
.value) to create custom column names.
Should the column names be sorted? If FALSE, the default,
column names are ordered by first appearance.
When names_from identifies a column (or columns) with
multiple unique values, and multiple values_from columns are provided,
in what order should the resulting column names be combined?
"fastest" varies names_from values fastest, resulting in a column
naming scheme of the form: value1_name1, value1_name2, value2_name1, value2_name2. This is the default.
"slowest" varies names_from values slowest, resulting in a column
naming scheme of the form: value1_name1, value2_name1, value1_name2, value2_name2.
Should the values in the names_from columns be expanded
by expand() before pivoting? This results in more columns, the output
will contain column names corresponding to a complete expansion of all
possible values in names_from. Additionally, the column names will be
sorted, identical to what names_sort would produce.
What happens if the output has invalid column names?
Optionally, a (scalar) value that specifies what each
value should be filled in with when missing.
A function, the default is max(), applied to the value
in each cell in the output. In contrast to local data frames it must not be
NULL.
Optionally, a function applied to summarize the values from
the unused columns (i.e. columns not identified by id_cols,
names_from, or values_from).
The default drops all unused columns from the result.
This can be a named list if you want to apply different aggregations to different unused columns.
id_cols must be supplied for unused_fn to be useful, since otherwise
all unspecified columns will be considered id_cols.
This is similar to grouping by the id_cols then summarizing the
unused columns using unused_fn.
The big difference to pivot_wider() for local data frames is that
values_fn must not be NULL. By default it is max() which yields
the same results as for local data frames if the combination of id_cols
and value column uniquely identify an observation.
Mind that you also do not get a warning if an observation is not uniquely
identified.
The translation to SQL code basically works as follows:
Get unique keys in names_from column.
For each key value generate an expression of the form:
value_fn(
CASE WHEN (`names from column` == `key value`)
THEN (`value column`)
END
) AS `output column`
Group data by id columns.
Summarise the grouped data with the expressions from step 2.
memdb_frame(
id = 1,
key = c("x", "y"),
value = 1:2
) %>%
tidyr::pivot_wider(
id_cols = id,
names_from = key,
values_from = value
)
Run the code above in your browser using DataLab