pivot_wider_spec: Pivot data from long to wide using a spec

Description

This is a low-level interface to pivoting, inspired by the cdata package, that allows you to describe pivoting with a data frame.

Usage

pivot_wider_spec(
  data,
  spec,
  names_repair = "check_unique",
  id_cols = NULL,
  values_fill = NULL,
  values_fn = NULL
)

build_wider_spec(
  data,
  names_from = name,
  values_from = value,
  names_prefix = "",
  names_sep = "_",
  names_glue = NULL,
  names_sort = FALSE
)

Arguments

data

A data frame to pivot.

spec

A specification data frame. This is useful for more complex pivots because it gives you greater control over how metadata stored in the columns becomes column names in the result.

Must be a data frame containing character .name and .value columns. Additional columns in spec should be named to match columns in the long format of the dataset and contain values corresponding to columns pivoted from the wide format. The special .seq variable is used to disambiguate rows internally; it is automatically removed after pivoting.
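To make the shape of a spec concrete, here is a minimal sketch of a hand-written one. The data frame `long` and its columns `id`, `key`, and `val` are made up for illustration; only `.name` and `.value` are names required by the description above.

```r
library(tidyr)

# Hypothetical long-format data: one row per (id, key) pair.
long <- tibble(
  id  = c(1, 1, 2, 2),
  key = c("a", "b", "a", "b"),
  val = c(10, 20, 30, 40)
)

# Each row of the spec describes one column of the wide output.
spec <- tibble(
  .name  = c("val_a", "val_b"),  # name of the output column
  .value = c("val", "val"),      # column of `long` that supplies the cell values
  key    = c("a", "b")           # value of `key` that selects rows for this column
)

wide <- pivot_wider_spec(long, spec)
wide
```

Here `key` plays the role of an "additional column": it is named after a column of the long data and lists the values that map to each output column.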

names_repair

What happens if the output has invalid column names? The default, "check_unique", is to error if the columns are duplicated. Use "minimal" to allow duplicates in the output, or "unique" to de-duplicate by adding numeric suffixes. See vctrs::vec_as_names() for more options.

id_cols

<tidy-select> A set of columns that uniquely identifies each observation. Defaults to all columns in data except for the columns specified in names_from and values_from. Typically used when you have redundant variables, i.e. variables whose values are perfectly correlated with existing variables.

values_fill

Optionally, a (scalar) value that specifies what each value should be filled in with when missing.

This can be a named list if you want to apply different fill values to different value columns.

values_fn

Optionally, a function applied to the value in each cell in the output. You will typically use this when the combination of id_cols and value column does not uniquely identify an observation.

This can be a named list if you want to apply different aggregations to different value columns.
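As a rough illustration of values_fn and values_fill used together, the sketch below uses the named-list form described above. The `sales` data frame and its columns are invented for the example.

```r
library(tidyr)

# Hypothetical data: store "A" has two rows for item "x",
# and store "B" has no row for item "y".
sales <- tibble(
  store = c("A", "A", "A", "B"),
  item  = c("x", "x", "y", "x"),
  qty   = c(1, 2, 5, 7)
)

spec <- build_wider_spec(sales, names_from = item, values_from = qty)

wide <- pivot_wider_spec(
  sales,
  spec,
  values_fn   = list(qty = sum),  # collapse duplicate cells by summing
  values_fill = list(qty = 0)     # fill cells with no matching rows
)
wide
```

Without values_fn, the duplicated (store, item) combination would not map to a single cell value; without values_fill, the missing combination would be left as NA.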

names_from, values_from

<tidy-select> A pair of arguments describing which column (or columns) to get the name of the output column from (names_from), and which column (or columns) to get the cell values from (values_from).

If values_from contains multiple columns, the name of each value column will be added to the front of the output column name.

names_prefix

String added to the start of every variable name. This is particularly useful if names_from is a numeric vector and you want to create syntactic variable names.
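A small sketch of the numeric case mentioned above. The `counts` data frame, its columns, and the prefix "y_" are assumptions chosen for illustration.

```r
library(tidyr)

# Hypothetical data where the names come from a numeric column.
counts <- tibble(
  site = c("p", "p", "q", "q"),
  year = c(2019, 2020, 2019, 2020),
  n    = c(4, 6, 3, 8)
)

# Without a prefix the output columns would be named "2019" and "2020",
# which are not syntactic R names.
spec <- build_wider_spec(
  counts,
  names_from   = year,
  values_from  = n,
  names_prefix = "y_"
)
spec$.name

wide <- pivot_wider_spec(counts, spec)
```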

names_sep

If names_from or values_from contains multiple variables, this will be used to join their values together into a single string to use as a column name.

names_glue

Instead of names_sep and names_prefix, you can supply a glue specification that uses the names_from columns (and special .value) to create custom column names.
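For instance, a glue specification can put the variable first and the value column last. This is only a sketch: it uses the us_rent_income dataset from the Examples section, and the template string is one possible choice.

```r
library(tidyr)

# Build names of the form <variable>_<value column> instead of the default
# <value column>_<variable>.
spec <- build_wider_spec(
  us_rent_income,
  names_from  = variable,
  values_from = c(estimate, moe),
  names_glue  = "{variable}_{.value}"
)
spec

wide <- pivot_wider_spec(us_rent_income, spec)
names(wide)
```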

names_sort

Should the column names be sorted? If FALSE, the default, column names are ordered by first appearance.
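The difference between the two orderings can be seen on a tiny example. The data frame `long` is hypothetical; it uses the default column names `name` and `value` so that names_from and values_from can be left at their defaults.

```r
library(tidyr)

# Hypothetical data in which "b" appears before "a".
long <- tibble(
  id    = c(1, 1),
  name  = c("b", "a"),
  value = c(1, 2)
)

# Default: columns follow the order of first appearance.
unsorted <- build_wider_spec(long)$.name

# names_sort = TRUE: columns are sorted.
sorted <- build_wider_spec(long, names_sort = TRUE)$.name

unsorted
sorted
```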

Examples

# NOT RUN {
# See vignette("pivot") for examples and explanation
library(tidyr)

us_rent_income
spec1 <- us_rent_income %>%
  build_wider_spec(names_from = variable, values_from = c(estimate, moe))
spec1

us_rent_income %>%
  pivot_wider_spec(spec1)

# Is equivalent to
us_rent_income %>%
  pivot_wider(names_from = variable, values_from = c(estimate, moe))

# `pivot_wider_spec()` provides more control over column names and output format.
# Instead of creating columns with estimate_ and moe_ prefixes,
# keep the original variable name for estimates and attach _moe as a suffix.
spec2 <- tibble(
  .name = c("income", "rent", "income_moe", "rent_moe"),
  .value = c("estimate", "estimate", "moe", "moe"),
  variable = c("income", "rent", "income", "rent")
)

us_rent_income %>%
  pivot_wider_spec(spec2)
# }