
spicy (version 0.11.0)

varlist: Generate a comprehensive summary of the variables

Description

varlist() lists the variables of a data frame and extracts essential metadata, including variable names, labels, summary values, classes, number of distinct values, number of valid (non-missing) observations, and number of missing values.

vl() is a convenient shorthand for varlist() that offers identical functionality with a shorter name.

Usage

varlist(
  x,
  ...,
  values = FALSE,
  tbl = FALSE,
  include_na = FALSE,
  factor_levels = c("observed", "all")
)

vl(
  x,
  ...,
  values = FALSE,
  tbl = FALSE,
  include_na = FALSE,
  factor_levels = c("observed", "all")
)

Value

A tibble with one row per selected variable, containing the following columns:

  • Variable: variable names

  • Label: variable labels (if available via the label attribute)

  • Values: a summary of the variable's values, depending on the values and include_na arguments. If values = FALSE, a compact summary is shown: all unique non-missing values when there are at most four, otherwise the first three values, an ellipsis (...), and the last value. If values = TRUE, all unique non-missing values are displayed. For labelled variables, prefixed labels are displayed using labelled::to_factor(levels = "prefixed"). For factors, levels are displayed according to factor_levels. Matrix and array columns are summarized by their dimensions. Missing value markers (<NA>, <NaN>) are optionally appended at the end (controlled via include_na). Literal strings "NA", "NaN", and "" are quoted to distinguish them from missing markers.

  • Class: the class of each variable (possibly multiple, e.g. "labelled", "numeric")

  • N_distinct: number of distinct non-missing values

  • N_valid: number of non-missing observations

  • NAs: number of missing observations
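The counting columns above can be approximated in base R. The following is a rough sketch under stated assumptions, not the package's implementation: the helper summarise_column() and the toy data frame are hypothetical, and only the Label, Class, N_distinct, N_valid, and NAs columns are imitated.

```r
# Hypothetical helper: approximates a few of the columns described above
# using base R only. This is not how spicy computes them internally.
summarise_column <- function(col) {
  label <- attr(col, "label", exact = TRUE)
  missing <- is.na(col)  # TRUE for both NA and NaN
  data.frame(
    Label      = if (is.null(label)) NA_character_ else as.character(label),
    Class      = paste(class(col), collapse = ", "),
    N_distinct = length(unique(col[!missing])),
    N_valid    = sum(!missing),
    NAs        = sum(missing)
  )
}

# Toy data (assumed for illustration only)
toy <- data.frame(
  age  = c(31, 45, NA, 45),
  city = c("Bern", NA, "Basel", "Bern")
)
attr(toy$age, "label") <- "Age in years"

# One summary row per column, with the column name in front
summary_tbl <- do.call(rbind, lapply(toy, summarise_column))
summary_tbl <- cbind(Variable = names(toy), summary_tbl)
print(summary_tbl)
```

Note that N_distinct ignores missing values, so a column can have N_distinct smaller than N_valid when values repeat.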

For matrix and array columns, observations are counted per row: a row is treated as missing if any of its cells is NA. N_valid / NAs therefore count complete vs. incomplete rows, not individual cells.
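The difference between row-level and cell-level counting can be seen with a small matrix column. This is a base R sketch of the counting rule described above, using made-up data; it does not call the package.

```r
# A data frame with a matrix column (toy data, assumed for illustration).
# The matrix is filled column-wise, so its rows are (1, 4), (NA, NA), (3, 6).
m <- matrix(c(1, NA, 3,
              4, NA, 6), ncol = 2)
df <- data.frame(id = 1:3)
df$scores <- m

# A row counts as missing if any of its cells is NA.
row_missing <- rowSums(is.na(df$scores)) > 0

n_valid   <- sum(!row_missing)      # complete rows
n_missing <- sum(row_missing)       # incomplete rows
n_na_cell <- sum(is.na(df$scores))  # individual NA cells, for comparison

c(N_valid = n_valid, NAs = n_missing, NA_cells = n_na_cell)
```

Here one row is incomplete even though two cells are NA, which is why the row-based NAs count can be smaller than the number of missing cells.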

If tbl = TRUE, the tibble is returned. If tbl = FALSE and the session is interactive, the summary is displayed in the Viewer pane and the function returns invisibly. In non-interactive sessions, a message is displayed and the function returns invisibly.

Arguments

x

A data frame, or a transformation of one.

...

Optional tidyselect-style column selectors (e.g. starts_with("var"), where(is.numeric), etc.). Columns can be selected or reordered, but renaming selections is not supported.

values

Logical. If FALSE (the default), displays a compact summary of the variable's values. For numeric, character, date/time, labelled, and factor variables, all unique non-missing values are shown when there are at most four; otherwise the first three values, an ellipsis (...), and the last value are shown. Values are sorted when appropriate (e.g., numeric, character, date). For factors, factor_levels controls whether observed or all declared levels are shown; level order is preserved. For labelled variables, prefixed labels are displayed via labelled::to_factor(levels = "prefixed"). If TRUE, all unique non-missing values are displayed.
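The compact rule for values = FALSE can be illustrated with a few lines of base R. This is a minimal sketch of the rule as described above, not the package's code: the function name compact_values() and the comma separator are assumptions, and factors and labelled variables are not handled.

```r
# Hypothetical helper: show all unique non-missing values when there are
# at most four, otherwise the first three, "...", and the last one.
compact_values <- function(x, max_shown = 4) {
  vals <- sort(unique(x[!is.na(x)]))
  vals <- as.character(vals)
  if (length(vals) <= max_shown) {
    paste(vals, collapse = ", ")
  } else {
    paste(c(vals[1:3], "...", vals[length(vals)]), collapse = ", ")
  }
}

compact_values(c(3, 1, 2, NA))            # "1, 2, 3"
compact_values(c(10, 2, 7, 5, 1, NA, 2))  # "1, 2, 5, ..., 10"
```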

tbl

Logical. If FALSE (the default), opens the summary in the Viewer if the session is interactive. If TRUE, returns a tibble.
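With tbl = TRUE the result is an ordinary tibble, so it can be filtered like any other data frame. The sketch below uses a made-up data frame and only the column names documented in the Value section; the call is skipped if the spicy package is not installed.

```r
# Toy data (assumed for illustration only)
toy <- data.frame(
  id    = 1:4,
  score = c(2.5, NA, 3.1, NA)
)

if (requireNamespace("spicy", quietly = TRUE)) {
  res <- spicy::varlist(toy, tbl = TRUE)
  # Keep only the variables that contain missing values.
  print(res[res$NAs > 0, c("Variable", "N_valid", "NAs")])
}
```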

include_na

Logical. If TRUE, unique missing value markers (<NA>, <NaN>) are explicitly appended at the end of the Values summary when present in the variable. This applies to all variable types. Literal strings "NA", "NaN", and "" are quoted to distinguish them from missing markers. If FALSE (the default), missing values are omitted from Values but still counted in the NAs column.
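The quoting rule matters when a character column contains the literal string "NA" alongside true missing values. The following base R sketch imitates the behaviour described above for character vectors only; format_values() is a hypothetical helper, and the separator and sort order are assumptions rather than the package's actual output format.

```r
# Hypothetical helper for character vectors: quote literal "NA", "NaN",
# and "" so they cannot be confused with the <NA> missing marker.
format_values <- function(x, include_na = FALSE) {
  # method = "radix" sorts in the C locale, so the order is reproducible
  vals <- sort(unique(x[!is.na(x)]), method = "radix")
  ambiguous <- vals %in% c("NA", "NaN", "")
  vals[ambiguous] <- paste0('"', vals[ambiguous], '"')
  if (include_na && anyNA(x)) vals <- c(vals, "<NA>")
  paste(vals, collapse = ", ")
}

x <- c("yes", "NA", NA, "no")
format_values(x)                     # "NA" is quoted, true NA is omitted
format_values(x, include_na = TRUE)  # <NA> is appended at the end
```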

factor_levels

Character. Controls how factor values are displayed in Values. "observed" (the default; code_book() uses "all") shows only levels present in the data, preserving factor level order. "all" shows all declared levels, including unused levels.

Details

The function can also apply tidyselect-style variable selectors to select or reorder columns dynamically.

If used interactively (e.g. in RStudio or Positron), the summary is displayed in the Viewer pane with a contextual title like vl: sochealth. If the data frame has been transformed or subsetted, the title will display an asterisk (*), e.g. vl: sochealth*. Anonymous or ambiguous calls use vl: <data>.

For factor variables, varlist() defaults to displaying only the levels observed in the data (factor_levels = "observed") — a reflection of what is actually present. By contrast, code_book() defaults to "all" to document the declared schema, including unused levels. Pass factor_levels explicitly to override either default.
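The distinction between observed and declared levels can be reproduced in base R. This sketch only shows which levels each setting refers to; it does not call the package.

```r
# A factor with one declared but unused level ("C") and one missing value
group <- factor(c("A", "B", NA), levels = c("A", "B", "C"))

declared <- levels(group)              # all declared levels, including unused
observed <- levels(droplevels(group))  # only levels present in the data

declared
observed
```

Level order is preserved in both cases, because droplevels() removes unused levels without reordering the remaining ones.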

See Also

Other variable inspection: code_book(), label_from_names()

Examples

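library(spicy)

# Return the summary as a tibble instead of opening the Viewer
varlist(sochealth, tbl = TRUE)
sochealth |> varlist(tbl = TRUE)

# Numeric columns only, listing every unique non-missing value
varlist(sochealth, where(is.numeric), values = TRUE, tbl = TRUE)

# Columns whose names start with "bmi", with missing markers appended
varlist(
  sochealth,
  starts_with("bmi"),
  values = TRUE,
  include_na = TRUE,
  tbl = TRUE
)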

df <- data.frame(
  group = factor(c("A", "B", NA), levels = c("A", "B", "C"))
)
varlist(
  df,
  values = TRUE,
  include_na = TRUE,
  factor_levels = "all",
  tbl = TRUE
)

vl(sochealth, tbl = TRUE)
sochealth |> vl(tbl = TRUE)
vl(sochealth, starts_with("bmi"), tbl = TRUE)
vl(sochealth, where(is.numeric), values = TRUE, tbl = TRUE)
