varlist() lists the variables of a data frame and extracts essential
metadata, including variable names, labels, summary values, classes, number
of distinct values, number of valid (non-missing) observations, and number
of missing values.
vl() is a shorthand alias for varlist(): it takes the same arguments and
behaves identically.
varlist(
  x,
  ...,
  values = FALSE,
  tbl = FALSE,
  include_na = FALSE,
  factor_levels = c("observed", "all")
)

vl(
  x,
  ...,
  values = FALSE,
  tbl = FALSE,
  include_na = FALSE,
  factor_levels = c("observed", "all")
)
A tibble with one row per selected variable, containing the following columns:
Variable: variable names
Label: variable labels (if available via the label attribute)
Values: a summary of the variable's values, depending on the values
and include_na arguments. If values = FALSE, a compact summary is
shown: all unique values when there are at most four, otherwise the
first three values, an ellipsis (...), and the last value. If
values = TRUE, all unique non-missing values are displayed.
For labelled variables, prefixed labels are displayed using
labelled::to_factor(levels = "prefixed").
For factors, levels are displayed according to factor_levels.
Matrix and array columns are summarized by their dimensions.
Missing value markers (<NA>, <NaN>) are optionally appended at the
end (controlled via include_na). Literal strings "NA", "NaN", and
"" are quoted to distinguish them from missing markers.
Class: the class of each variable (possibly multiple, e.g.
"labelled", "numeric")
N_distinct: number of distinct non-missing values
N_valid: number of non-missing observations
NAs: number of missing observations
For matrix and array columns, observations are counted per row:
a row is treated as missing if any of its cells is NA. N_valid
/ NAs therefore count complete vs. incomplete rows, not
individual cells.
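The per-row counting rule for matrix columns can be illustrated in base R. This is a minimal sketch of the rule as described above, not the package's internal code; the data frame df and all variable names are made up for illustration.

```r
# A 4 x 2 matrix with one fully missing row and one partially missing row.
m <- matrix(
  c(1,  2,
    NA, NA,
    5,  NA,
    7,  8),
  ncol = 2, byrow = TRUE
)

df <- data.frame(id = 1:4)
df$m <- m  # matrix column: a single data frame column holding the matrix

# A row is treated as missing if any of its cells is NA.
row_missing <- rowSums(is.na(df$m)) > 0

n_valid    <- sum(!row_missing)   # complete rows: 2
n_na       <- sum(row_missing)    # incomplete rows: 2
n_cells_na <- sum(is.na(df$m))    # individual NA cells: 3 (not what is reported)
```

Under this rule the two counts always add up to the number of rows, which would not hold if individual cells were counted.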
If tbl = TRUE, the tibble is returned. If tbl = FALSE and the session is
interactive, the summary is displayed in the Viewer pane and the function
returns invisibly. In non-interactive sessions, a message is displayed and
the function returns invisibly.
x: A data frame, or a transformation of one.
...: Optional tidyselect-style column selectors (e.g.
starts_with("var"), where(is.numeric), etc.). Columns can be selected
or reordered, but renaming selections is not supported.
values: Logical. If FALSE (the default), displays a compact summary
of the variable's values. For numeric, character, date/time, labelled, and
factor variables, all unique non-missing values are shown when there are
at most four; otherwise the first three values, an ellipsis (...), and
the last value are shown. Values are sorted when appropriate (e.g.,
numeric, character, date).
For factors, factor_levels controls whether observed or all declared
levels are shown; level order is preserved.
For labelled variables, prefixed labels are displayed via
labelled::to_factor(levels = "prefixed").
If TRUE, all unique non-missing values are displayed.
tbl: Logical. If FALSE (the default), opens the summary in the Viewer
if the session is interactive. If TRUE, returns a tibble.
include_na: Logical. If TRUE, unique missing value markers
(<NA>, <NaN>) are explicitly appended at the end of the Values
summary when present in the variable. This applies to all variable types.
Literal strings "NA", "NaN", and "" are quoted to distinguish them
from missing markers. If FALSE (the default), missing values are omitted
from Values but still counted in the NAs column.
factor_levels: Character. Controls how factor values are displayed
in Values. "observed" (the default; code_book() uses "all")
shows only levels present in the data, preserving factor level order.
"all" shows all declared levels, including unused levels.
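The interaction between values and include_na can be sketched in base R for plain atomic vectors. The helper compact_values() below is hypothetical and is not part of the package; it only mimics the documented rules (at most four values shown in full, otherwise first three, an ellipsis, and the last value; ambiguous literal strings quoted; missing markers appended). The order in which <NA> and <NaN> are appended is an assumption, and factors, labelled variables, and matrix columns are not handled.

```r
# Hypothetical helper illustrating the documented display rules.
compact_values <- function(x, values = FALSE, include_na = FALSE) {
  # Sorted unique non-missing values, as text.
  out <- as.character(sort(unique(x[!is.na(x)])))

  # Quote literal strings that could be mistaken for missing markers.
  if (is.character(x)) {
    ambiguous <- out %in% c("NA", "NaN", "")
    out[ambiguous] <- paste0('"', out[ambiguous], '"')
  }

  # Compact form: first three values, an ellipsis, and the last value.
  if (!values && length(out) > 4) {
    out <- c(out[1:3], "...", out[length(out)])
  }

  # Optionally append missing markers that are present in x.
  if (include_na) {
    nan <- if (is.numeric(x)) is.nan(x) else rep(FALSE, length(x))
    if (any(is.na(x) & !nan)) out <- c(out, "<NA>")
    if (any(nan)) out <- c(out, "<NaN>")
  }

  paste(out, collapse = ", ")
}

compact_values(c(3, 1, 2, NA))                        # "1, 2, 3"
compact_values(c(10, 2, 7, 5, 1, 9))                  # "1, 2, 5, ..., 10"
compact_values(c(10, 2, 7, 5, 1, 9), values = TRUE)   # "1, 2, 5, 7, 9, 10"
compact_values(c(1, NaN, NA), include_na = TRUE)      # "1, <NA>, <NaN>"
compact_values(c("NA", "NA", NA), include_na = TRUE)  # "\"NA\", <NA>"
```

The last call shows why quoting matters: the literal string "NA" and a true missing value stay distinguishable in the same summary.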
The function can also apply tidyselect-style variable selectors to select or reorder columns dynamically.
If used interactively (e.g. in RStudio or Positron), the summary is
displayed in the Viewer pane with a contextual title like vl: sochealth.
If the data frame has been transformed or subsetted, the title will display
an asterisk (*), e.g. vl: sochealth*. Anonymous or ambiguous calls use
vl: <data>.
For factor variables, varlist() defaults to displaying only the levels
observed in the data (factor_levels = "observed"), reflecting what is
actually present. By contrast, code_book() defaults to "all" to
document the declared schema, including unused levels. Pass factor_levels
explicitly to override either default.
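The difference between observed and declared levels can be seen in base R, independently of the package. A minimal sketch, using a made-up factor f whose levels are deliberately not in alphabetical order:

```r
# Declared levels are B, A, C; level "C" never occurs in the data.
f <- factor(c("A", "B", NA), levels = c("B", "A", "C"))

all_levels      <- levels(f)              # declared: "B" "A" "C"
observed_levels <- levels(droplevels(f))  # observed: "B" "A"
```

droplevels() removes the unused level while keeping the declared order, which matches the order-preserving behavior described for factor_levels = "observed".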
Other variable inspection:
code_book(),
label_from_names()
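# Return the summary as a tibble instead of opening the Viewer
varlist(sochealth, tbl = TRUE)
sochealth |> varlist(tbl = TRUE)

# Numeric columns only, listing all unique non-missing values
varlist(sochealth, where(is.numeric), values = TRUE, tbl = TRUE)

# Columns whose names start with "bmi", appending missing markers when present
varlist(
  sochealth,
  starts_with("bmi"),
  values = TRUE,
  include_na = TRUE,
  tbl = TRUE
)

# Factor with an unused level ("C") and a missing value
df <- data.frame(
  group = factor(c("A", "B", NA), levels = c("A", "B", "C"))
)
varlist(
  df,
  values = TRUE,
  include_na = TRUE,
  factor_levels = "all",
  tbl = TRUE
)

# vl() accepts the same arguments as varlist()
vl(sochealth, tbl = TRUE)
sochealth |> vl(tbl = TRUE)
vl(sochealth, starts_with("bmi"), tbl = TRUE)
vl(sochealth, where(is.numeric), values = TRUE, tbl = TRUE)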