Compute weighted proportions (percentages) for one or more categorical variables in a survey design, with optional grouping, uncertainty quantification, and metadata-driven labelling.
get_freqs(
  design,
  x,
  ...,
  group = NULL,
  names_to = "name",
  values_to = "value",
  variance = NULL,
  conf_level = 0.95,
  n_weighted = FALSE,
  decimals = NULL,
  min_cell_n = 30L,
  na.rm = TRUE,
  label_values = TRUE,
  label_vars = TRUE,
  name_style = "surveycore",
  .id = NULL,
  .if_missing_var = NULL
)
A survey_freqs tibble (also inheriting survey_result). Columns:
[group_cols...] — group variable columns (when active), first.
[variable_name] in single-variable mode, or [names_to] + [values_to] in multi-variable mode.
pct — weighted proportion (0–1).
Variance columns (se, var, cv, ci_low, ci_high, moe,
deff) — only those requested via variance.
n — unweighted cell count (sample basis of each estimate).
n_weighted — estimated population count (only when requested).
Use meta(result) to access design type, variable labels, value labels,
and other metadata.
design
A survey design object: survey_taylor, survey_replicate,
survey_twophase, or survey_nonprob.
x
<tidy-select> One or more categorical
variables. Bare names or tidy-select helpers (e.g., c(q1, q2, q3)).
When two or more variables are selected, multi-variable stacking mode
is activated (see Details).
...
Additional arguments passed to tidy-select (future-proof; currently unused).
group
<tidy-select> Optional grouping
variable(s). Combined with any grouping set by group_by(). Default
NULL.
names_to
Character(1). Column name for the variable identifier in
multi-variable mode. Default "name".
values_to
Character(1). Column name for the response value in
multi-variable mode. Default "value".
variance
NULL or a character vector of one or more of "se",
"ci", "var", "cv", "moe", "deff". Controls which uncertainty
columns appear in the output. Default NULL (no uncertainty columns).
conf_level
Numeric scalar in (0, 1). Confidence level for intervals.
Default 0.95.
n_weighted
Logical. If TRUE, add an n_weighted column with the
sum of weights (estimated population count) per cell. Default FALSE.
decimals
Integer or NULL. If an integer, rounds all numeric output
columns (e.g., pct, se, ci_low, ci_high) to this many decimal
places. Default NULL (no rounding).
min_cell_n
Integer. Minimum unweighted cell count before
surveycore_warning_small_cell fires. Default 30L (AAPOR guidance).
na.rm
Logical. If TRUE (default), NA values are excluded from
analysis: observations where the focal variable is NA are dropped from
frequency counts, and observations where any group variable is NA are
excluded from the output. If FALSE, NA values in the focal variable
appear as a dedicated frequency row in the output (not merely counted),
and observations where a group variable is NA are collected into their
own group row (appearing after all non-NA group rows).
label_values
Logical. If TRUE (default), convert raw variable
values to labels using metadata or haven attributes. Falls back to
raw values when no labels exist.
label_vars
Logical. If TRUE (default), use variable labels from
metadata in the names_to column (multi-variable mode only). Falls back
to the raw variable name when no label is set.
name_style
"surveycore" (default) or "broom". When "broom",
renames pct → estimate, se → std.error, etc.
.id
Character(1) or NULL. Column name used to identify each
survey when design is a survey_collection. For collection inputs,
NULL (the default) resolves to the collection's stored @id property.
Pass a non-NULL value to override. Ignored when design is a single
survey.
.if_missing_var
"error", "skip", or NULL. How to handle
surveys in a collection that lack one of the requested NSE variables.
For collection inputs, NULL (the default) resolves to the collection's
stored @if_missing_var property. Pass a non-NULL value to override.
Ignored when design is a single survey.
Single-variable mode (when x resolves to exactly one variable):
The focal variable name becomes the first column. Rows follow the factor
level order (if the variable is a factor) or ascending sort order otherwise.
Multi-variable mode (when x resolves to two or more variables):
Results are stacked in long format. The names_to column contains the
variable label (when label_vars = TRUE) or the raw variable name as
fallback. The values_to column contains the response values.
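The long layout described above can be pictured with a minimal base-R sketch. This is not how get_freqs() is implemented: the toy data frame, the weight column w, and the helper weighted_props() are hypothetical names introduced only for illustration, and the sketch ignores the survey design entirely (no variance estimation, no labels).

```r
# Toy data: two categorical items and a weight column (all hypothetical).
df <- data.frame(
  q1 = c("yes", "no", "yes", "yes"),
  q2 = c("agree", "agree", "disagree", "agree"),
  w  = c(1, 2, 1, 2)
)

# Weighted proportion of each response value for a single variable.
# The "name" and "value" columns mirror the names_to / values_to defaults.
weighted_props <- function(data, var, weight) {
  totals <- tapply(data[[weight]], data[[var]], sum)
  data.frame(
    name  = var,
    value = names(totals),
    pct   = as.numeric(totals) / sum(totals)
  )
}

# Stack the per-variable results into one long table.
stacked <- do.call(
  rbind,
  lapply(c("q1", "q2"), weighted_props, data = df, weight = "w")
)
stacked
```

Each variable contributes one block of rows, and pct sums to 1 within each block rather than across the whole table.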
Domain estimation: Proportions use the ratio linearization approach,
equivalent to survey::svymean() on a binary indicator within the active
domain. The full design structure is used for variance estimation — rows are
not physically removed for domain/group subsets.
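For readers who want to see the svymean() comparison concretely, the following sketch uses the survey package directly rather than surveycore. It assumes the survey package is installed and uses its bundled api example data; the choice of domain (sch.wide == "Yes") and focal category (stype == "E") is arbitrary. It illustrates the general technique only and makes no claim about surveycore's internals.

```r
library(survey)

data(api)  # example data shipped with the survey package

# One-stage cluster sample of school districts.
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Domain: schools that met their school-wide growth target.
# subset() on a design object keeps the design information needed for
# variance estimation instead of simply discarding out-of-domain rows.
dom <- subset(dclus1, sch.wide == "Yes")

# Proportion of elementary schools within the domain, estimated as the
# mean of a binary indicator.
est <- svymean(~I(stype == "E"), dom)
est
SE(est)
```

Subsetting the raw data frame before calling svydesign() would give the same point estimate but, in general, a different standard error, which is why the domain is defined on the design object.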
na.rm = FALSE: NA is appended as the last level. All proportions
(including non-NA levels) have their denominator inflated to include
NA rows, so the pct column sums to 1.
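The effect of the denominator choice can be checked by hand with a small base-R sketch. The vectors x and w are made-up toy values, and the calculation ignores the survey design; it only shows how the proportions shift when NA rows stay in the denominator.

```r
x <- c("a", "b", NA, "a", NA)  # focal variable with missing values
w <- c(2, 1, 1, 3, 3)          # weights

# na.rm = TRUE analogue: NA rows are dropped before computing proportions.
keep <- !is.na(x)
pct_drop <- tapply(w[keep], x[keep], sum) / sum(w[keep])
pct_drop  # a = 5/6, b = 1/6

# na.rm = FALSE analogue: NA becomes its own level (placed last) and its
# weight stays in the denominator for every level.
x_na <- addNA(factor(x))
pct_keep <- tapply(w, x_na, sum) / sum(w)
pct_keep  # a = 0.5, b = 0.1, NA = 0.4
```

In both cases the proportions sum to 1, but the non-NA levels are smaller in the second case because the weight of the NA rows is counted in the total.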
Other analysis: clean(), get_anova(), get_corr(), get_covariance(), get_diffs(), get_means(), get_pairwise(), get_quantiles(), get_ratios(), get_t_test(), get_totals(), get_variance(), meta()
# NHANES exam weights are 0 for non-examined participants; filter first
nhanes_sub <- nhanes_2017[nhanes_2017$wtmec2yr > 0, ]
d <- as_survey(nhanes_sub, ids = sdmvpsu, weights = wtmec2yr,
               strata = sdmvstra, nest = TRUE)
# Single variable
get_freqs(d, riagendr)
# With confidence intervals
get_freqs(d, riagendr, variance = "ci")
# Grouped
get_freqs(d, riagendr, group = sdmvstra)
# Multi-variable (stacked)
get_freqs(d, c(riagendr, ridreth3), names_to = "item", values_to = "value")