Compute the design-based estimate of the finite-population variance for one
or more numeric variables in a survey design, with optional grouping,
uncertainty quantification, and metadata-driven labelling. Matches
survey::svyvar() numerically (Kish n/(n-1) correction) on Taylor,
replicate, twophase, and nonprob designs.
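The point estimate with the Kish n/(n-1) correction can be written in a few lines of base R. This sketch follows the steps `survey::svyvar()` uses for a single variable: a weighted mean, then the weighted mean of squared deviations scaled by n/(n-1), where n counts observations with non-zero weight. It covers the point estimate only and does not reproduce the design-based SE. `weighted_var()` is a hypothetical helper, not a surveycore function, and the data are illustrative.

```r
# Hypothetical helper, not part of surveycore: Kish-corrected weighted
# variance (point estimate only; no design-based SE).
# Assumes x has no NA values and w holds non-negative weights.
weighted_var <- function(x, w) {
  stopifnot(length(x) == length(w), !anyNA(x), all(w >= 0))
  n <- sum(w != 0)                  # observations that carry weight
  if (n < 2) return(NaN)            # variance undefined for a degenerate cell
  xbar <- sum(w * x) / sum(w)       # weighted mean
  mean_sq <- sum(w * (x - xbar)^2) / sum(w)  # weighted mean squared deviation
  mean_sq * n / (n - 1)             # Kish n/(n - 1) correction
}

x <- c(2, 4, 4, 5, 7, 9)
w <- c(1.5, 0.5, 2, 1, 1, 3)
weighted_var(x, w)
```

With equal weights the sketch reduces to the ordinary sample variance `var(x)`, which shows why the n/(n-1) factor is there. The `NaN` branch imitates the documented behaviour for degenerate cells; exactly how `get_variance()` detects such cells is an assumption here.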
get_variance(
  design,
  x,
  group = NULL,
  variance = "ci",
  conf_level = 0.95,
  n_weighted = FALSE,
  decimals = NULL,
  min_cell_n = 30L,
  na.rm = TRUE,
  na_handling = c("pairwise", "listwise"),
  label_values = TRUE,
  label_vars = TRUE,
  name_style = "surveycore",
  ...,
  .id = NULL,
  .if_missing_var = NULL
)
A survey_variance tibble (also inheriting survey_result).
Columns, in order:
[group_cols...] — group variable columns (when active), first.
name — focal variable name (or its label when label_vars = TRUE).
variance — design-based point estimate of the finite-population
variance. NaN for degenerate cells; exact 0 for constant-in-domain
variables.
Uncertainty columns (se, var, cv, ci_low, ci_high,
moe, deff) — only those requested via variance.
n — unweighted count of non-NA observations used.
n_weighted — sum of weights (only when n_weighted = TRUE).
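As documented for the n_weighted argument, n_weighted sums the weights of non-NA, positive-weight observations. The sketch below applies the same filter to n. That is an assumption, because n is only described as the count of non-NA observations used. It also adds a comparison against min_cell_n, assuming the small-cell warning fires when n falls below the threshold. `cell_counts()` is hypothetical and the data are illustrative.

```r
# Hypothetical helper, not part of surveycore: counts behind n and
# n_weighted for one cell. Assumptions: both counts use non-NA values
# with positive weight, and a cell is "small" when n < min_cell_n.
cell_counts <- function(x, w, min_cell_n = 30L) {
  used <- !is.na(x) & w > 0          # rows that enter the estimate
  n <- sum(used)
  list(
    n = n,                           # unweighted count
    n_weighted = sum(w[used]),       # sum of weights over the same rows
    small_cell = n < min_cell_n      # would trigger the small-cell warning
  )
}

x <- c(3.1, NA, 2.4, 5.0, 4.2)
w <- c(1.2, 0.8, 0, 2.0, 1.5)
cell_counts(x, w)
```

Here the NA row and the zero-weight row are both dropped, so the cell has n = 3 and n_weighted = 4.7, well under the default min_cell_n of 30.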
design: A survey design object (survey_taylor, survey_replicate,
survey_twophase, or survey_nonprob) or a survey_collection.
x: <tidy-select> One or more unquoted
numeric variable names. Must resolve to at least one numeric column;
non-numeric columns are rejected (no silent drop).
group: <tidy-select> Optional grouping
variable(s). Combined with any grouping set by group_by(). Default
NULL.
variance: NULL or a character vector of one or more of "se",
"ci", "var", "cv", "moe", "deff". Controls which uncertainty
columns appear in the output. Default "ci".
conf_level: Numeric scalar in (0, 1). Confidence level for intervals.
Default 0.95.
n_weighted: Logical. If TRUE, add an n_weighted column with the
sum of weights for non-NA, positive-weight observations in each row's
estimate. Default FALSE.
decimals: Integer or NULL. If an integer, rounds all numeric output
columns to this many decimal places. Default NULL (no rounding).
min_cell_n: Integer. Minimum unweighted cell count; cells with fewer
observations trigger surveycore_warning_small_cell. Default 30L (AAPOR guidance).
na.rm: Logical. If TRUE (default), NA values in the focal
variable are excluded from the estimate and rows with NA in any
grouping variable are excluded from the output. If FALSE, NA
propagates to produce NaN estimates.
na_handling: "pairwise" (default) or "listwise". In
multi-variable mode controls whether each focal variable uses its own
complete-case set ("pairwise") or the intersection across all focal
variables ("listwise"). Ignored when na.rm = FALSE.
label_values: Logical. If TRUE (default), grouping-variable codes are
converted to their value labels. Accepted for API uniformity with the
other analysis functions.
label_vars: Logical. If TRUE (default), the name column
shows variable labels when available (falling back to raw names).
name_style: "surveycore" (default) or "broom". Under "broom",
renames variance → estimate, se → std.error, ci_low →
conf.low, ci_high → conf.high.
...: Unused. Reserved so that .id and .if_missing_var can only be
supplied by name when a survey_collection is passed as design.
.id: Character(1) or NULL. Column name used to identify each
survey when design is a survey_collection. For collection inputs,
NULL (the default) resolves to the collection's stored @id property.
Pass a non-NULL value to override. Ignored when design is a single
survey.
.if_missing_var: "error", "skip", or NULL. How to handle
surveys in a collection that lack one of the requested NSE variables.
For collection inputs, NULL (the default) resolves to the collection's
stored @if_missing_var property. Pass a non-NULL value to override.
Ignored when design is a single survey.
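The name_style = "broom" option renames exactly four columns. The mapping can be mimicked on any data frame with a lookup vector. This is a sketch of the documented renaming, not the package's implementation. `to_broom_names()` is hypothetical and the one-row data frame holds placeholder values, not real output.

```r
# Documented surveycore -> broom column renames.
broom_map <- c(variance = "estimate", se = "std.error",
               ci_low = "conf.low", ci_high = "conf.high")

# Hypothetical helper: rename only the columns that appear in the map;
# every other column (name, n, cv, ...) keeps its name.
to_broom_names <- function(df) {
  hit <- names(df) %in% names(broom_map)
  names(df)[hit] <- unname(broom_map[names(df)[hit]])
  df
}

res <- data.frame(name = "ridageyr", variance = 1, se = 0.1,
                  ci_low = 0.8, ci_high = 1.2, n = 100L)
names(to_broom_names(res))
```

Columns that have no broom counterpart in the documented list, such as var, cv, moe, deff, n and n_weighted, pass through unchanged.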
Confidence intervals use the normal-Wald approximation on the SE of the
variance estimate: ci_low = variance - z * se,
ci_high = variance + z * se, where z = qnorm((1 + conf_level) / 2).
The bounds are not clamped: whenever the estimate is small relative to
its SE (variance < z * se), ci_low is negative. Users who want
non-negative lower bounds can clamp at 0 post hoc. This behaviour
matches survey::svyvar().
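The interval arithmetic is easy to check by hand. This sketch applies the documented formula to an illustrative estimate and SE (made-up numbers, not output from `get_variance()`), then applies the optional clamp a user might add afterwards. `wald_ci()` is a hypothetical helper.

```r
# Hypothetical helper: normal-Wald bounds, estimate +/- z * se,
# with z = qnorm((1 + conf_level) / 2) as in the formula above.
wald_ci <- function(estimate, se, conf_level = 0.95) {
  z <- qnorm((1 + conf_level) / 2)
  data.frame(ci_low = estimate - z * se, ci_high = estimate + z * se)
}

# A small variance with a wide SE gives a negative lower bound.
ci <- wald_ci(estimate = 0.5, se = 0.4)
ci

# Optional post-hoc clamp, done by the user rather than by get_variance().
ci$ci_low <- pmax(ci$ci_low, 0)
ci
```

Clamping changes only the lower bound. The upper bound and the SE stay as computed, so the clamped interval is no longer symmetric around the estimate.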
Under na_handling = "pairwise" (the default), each focal variable's row
reports that variable's own complete-case count in n. Under
na_handling = "listwise", every output row shares the intersection
complete-case count: rows with NA in any selected variable are
excluded from every variable's calculation.
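The difference between the two modes comes down to which rows are kept. This sketch reproduces only that row selection in base R on a small illustrative data frame. It ignores weights and design structure, and the column names `age` and `sbp` are hypothetical.

```r
# Two focal variables with missingness in different rows (illustrative data).
dat <- data.frame(
  age = c(34, NA, 51, 29, 62, 45),
  sbp = c(118, 130, NA, 121, NA, 140)
)

# Pairwise: each variable keeps its own complete cases.
pairwise_n <- colSums(!is.na(dat))

# Listwise: only rows complete on every focal variable are kept,
# so all variables share the same count.
complete <- complete.cases(dat)
listwise_n <- setNames(rep(sum(complete), ncol(dat)), names(dat))

pairwise_n
listwise_n
```

Pairwise keeps 5 rows for `age` and 4 for `sbp`. Listwise keeps only the 3 rows observed on both variables, and that shared count is what every output row would report in n.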
Other analysis:
clean(),
get_anova(),
get_corr(),
get_covariance(),
get_diffs(),
get_freqs(),
get_means(),
get_pairwise(),
get_quantiles(),
get_ratios(),
get_t_test(),
get_totals(),
meta()
library(surveycore)
d <- as_survey(
  nhanes_2017,
  ids = sdmvpsu,
  weights = wtint2yr,
  strata = sdmvstra,
  nest = TRUE
)
get_variance(d, ridageyr)
# Multiple variables
get_variance(d, c(ridageyr, bpxsy1))
# With grouping
get_variance(d, ridageyr, group = riagendr)