get_variance: Design-Based Population Variance for a Survey Design

Description

Compute the design-based estimate of the finite-population variance for one or more numeric variables in a survey design, with optional grouping, uncertainty quantification, and metadata-driven labelling. Matches survey::svyvar() numerically (Kish n/(n-1) correction) on Taylor, replicate, twophase, and nonprob designs.
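
As a rough illustration of the point estimate only (not its SE), the sketch below computes a weighted variance with the n/(n-1) correction in base R. It assumes the single-variable form used by survey::svyvar(), that is, a weighted mean of squared deviations scaled by n/(n-1). The helper weighted_variance is hypothetical and not part of surveycore, the data are made up, and returning NaN when fewer than two observations remain is an assumption about what a degenerate cell looks like.

```r
# Hypothetical helper: weighted variance with the n/(n-1) correction.
# Assumed form: sum(w * (x - xbar)^2) / sum(w) * n / (n - 1),
# where xbar is the weighted mean and n is the unweighted non-NA count.
weighted_variance <- function(x, w) {
  keep <- !is.na(x)            # drop NA values of the focal variable
  x <- x[keep]
  w <- w[keep]
  n <- length(x)
  if (n < 2) return(NaN)       # assumption: too few observations is degenerate
  xbar <- sum(w * x) / sum(w)  # weighted mean
  sum(w * (x - xbar)^2) / sum(w) * n / (n - 1)
}

# Made-up data for illustration
x <- c(1, 2, 3, 4)
w_equal <- rep(2, 4)
w_unequal <- c(1, 1, 1, 5)

v_equal <- weighted_variance(x, w_equal)      # equal weights reduce to var(x)
v_unequal <- weighted_variance(x, w_unequal)
v_constant <- weighted_variance(rep(7, 4), w_unequal)  # constant variable
```

With equal weights the sketch reduces to the ordinary sample variance, and a constant variable gives exactly 0.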

Usage

get_variance(
  design,
  x,
  group = NULL,
  variance = "ci",
  conf_level = 0.95,
  n_weighted = FALSE,
  decimals = NULL,
  min_cell_n = 30L,
  na.rm = TRUE,
  na_handling = c("pairwise", "listwise"),
  label_values = TRUE,
  label_vars = TRUE,
  name_style = "surveycore",
  ...,
  .id = NULL,
  .if_missing_var = NULL
)

Value

A survey_variance tibble (also inheriting survey_result). Columns, in order:

[group_cols...] — group variable columns (when active), first.
name — focal variable name (or its label when label_vars = TRUE).
variance — design-based point estimate of the finite-population variance. NaN for degenerate cells; exact 0 for constant-in-domain variables.
Uncertainty columns (se, var, cv, ci_low, ci_high, moe, deff) — only those requested via variance.
n — unweighted count of non-NA observations used.
n_weighted — sum of weights (only when n_weighted = TRUE).

Arguments

design: A survey design object: survey_taylor, survey_replicate, survey_twophase, or survey_nonprob. Also accepts a survey_collection.
x: <tidy-select> One or more unquoted numeric variable names. Must resolve to at least one numeric column; non-numeric columns are rejected (no silent drop).
group: <tidy-select> Optional grouping variable(s). Combined with any grouping set by group_by(). Default NULL.
variance: NULL or a character vector of one or more of "se", "ci", "var", "cv", "moe", "deff". Controls which uncertainty columns appear in the output. Default "ci".
conf_level: Numeric scalar in (0, 1). Confidence level for intervals. Default 0.95.
n_weighted: Logical. If TRUE, add an n_weighted column with the sum of weights for non-NA, positive-weight observations in each row's estimate. Default FALSE.
decimals: Integer or NULL. If an integer, rounds all numeric output columns to this many decimal places. Default NULL (no rounding).
min_cell_n: Integer. Minimum unweighted cell count; cells with fewer observations trigger surveycore_warning_small_cell. Default 30L (AAPOR guidance).
na.rm: Logical. If TRUE (default), NA values in the focal variable are excluded from the estimate and rows with NA in any grouping variable are excluded from the output. If FALSE, NA propagates to produce NaN estimates.
na_handling: "pairwise" (default) or "listwise". In multi-variable mode controls whether each focal variable uses its own complete-case set ("pairwise") or the intersection across all focal variables ("listwise"). Ignored when na.rm = FALSE.
label_values: Logical. If TRUE (default), grouping-variable codes are converted to their value labels. Accepted for API uniformity.
label_vars: Logical. If TRUE (default), the name column shows variable labels when available (falling back to raw names).
name_style: "surveycore" (default) or "broom". Under "broom", renames variance → estimate, se → std.error, ci_low → conf.low, ci_high → conf.high.
...: Unused. Reserved so that .id and .if_missing_var remain named-only when a survey_collection is passed as design.
.id: Character(1) or NULL. Column name used to identify each survey when design is a survey_collection. For collection inputs, NULL (the default) resolves to the collection's stored @id property. Pass a non-NULL value to override. Ignored when design is a single survey.
.if_missing_var: "error", "skip", or NULL. How to handle surveys in a collection that lack one of the requested NSE variables. For collection inputs, NULL (the default) resolves to the collection's stored @if_missing_var property. Pass a non-NULL value to override. Ignored when design is a single survey.
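
The name_style = "broom" renaming described above can be sketched in base R. The data frame here is a made-up stand-in with the documented column names, not actual get_variance() output, and broom_map is a hypothetical lookup vector written out from the mapping listed under name_style.

```r
# Made-up stand-in for a result with "surveycore" column names
res <- data.frame(
  name = "ridageyr",
  variance = 4.0,
  se = 0.5,
  ci_low = 3.02,
  ci_high = 4.98,
  n = 100L
)

# Hypothetical lookup: surveycore name -> broom name
broom_map <- c(
  variance = "estimate",
  se = "std.error",
  ci_low = "conf.low",
  ci_high = "conf.high"
)

# Rename only the columns that appear in the lookup; leave the rest as-is
hits <- names(res) %in% names(broom_map)
names(res)[hits] <- unname(broom_map[names(res)[hits]])
```

Columns outside the mapping, such as name and n, keep their original names.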

Details

Confidence intervals use the normal-Wald approximation on the SE of the variance estimate: ci_low = variance - z * se, ci_high = variance + z * se, where z = qnorm((1 + conf_level) / 2). The bounds are not clamped, so ci_low may be negative when the variance estimate is near zero and its SE is large. Users who want a non-negative lower bound can clamp at 0 post hoc. This behaviour matches survey::svyvar().
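
The interval arithmetic can be reproduced by hand. The sketch below uses made-up values for the variance estimate and its SE (they do not come from any real design) and shows the optional post hoc clamp at 0.

```r
# Made-up inputs for illustration only
variance_hat <- 4.0
se_hat <- 2.5
conf_level <- 0.95

# Normal-Wald critical value, as given in the formula above
z <- qnorm((1 + conf_level) / 2)

# Unclamped bounds
ci_low <- variance_hat - z * se_hat
ci_high <- variance_hat + z * se_hat

# Optional clamp for a non-negative lower bound
ci_low_clamped <- max(ci_low, 0)
```

Here the SE is large relative to the estimate, so the unclamped lower bound falls below zero and the clamped version is 0.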

Under na_handling = "pairwise" (the default), each focal variable contributes its own per-variable complete-case count to n. Under na_handling = "listwise", every output row shares the intersection complete-case count — rows with NA in any selected variable are excluded from every variable's calculation.
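
The difference between the two complete-case sets can be seen on a toy data frame. This base R sketch only counts observations and does not call get_variance(); the column names age and sbp and their values are made up.

```r
# Toy data with missing values in both focal variables
df <- data.frame(
  age = c(34, 51, NA, 27, 62),
  sbp = c(118, NA, 121, 130, NA)
)

# Pairwise: each variable keeps its own non-NA observations
n_pairwise <- colSums(!is.na(df))

# Listwise: only rows complete on every selected variable are kept
n_listwise <- sum(complete.cases(df))
```

Under the pairwise rule age would use 4 observations and sbp would use 3; under the listwise rule both would use the same 2 rows.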

Examples

d <- as_survey(
  nhanes_2017,
  ids = sdmvpsu,
  weights = wtint2yr,
  strata = sdmvstra,
  nest = TRUE
)
get_variance(d, ridageyr)

# Multiple variables
get_variance(d, c(ridageyr, bpxsy1))

# With grouping
get_variance(d, ridageyr, group = riagendr)