Compute the design-based estimate of the finite-population Pearson
covariance for every (unordered, by default) pair of numeric variables
selected from x, with optional grouping, uncertainty quantification,
and metadata-driven labelling. Estimates match the off-diagonal entries of
survey::svyvar() (Kish n/(n-1) correction) to numerical precision on
Taylor, replicate, twophase, and nonprob designs.
get_covariance(
design,
x,
group = NULL,
redundant = FALSE,
diagonal = FALSE,
variance = "ci",
conf_level = 0.95,
n_weighted = FALSE,
decimals = NULL,
min_cell_n = 30L,
na.rm = TRUE,
label_values = TRUE,
label_vars = TRUE,
name_style = "surveycore",
...,
.id = NULL,
.if_missing_var = NULL
)
A survey_covariance tibble (also inheriting survey_result).
Columns, in order:
[group_cols...] — group variable columns (when active), first.
var1, var2 — factor columns identifying the pair (levels in
x-supply order).
covariance — design-based Pearson covariance estimate
(Kish-corrected). NaN for degenerate cells; 0 for pairs where
at least one variable is constant on the active domain.
Uncertainty columns (se, var, cv, ci_low, ci_high,
moe, deff) — only those requested via variance.
n — pairwise unweighted count.
n_weighted — pair's sum of weights (only when requested).
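As a rough illustration of what the covariance column estimates, the following base R sketch computes a weighted Pearson covariance with an n/(n-1) correction on pairwise-complete rows. The helper weighted_cov() is hypothetical and not part of the package, and the exact form of the estimator is an assumption based on the description above. Only the weights are used; strata and clusters are ignored, and no standard error is computed.

```r
# Hypothetical helper (not a package function): weighted Pearson covariance
# with an n / (n - 1) correction, computed on pairwise-complete rows.
weighted_cov <- function(x, y, w) {
  keep <- !is.na(x) & !is.na(y) & !is.na(w)
  x <- x[keep]
  y <- y[keep]
  w <- w[keep]
  n <- length(x)
  if (n < 2) return(NaN)                 # degenerate cell
  x_bar <- sum(w * x) / sum(w)           # weighted means used for centring
  y_bar <- sum(w * y) / sum(w)
  raw <- sum(w * (x - x_bar) * (y - y_bar)) / sum(w)
  raw * n / (n - 1)
}

x <- c(1, 4, 2, 8, 5)
y <- c(2, 3, 1, 9, 4)

# With equal weights the sketch reduces to the usual sample covariance.
equal_w <- weighted_cov(x, y, rep(1, 5))

# Unequal weights generally change the estimate.
unequal_w <- weighted_cov(x, y, c(1, 1, 1, 5, 1))

# A constant variable gives a covariance of exactly 0.
const_cov <- weighted_cov(x, rep(3, 5), c(1, 2, 3, 4, 5))
```

With equal weights the result agrees with stats::cov(x, y), which is a convenient sanity check for the correction factor.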
design
A survey design object: survey_taylor, survey_replicate,
survey_twophase, or survey_nonprob. Also accepts a
survey_collection.
x
<tidy-select> Two or more unquoted
variable names. Must resolve to at least two columns. Non-numeric
columns are dropped with a warning; if fewer than 2 numeric variables
remain, an error is raised.
group
<tidy-select> Optional grouping
variable(s). Combined with any grouping set by group_by(). Default
NULL. Covariances are estimated separately within each group using
that group's own weighted means for centring.
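The effect of group-specific centring can be seen in a small unweighted example. This is only a sketch: stats::cov() stands in for the design-based estimator, and the data frame is made up for illustration.

```r
# Toy data: within each group y falls as x rises, but group "b" sits
# higher than group "a" on both variables.
df <- data.frame(
  g = rep(c("a", "b"), each = 3),
  x = c(1, 2, 3, 11, 12, 13),
  y = c(3, 2, 1, 13, 12, 11)
)

# Group-specific centring: each group's covariance uses that group's own means.
by_group <- sapply(split(df, df$g), function(d) cov(d$x, d$y))

# Pooled centring over all rows gives a very different answer, because the
# between-group shift in means dominates.
pooled <- cov(df$x, df$y)
```

Here both within-group covariances are negative while the pooled covariance is positive, which is why centring on each group's own means matters.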
redundant
Logical. If FALSE (default), each unordered pair
appears once in supply order (lower-triangle). If TRUE, both
(A, B) and (B, A) are emitted.
diagonal
Logical. If FALSE (default), self-pairs (x, x) are
excluded. If TRUE, one self-pair per variable is emitted with
covariance = Var_hat(x) (the
design-based variance, not 1).
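Which pairs the redundant and diagonal switches include can be sketched in base R. The variable names are hypothetical, and the sketch shows only the set of pairs, not the row order the function actually returns.

```r
vars <- c("a", "b", "c")   # hypothetical variable names, in supply order

# redundant = FALSE: each unordered pair once.
unordered <- t(combn(vars, 2))

# redundant = TRUE: both orderings of every pair.
all_pairs <- expand.grid(var1 = vars, var2 = vars, stringsAsFactors = FALSE)
redundant <- all_pairs[all_pairs$var1 != all_pairs$var2, ]

# diagonal = TRUE adds one self-pair per variable.
self_pairs <- data.frame(var1 = vars, var2 = vars)
```

For k variables this gives k(k-1)/2 unordered pairs, k(k-1) ordered pairs, and k self-pairs.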
variance
NULL or a character vector of one or more of "se",
"ci", "var", "cv", "moe", "deff". Default "ci".
conf_level
Numeric scalar in (0, 1). Default 0.95.
n_weighted
Logical. If TRUE, append an n_weighted column
with the pair's pairwise-complete sum of weights. Default FALSE.
decimals
Integer or NULL. If integer, rounds all numeric
output columns to this many places. Default NULL (no rounding).
min_cell_n
Integer. Minimum pairwise unweighted count before
surveycore_warning_small_cell fires. Default 30L (AAPOR).
na.rm
Logical. If TRUE (default), pairwise-complete deletion is applied
per pair, and rows with NA in any group variable are excluded from
the output. If FALSE, NAs propagate to produce NaN estimates;
NA group values are retained as their own group row.
label_values
Logical. If TRUE (default) and the grouping
variable has value labels, the group column is converted to a
labelled factor.
label_vars
Logical. If TRUE (default) and variable labels are
set in metadata, var1 and var2 show labels instead of raw names.
name_style
"surveycore" (default) or "broom". Under
"broom", renames covariance -> estimate, se -> std.error,
ci_low -> conf.low, ci_high -> conf.high.
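The renaming under "broom" amounts to a fixed column mapping. A manual base R equivalent might look like the sketch below; the data frame and its values are placeholders, not real output.

```r
# Hypothetical one-row result using the default ("surveycore") column names.
# All numeric values are placeholders.
res <- data.frame(
  var1 = "ridageyr", var2 = "bpxsy1",
  covariance = 12.3, se = 1.1, ci_low = 10.1, ci_high = 14.5
)

# Mapping described above: surveycore name -> broom name.
broom_names <- c(
  covariance = "estimate",
  se         = "std.error",
  ci_low     = "conf.low",
  ci_high    = "conf.high"
)

# Rename only the columns that appear in the mapping; leave the rest alone.
hit <- names(res) %in% names(broom_names)
names(res)[hit] <- unname(broom_names[names(res)[hit]])
```

Columns outside the mapping, such as var1 and var2, keep their names.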
...
Unused. Reserved so that .id and .if_missing_var remain
named-only when a survey_collection is passed as design.
.id
Character(1) or NULL. Column name used to identify each
survey when design is a survey_collection. For collection inputs,
NULL (the default) resolves to the collection's stored @id property.
Pass a non-NULL value to override. Ignored when design is a single
survey.
.if_missing_var
"error", "skip", or NULL. How to handle
surveys in a collection that lack one of the requested NSE variables.
For collection inputs, NULL (the default) resolves to the collection's
stored @if_missing_var property. Pass a non-NULL value to override.
Ignored when design is a single survey.
Confidence intervals use the normal-Wald approximation on the SE of the
covariance estimate: ci_low = covariance - z * se,
ci_high = covariance + z * se, where z = qnorm((1 + conf_level) / 2).
The bounds are not clamped. Covariance is unbounded, so the interval may
span zero, with ci_low negative and ci_high positive. Users who want
clamped intervals can post-process. This behaviour matches
survey::svyvar().
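The interval arithmetic above can be reproduced directly in base R. The estimate and standard error below are placeholders chosen so that the interval spans zero.

```r
# Hypothetical estimate and standard error (placeholders, not real output).
covariance <- 0.8
se         <- 0.5
conf_level <- 0.95

z       <- qnorm((1 + conf_level) / 2)   # about 1.96 for a 95% interval
ci_low  <- covariance - z * se
ci_high <- covariance + z * se
# Here the interval spans zero: ci_low is negative and ci_high is positive.
```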
NA handling is pairwise-complete per pair: each ordered pair drops
rows where either variable is NA. There is no na_handling argument;
pairwise is the only policy. This matches survey::svyvar() off-diagonal
pair-at-a-time semantics, not svyvar()'s default listwise deletion
across a multi-variable formula. Numerical parity therefore only holds
when oracle calls are made pair-at-a-time
(survey::svyvar(~x + y, design) per pair).
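The difference between pairwise and listwise deletion is easy to see with an unweighted example. In this sketch stats::cov() stands in for the design-based estimator and the matrix is made up for illustration.

```r
m <- cbind(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 1, 4, 3, 6, 5),
  z = c(NA, NA, 1, 2, 3, 4)
)

# Pairwise-complete: the (x, y) pair keeps all 6 rows; NAs in z do not matter.
pairwise_xy <- cov(m, use = "pairwise.complete.obs")["x", "y"]

# Listwise: an NA in any column drops the row, so (x, y) uses rows 3 to 6 only.
listwise_xy <- cov(m, use = "complete.obs")["x", "y"]

# Rows available to the (x, y) pair under each policy.
n_pairwise <- sum(complete.cases(m[, c("x", "y")]))
n_listwise <- sum(complete.cases(m))
```

The two policies give different (x, y) estimates here because the third variable removes two rows under listwise deletion, which is why parity checks need to be made one pair at a time.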
Under diagonal = TRUE, the self-pair (x, x) returns the design-based
Kish-corrected variance of x on the active domain — not 1 as in
get_corr(). The covariance matrix diagonal is the variance vector, not
the identity. The diagonal-parity gate guarantees that
get_covariance(d, c(x, x), diagonal = TRUE)$covariance and $se equal
get_variance(d, x)$variance and $se numerically (point estimate to a
tolerance of 1e-10, SE to a tolerance of 1e-8) when the active domains match.
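The same contrast between covariance and correlation diagonals holds for the ordinary unweighted estimators, as this base R sketch with made-up data shows.

```r
m <- cbind(x = c(1, 4, 2, 8, 5), y = c(2, 3, 1, 9, 4))

# The diagonal of a covariance matrix holds each variable's variance.
diag_cov <- diag(cov(m))

# The diagonal of a correlation matrix is always 1.
diag_cor <- diag(cor(m))
```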
Design effect (deff) uses the Goodnight / Mood-Graybill SRS reference
SE_SRS(cov) = sqrt((Var(x) * Var(y) + cov^2) / (n - 1)). When both
the design SE and SRS SE are zero (constant-variable pairs), deff is
set to exactly 0 (0 / 0 guard).
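A minimal sketch of this calculation follows. The helper cov_deff() is hypothetical, and treating deff as the ratio of squared standard errors is an assumption; only the SRS reference formula and the zero guard come from the text above.

```r
# Hypothetical helper (not a package function): design effect of a covariance
# estimate against the SRS reference SE given above.
# Assumption: deff is the ratio of squared SEs (design variance / SRS variance).
cov_deff <- function(se_design, var_x, var_y, cov_xy, n) {
  se_srs <- sqrt((var_x * var_y + cov_xy^2) / (n - 1))
  if (se_design == 0 && se_srs == 0) return(0)   # 0 / 0 guard
  (se_design / se_srs)^2
}

# Constant-variable pair: both SEs are zero, so deff is 0 rather than NaN.
deff_const <- cov_deff(se_design = 0, var_x = 0, var_y = 2, cov_xy = 0, n = 50)

# Placeholder inputs: the SRS SE is sqrt((4 * 9 + 3^2) / 45) = 1,
# so deff is 1.5^2.
deff_example <- cov_deff(se_design = 1.5, var_x = 4, var_y = 9, cov_xy = 3, n = 46)
```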
Mood, A. M., Graybill, F. A., & Boes, D. C. (1974). Introduction to the Theory of Statistics (3rd ed.). McGraw-Hill.
Lumley, T. (2010). Complex Surveys: A Guide to Analysis Using R. Wiley.
Cochran, W. G. (1977). Sampling Techniques (3rd ed.). Wiley.
Demnati, A., & Rao, J. N. K. (2004). Linearization variance estimators for survey data. Survey Methodology, 30, 17-26.
Other analysis:
clean(),
get_anova(),
get_corr(),
get_diffs(),
get_freqs(),
get_means(),
get_pairwise(),
get_quantiles(),
get_ratios(),
get_t_test(),
get_totals(),
get_variance(),
meta()
d <- as_survey(
nhanes_2017,
ids = sdmvpsu,
weights = wtint2yr,
strata = sdmvstra,
nest = TRUE
)
get_covariance(d, x = c(ridageyr, bpxsy1))
# Include the diagonal (self-pairs return Var(x), not 1)
get_covariance(d, x = c(ridageyr, bpxsy1), diagonal = TRUE)
# With grouping
get_covariance(d, x = c(ridageyr, bpxsy1), group = riagendr)