scf_MIcombine: Combine Estimates Across SCF Implicates Using Rubin's Rules

Description

This function implements Rubin’s Rules for combining multiply-imputed survey model results in the scf package. It pools point estimates, variance-covariance matrices, and degrees of freedom across the SCF’s five implicates.

Usage

scf_MIcombine(results, variances, call = sys.call(), df.complete = Inf)

Value

An object of class "scf_MIresult" with components:

coefficients: Pooled point estimates across implicates.
variance: Pooled variance-covariance matrix.
df: Degrees of freedom for each parameter, adjusted using the Barnard-Rubin formula.
missinfo: Estimated fraction of missing information for each parameter.
nimp: Number of implicates used in pooling.
call: Function call recorded for reproducibility.

Supports coef(), SE(), confint(), and summary() methods.

Arguments

results: A list of implicate-level model outputs. Each element must be a named numeric vector or an object with methods for coef() and vcov(). Typically generated internally by modeling functions.
variances: Optional list of variance-covariance matrices. If omitted, extracted using vcov().
call: Optional. The originating function call. Defaults to sys.call().
df.complete: Optional degrees of freedom for the complete-data model. Used for small-sample corrections. Defaults to Inf, assuming large-sample asymptotics.

Scope

scf_MIcombine() is used for model-based analyses such as scf_ols(), scf_glm(), and scf_logit(), where each implicate’s model output includes both parameter estimates and replicate-weighted sampling variances.

Descriptive estimators such as scf_mean(), scf_percentile(), and scf_median() do not apply Rubin’s Rules. Instead, they follow the Survey of Consumer Finances convention used in the Federal Reserve Board’s SAS macro, combining (i) the replicate-weight sampling variance from implicate 1 with (ii) the between-implicate variance scaled by (m + 1)/m, where m is the number of implicates.
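The convention described above can be sketched in base R. This is an illustration only: the helper name scf_descriptive_variance() is hypothetical and not part of the scf package, the input numbers are made up, and the sketch assumes the two components are summed and that the point estimate is the average across implicates.

```r
# Hypothetical helper (not an scf function): SCF-style total variance
# for a descriptive statistic.
scf_descriptive_variance <- function(estimates, u_implicate1) {
  m <- length(estimates)        # number of implicates
  b <- stats::var(estimates)    # between-implicate variance (divisor m - 1)
  u_implicate1 + (m + 1) / m * b
}

# Made-up implicate-level estimates and implicate-1 sampling variance
est       <- c(0.20, 0.22, 0.21, 0.19, 0.23)
total_var <- scf_descriptive_variance(est, u_implicate1 = 1e-4)

# Assumed point estimate (average across implicates) and its standard error
c(estimate = mean(est), se = sqrt(total_var))
```

Unlike Rubin's Rules, only the first implicate's sampling variance enters the within-imputation term here; the other implicates contribute through the between-implicate variance alone.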

This separation is intentional: descriptive statistics in scf aim to reproduce the Survey of Consumer Finances' published standard errors, whereas model-based functions use Rubin's Rules.

Implementation

scf_MIcombine() pools a set of implicate-level estimates and their associated variance-covariance matrices using Rubin’s Rules.

This includes:

Calculation of pooled point estimates
Total variance from within- and between-imputation components
Degrees of freedom via the Barnard-Rubin method
Fraction of missing information

Inputs are typically produced by modeling functions such as scf_ols(), scf_glm(), or scf_logit(), which return implicate-level coefficient vectors and variance-covariance matrices.

This function is primarily used internally, but may be called directly by advanced users constructing custom estimation routines from implicate-level results.
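For readers building custom routines, the first two pooling steps can be sketched in matrix form with base R. The helper pool_rubin() below is hypothetical and is not the scf source code; it assumes every implicate supplies a coefficient vector with the same names and a conformable variance-covariance matrix, and the inputs are made up.

```r
# Hypothetical helper illustrating matrix-form pooling; not the scf implementation.
pool_rubin <- function(results, variances) {
  m     <- length(results)
  q_mat <- do.call(rbind, results)        # one row per implicate
  q_bar <- colMeans(q_mat)                # pooled point estimates
  u_bar <- Reduce(`+`, variances) / m     # average within-imputation covariance
  b     <- stats::cov(q_mat)              # between-imputation covariance
  total <- u_bar + (1 + 1 / m) * b        # total covariance
  list(coefficients = q_bar, variance = total, nimp = m)
}

# Made-up inputs: three implicates, two coefficients each
res <- list(c(a = 1.0, b = 2.0), c(a = 1.2, b = 1.9), c(a = 0.9, b = 2.2))
vcs <- list(diag(0.05, 2), diag(0.06, 2), diag(0.04, 2))

pooled_sketch <- pool_rubin(res, vcs)
pooled_sketch$coefficients
sqrt(diag(pooled_sketch$variance))        # pooled standard errors
```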

Details

The SCF provides five implicates per survey wave, each a plausible version of the population under a specific missing-data model. Analysts run the same statistical procedure on each implicate, producing five estimates $ Q_1, Q_2, ..., Q_5 $. Rubin’s Rules then combine these estimates while accounting for two sources of uncertainty:

Within-imputation variance: Uncertainty from complex sample design
Between-imputation variance: Uncertainty due to missing data

For a scalar quantity $ Q $, the pooled estimate and total variance are calculated as:

$$ \bar{Q} = \frac{1}{M} \sum_{m=1}^{M} Q_m $$

$$ \bar{U} = \frac{1}{M} \sum_{m=1}^{M} U_m $$

$$ B = \frac{1}{M - 1} \sum_{m=1}^{M} (Q_m - \bar{Q})^2 $$

$$ T = \bar{U} + \left(1 + \frac{1}{M} \right) B $$

Where:

$ M $ is the number of implicates (typically 5 for SCF)
$ Q_m $ is the estimate from implicate $ m $
$ U_m $ is the sampling variance of $ Q_m $, accounting for replicate weights and design

The total variance $ T $ reflects both within-imputation uncertainty (sampling error) and between-imputation uncertainty (missing-data imputation).
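As a worked illustration with made-up numbers (not SCF results), suppose $ M = 5 $ implicates yield the estimates 10.2, 10.5, 9.9, 10.1, and 10.3, each with sampling variance $ U_m = 0.04 $. Then:

$$ \bar{Q} = \frac{10.2 + 10.5 + 9.9 + 10.1 + 10.3}{5} = 10.2, \qquad \bar{U} = 0.04 $$

$$ B = \frac{0^2 + 0.3^2 + (-0.3)^2 + (-0.1)^2 + 0.1^2}{4} = \frac{0.20}{4} = 0.05 $$

$$ T = 0.04 + \left(1 + \frac{1}{5}\right) \times 0.05 = 0.10 $$

In this example the between-imputation term contributes 0.06 of the total variance of 0.10, so ignoring imputation uncertainty would understate $ T $ by more than half.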

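The standard error of the pooled estimate is $ \sqrt{T} $. With the default df.complete = Inf, the degrees of freedom follow Rubin's large-sample formula:

$$ \nu = (M - 1) \left(1 + \frac{\bar{U}}{(1 + \frac{1}{M}) B} \right)^2 $$

When df.complete is finite, the Barnard-Rubin method adjusts $ \nu $ downward to reflect the limited complete-data degrees of freedom.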
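The degrees-of-freedom calculation can be sketched in base R as follows. The helper mi_df() is hypothetical, and the finite-df.complete branch follows the adjustment published by Barnard and Rubin (1999); whether scf_MIcombine() computes it in exactly this way is an assumption, and the inputs are made up.

```r
# Hypothetical helper; a sketch of the degrees-of-freedom calculation,
# not the scf source code.
mi_df <- function(u_bar, b, m, df_complete = Inf) {
  r      <- (1 + 1 / m) * b / u_bar     # relative increase in variance
  df_old <- (m - 1) * (1 + 1 / r)^2     # large-sample formula shown above
  if (is.infinite(df_complete)) return(df_old)
  lambda <- r / (1 + r)                 # share of total variance due to imputation
  df_obs <- (df_complete + 1) / (df_complete + 3) * df_complete * (1 - lambda)
  1 / (1 / df_old + 1 / df_obs)         # Barnard-Rubin adjusted df
}

# Made-up inputs: average sampling variance 0.04, between variance 0.05, M = 5
df_large <- mi_df(u_bar = 0.04, b = 0.05, m = 5)
df_small <- mi_df(u_bar = 0.04, b = 0.05, m = 5, df_complete = 50)
c(large_sample = df_large, adjusted = df_small)
```

The adjusted value never exceeds either the large-sample value or df_complete, which is the point of the small-sample correction.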

The fraction of missing information (FMI) is also reported: it reflects the proportion of total variance attributable to imputation uncertainty.
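Two common definitions of the FMI can be sketched in base R. The helper fmi_sketch() is hypothetical and the inputs are made up; which of the two variants scf_MIcombine() stores in missinfo is not stated here, so treat both as illustrations rather than as the package's method.

```r
# Hypothetical helper; not the scf source code.
fmi_sketch <- function(u_bar, b, m) {
  r      <- (1 + 1 / m) * b / u_bar        # relative increase in variance
  total  <- u_bar + (1 + 1 / m) * b        # total variance T
  lambda <- (1 + 1 / m) * b / total        # share of T due to imputation
  df     <- (m - 1) * (1 + 1 / r)^2        # large-sample degrees of freedom
  gamma  <- (r + 2 / (df + 3)) / (1 + r)   # variant adjusted for finite M
  c(lambda = lambda, gamma = gamma)
}

# Made-up inputs: average sampling variance 0.04, between variance 0.05, M = 5
fmi_values <- fmi_sketch(u_bar = 0.04, b = 0.05, m = 5)
fmi_values
```

The adjusted variant is always at least as large as the simple proportion, and the two converge as the degrees of freedom grow.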

References

Barnard J, Rubin DB. Small-sample degrees of freedom with multiple imputation. doi:10.1093/biomet/86.4.948.

Little RJA, Rubin DB. Statistical analysis with missing data. ISBN: 9780470526798.

U.S. Federal Reserve. Codebook for 2022 Survey of Consumer Finances. https://www.federalreserve.gov/econres/scfindex.htm

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("MIcombine_")
dir.create(td)

src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)

# Example for real analysis: Pool simple survey mean for mock data
outlist <- lapply(scf2022$mi_design, function(d) survey::svymean(~I(age >= 65), d))
pooled  <- scf_MIcombine(outlist)     # vcov/coef extracted automatically
SE(pooled); coef(pooled)

unlink(td, recursive = TRUE, force = TRUE)
