scf_MIcombine: Combine Estimates Across SCF Implicates Using Rubin's Rules

Description

This function is the canonical implementation of Rubin’s Rules in the scf package. It defines how point estimates, standard errors, and degrees of freedom are pooled across the SCF’s multiply imputed, replicate-weighted implicates.

Usage

scf_MIcombine(results, variances, call = sys.call(), df.complete = Inf)

Value

An object of class "scf_MIresult" with components:

coefficients: Pooled point estimates across implicates.
variance: Pooled variance-covariance matrix.
df: Degrees of freedom for each parameter, adjusted using the Barnard-Rubin formula.
missinfo: Estimated fraction of missing information for each parameter.
nimp: Number of implicates used in pooling.
call: Function call recorded for reproducibility.

The returned object supports coef(), SE(), confint(), and summary() methods.
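As a rough illustration of how standard errors and confidence intervals follow from these components, the sketch below works from hypothetical values standing in for the coefficients, variance, and df components. It assumes a t-based interval that uses each parameter's own degrees of freedom; check the package's confint() method for the exact construction it uses.

```r
# Hypothetical pooled components standing in for an "scf_MIresult" object;
# the values are illustrative, not output from the scf package.
coefficients <- c(b1 = 51.2, b2 = 0.27)
variance     <- diag(c(0.09, 1e-4))     # pooled variance-covariance matrix
nu           <- c(b1 = 120, b2 = 45)    # per-parameter degrees of freedom

# Standard errors are the square roots of the pooled variances
se <- setNames(sqrt(diag(variance)), names(coefficients))

# Assumed construction: a 95% interval using t quantiles with each parameter's df
level <- 0.95
tq <- qt(1 - (1 - level) / 2, df = nu)
ci <- cbind(lower = coefficients - tq * se, upper = coefficients + tq * se)
ci
```

With few implicates and a large between-imputation share, nu can be small, so these t intervals can be noticeably wider than normal-theory intervals.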

Arguments

results: A list of implicate-level model outputs. Each element must be a named numeric vector or an object with methods for coef() and vcov(). Typically generated internally by modeling functions.
variances: Optional list of variance-covariance matrices, one per implicate. If omitted, these are extracted from each element of results using vcov().
call: Optional. The originating function call. Defaults to sys.call().
df.complete: Optional degrees of freedom for the complete-data model. Used for small-sample corrections. Defaults to Inf, assuming large-sample asymptotics.
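The two accepted input shapes can be sketched in base R. The five "implicates" below are simulated data frames, a hypothetical stand-in for real SCF implicates, and the scf_MIcombine() calls are left commented out so the block runs without the scf package.

```r
# Five hypothetical "implicates": shared covariates with a fresh outcome draw
# in each, standing in for real SCF implicates (illustrative only).
set.seed(42)
base <- data.frame(x = rnorm(200))
implicates <- lapply(1:5, function(m) {
  d <- base
  d$y <- 1 + 2 * d$x + rnorm(200)   # each implicate gets its own noise draw
  d
})

# Form 1: objects with coef() and vcov() methods (here, lm fits).
fits <- lapply(implicates, function(d) lm(y ~ x, data = d))

# Form 2: the same information as explicit lists of named numeric
# vectors and variance-covariance matrices.
results   <- lapply(fits, coef)
variances <- lapply(fits, vcov)

# Every implicate should report the same parameters in the same order,
# and each variance matrix should match the coefficient vector's length.
stopifnot(
  all(vapply(results, function(r) identical(names(r), names(results[[1]])), logical(1))),
  all(vapply(variances, function(v) all(dim(v) == length(results[[1]])), logical(1)))
)

# With the scf package loaded, either form could then be pooled:
# scf_MIcombine(fits)                 # coef()/vcov() extracted automatically
# scf_MIcombine(results, variances)   # explicit inputs
```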

Implementation

scf_MIcombine() pools a set of implicate-level estimates and their associated variance-covariance matrices using Rubin’s Rules.

This includes:

Calculation of pooled point estimates
Total variance from within- and between-imputation components
Degrees of freedom via the Barnard-Rubin method
Fraction of missing information
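The steps listed above can be sketched for a vector of parameters in base R. pool_rubin() is a hypothetical helper, not the scf package's source. It assumes df.complete = Inf, in which case the Barnard-Rubin degrees of freedom reduce to Rubin’s large-sample formula.

```r
# Hypothetical helper: a sketch of Rubin's Rules for p parameters, assuming
# df.complete = Inf (large-sample degrees of freedom).
pool_rubin <- function(results, variances) {
  M    <- length(results)
  Q    <- do.call(rbind, results)             # M x p matrix of implicate estimates
  qbar <- colMeans(Q)                         # pooled point estimates
  ubar <- Reduce(`+`, variances) / M          # within-imputation variance (p x p)
  B    <- cov(Q)                              # between-imputation variance, divisor M - 1
  Tvar <- ubar + (1 + 1 / M) * B              # total variance
  r    <- (1 + 1 / M) * diag(B) / diag(ubar)  # relative increase in variance
  df   <- (M - 1) * (1 + 1 / r)^2             # large-sample degrees of freedom
  fmi  <- (r + 2 / (df + 3)) / (r + 1)        # fraction of missing information
  list(coefficients = qbar, variance = Tvar, df = df,
       missinfo = fmi, nimp = M)
}

# Usage with illustrative numbers for two parameters across five implicates
results <- list(c(a = 1.0, b = 2.0), c(a = 1.2, b = 2.1), c(a = 0.9, b = 1.9),
                c(a = 1.1, b = 2.2), c(a = 0.8, b = 1.8))
variances <- replicate(5, diag(c(0.01, 0.02)), simplify = FALSE)
pooled <- pool_rubin(results, variances)
pooled$coefficients
sqrt(diag(pooled$variance))   # pooled standard errors
```

Using cov() for B gives the full between-imputation covariance matrix, so the pooled variance keeps the off-diagonal terms that a parameter-by-parameter calculation would drop.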

Inputs are typically produced by functions like scf_mean(), scf_ols(), or scf_percentile().

This function is primarily used internally, but may be called directly by advanced users constructing custom estimation routines from implicate-level results.

Details

The SCF provides five implicates per survey wave, each a plausible completed version of the dataset under the imputation model. Analysts run the same statistical procedure on each implicate, producing five estimates $ Q_1, Q_2, \ldots, Q_5 $. These are then combined using Rubin’s Rules, which are designed to account for two sources of uncertainty:

Within-imputation variance: sampling uncertainty from the complex sample design
Between-imputation variance: uncertainty due to imputing missing data

For a scalar quantity $ Q $, the pooled estimate and total variance are calculated as:

$$ \bar{Q} = \frac{1}{M} \sum_{m=1}^{M} Q_m $$
$$ \bar{U} = \frac{1}{M} \sum_{m=1}^{M} U_m $$
$$ B = \frac{1}{M - 1} \sum_{m=1}^{M} (Q_m - \bar{Q})^2 $$
$$ T = \bar{U} + \left(1 + \frac{1}{M} \right) B $$
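A worked scalar example makes the four formulas concrete. The five estimates and variances below are made-up numbers chosen for easy arithmetic, not SCF results.

```r
# Illustrative (made-up) estimates from five implicates: a population share
# and its design-based sampling variance in each implicate.
Q <- c(0.26, 0.28, 0.27, 0.25, 0.29)   # Q_m
U <- rep(1e-4, 5)                       # U_m
M <- length(Q)

Qbar <- sum(Q) / M                      # pooled estimate
Ubar <- sum(U) / M                      # within-imputation variance
B    <- sum((Q - Qbar)^2) / (M - 1)     # between-imputation variance
Tvar <- Ubar + (1 + 1 / M) * B          # total variance

c(Qbar = Qbar, Ubar = Ubar, B = B, T = Tvar, SE = sqrt(Tvar))
# Factor by which using Ubar alone would understate the standard error:
sqrt(Tvar / Ubar)
```

Here Q̄ = 0.27, B = 0.00025, and T = 0.0004, so the pooled standard error is 0.02, twice what the within-imputation variance alone would suggest.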

Where:

$ M $ is the number of implicates (typically 5 for SCF)
$ Q_m $ is the estimate from implicate $ m $
$ U_m $ is the sampling variance of $ Q_m $, accounting for replicate weights and design

The total variance $ T $ reflects both within-imputation uncertainty (sampling error) and between-imputation uncertainty (missing-data imputation).

The standard error of the pooled estimate is $ \sqrt{T} $. Degrees of freedom follow the Barnard-Rubin method; under the default df.complete = Inf, it reduces to Rubin’s large-sample formula:

$$ \nu = (M - 1) \left(1 + \frac{\bar{U}}{(1 + \frac{1}{M}) B} \right)^2 $$
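The degrees-of-freedom calculation for a scalar can be sketched as follows. rubin_df() and barnard_rubin_df() are hypothetical helper names. The finite-df.complete branch follows the small-sample formula in Barnard and Rubin, combining Rubin’s df with an observed-data df; the package’s internal computation may differ in detail.

```r
# Sketch of the degrees-of-freedom calculations for a scalar parameter.
# rubin_df() and barnard_rubin_df() are hypothetical helper names.
rubin_df <- function(Ubar, B, M) {
  (M - 1) * (1 + Ubar / ((1 + 1 / M) * B))^2
}

barnard_rubin_df <- function(Ubar, B, M, df.complete) {
  nu_m <- rubin_df(Ubar, B, M)
  if (!is.finite(df.complete)) return(nu_m)   # reduces to Rubin's df
  Tvar   <- Ubar + (1 + 1 / M) * B
  gamma  <- (1 + 1 / M) * B / Tvar            # share of variance due to imputation
  nu_obs <- (df.complete + 1) / (df.complete + 3) * df.complete * (1 - gamma)
  1 / (1 / nu_m + 1 / nu_obs)                 # combine the two df estimates
}

# Illustrative inputs: Ubar = 1e-4, B = 2.5e-4, five implicates
rubin_df(1e-4, 2.5e-4, 5)                # about 7.11
barnard_rubin_df(1e-4, 2.5e-4, 5, Inf)   # same as above
barnard_rubin_df(1e-4, 2.5e-4, 5, 30)    # about 3.54
```

The combined value is always smaller than both inputs, so a finite df.complete can only shrink the degrees of freedom and widen the resulting intervals.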

The fraction of missing information (FMI) is also reported: it reflects the proportion of total variance attributable to imputation uncertainty.
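For reference, a commonly used estimator of the FMI is shown below. It uses the relative increase in variance $ r $ and the degrees of freedom $ \nu $. We assume, but have not verified against the source, that the missinfo component follows this form:

$$ r = \frac{\left(1 + \frac{1}{M}\right) B}{\bar{U}} $$
$$ \hat{\gamma} = \frac{r + \frac{2}{\nu + 3}}{r + 1} $$

As $ \nu \to \infty $, $ \hat{\gamma} $ approaches $ \frac{r}{r + 1} = \frac{\left(1 + \frac{1}{M}\right) B}{T} $, the between-imputation share of the total variance.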

See the source code of scf_MIcombine() for full implementation details.

References

Barnard J, Rubin DB. Small-sample degrees of freedom with multiple imputation. doi:10.1093/biomet/86.4.948.

Little RJA, Rubin DB. Statistical analysis with missing data. ISBN: 9780470526798.

U.S. Federal Reserve. Codebook for 2022 Survey of Consumer Finances. https://www.federalreserve.gov/econres/scfindex.htm

Examples


# Setup for this example only: copy the bundled mock data to a temp directory.
# In real analysis, obtain the data with `scf_download()` and `scf_load()`.
td  <- tempdir()
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)

# Example for real analysis: Pool simple survey mean for mock data
outlist <- lapply(scf2022$mi_design, function(d) survey::svymean(~I(age >= 65), d))
pooled  <- scf_MIcombine(outlist)     # vcov/coef extracted automatically
SE(pooled); coef(pooled)

unlink(file.path(td, "scf2022.rds"), force = TRUE)   # remove the temporary copy
