scf (version 1.0.5)

scf_MIcombine: Combine Estimates Across SCF Implicates Using Rubin's Rules

Description

This function implements Rubin’s Rules for combining multiply-imputed survey model results in the scf package. It pools point estimates, variance-covariance matrices, and degrees of freedom across the SCF’s five implicates.

Usage

scf_MIcombine(results, variances, call = sys.call(), df.complete = Inf)

Value

An object of class "scf_MIresult" with components:

coefficients

Pooled point estimates across implicates.

variance

Pooled variance-covariance matrix.

df

Degrees of freedom for each parameter, adjusted using the Barnard-Rubin formula.

missinfo

Estimated fraction of missing information for each parameter.

nimp

Number of implicates used in pooling.

call

Function call recorded for reproducibility.

Supports coef(), SE(), confint(), and summary() methods.

Arguments

results

A list of implicate-level model outputs. Each element must be a named numeric vector or an object with methods for coef() and vcov(). Typically generated internally by modeling functions.

variances

Optional list of variance-covariance matrices. If omitted, extracted using vcov().

call

Optional. The originating function call. Defaults to sys.call().

df.complete

Optional degrees of freedom for the complete-data model. Used for small-sample corrections. Defaults to Inf, assuming large-sample asymptotics.

Scope

scf_MIcombine() is used for model-based analyses such as scf_ols(), scf_glm(), and scf_logit(), where each implicate’s model output includes both parameter estimates and replicate-weighted sampling variances.

Descriptive estimators, including scf_mean(), scf_percentile(), and scf_median(), do not apply Rubin’s Rules. Instead, they follow the Survey of Consumer Finances convention used in the Federal Reserve Board’s SAS macro, combining (i) the replicate-weight sampling variance from implicate 1 with (ii) the between-implicate variance scaled by (m + 1)/m.

This separation is intentional: descriptive statistics in scf aim to reproduce the Survey of Consumer Finances' published standard errors, whereas model-based functions use Rubin's Rules.
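The descriptive convention described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's internal code: the numbers are made up, and the names est, samp_var_1, and tot_var are hypothetical.

```r
# Hypothetical inputs (made-up numbers, not SCF results)
est        <- c(52.1, 51.8, 52.4, 52.0, 51.9)  # estimate from each of m implicates
samp_var_1 <- 0.36                             # replicate-weight sampling variance, implicate 1 only

m       <- length(est)
between <- var(est)                            # between-implicate variance (divisor m - 1)

# Convention described above: implicate-1 sampling variance
# plus the between-implicate variance scaled by (m + 1) / m
tot_var <- samp_var_1 + (m + 1) / m * between
se      <- sqrt(tot_var)
```

Unlike Rubin's Rules, the sampling-variance term here comes from a single implicate rather than the average over all implicates.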

Implementation

scf_MIcombine() pools a set of implicate-level estimates and their associated variance-covariance matrices using Rubin’s Rules.

This includes:

  • Calculation of pooled point estimates

  • Total variance from within- and between-imputation components

  • Degrees of freedom via the Barnard-Rubin method

  • Fraction of missing information

Inputs are typically produced by modeling functions such as scf_ols(), scf_glm(), or scf_logit(), which return implicate-level coefficient vectors and variance-covariance matrices.

This function is primarily used internally, but may be called directly by advanced users constructing custom estimation routines from implicate-level results.
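For readers building a custom routine, the pooling of coefficient vectors and variance-covariance matrices can be sketched in base R as follows. This is an illustration only: pool_rubin() is a hypothetical helper, not part of scf, the inputs are made up, and the returned component names merely mirror those listed under Value. Degrees of freedom and missing information are left out for brevity.

```r
# Hypothetical implicate-level results: 3 coefficient vectors and vcov matrices
coefs <- list(
  c("(Intercept)" = 1.00, age = 0.50),
  c("(Intercept)" = 1.10, age = 0.48),
  c("(Intercept)" = 0.90, age = 0.52)
)
vcovs <- replicate(3, diag(c(0.04, 0.01)), simplify = FALSE)

# Hypothetical helper: pool estimates and variances with Rubin's Rules
pool_rubin <- function(coefs, vcovs) {
  M     <- length(coefs)
  Qmat  <- do.call(rbind, coefs)      # M x p matrix of implicate estimates
  Q_bar <- colMeans(Qmat)             # pooled point estimates
  U_bar <- Reduce(`+`, vcovs) / M     # average within-implicate vcov
  B     <- stats::cov(Qmat)           # between-implicate covariance (divisor M - 1)
  T_tot <- U_bar + (1 + 1 / M) * B    # total variance-covariance matrix
  dimnames(T_tot) <- list(names(Q_bar), names(Q_bar))
  list(coefficients = Q_bar, variance = T_tot, nimp = M)
}

pooled_sketch <- pool_rubin(coefs, vcovs)
sqrt(diag(pooled_sketch$variance))    # pooled standard errors
```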

Details

The SCF provides five implicates per survey wave, each a plausible version of the population under a specific missing-data model. Analysts run the same statistical procedure on each implicate, producing five estimates \( Q_1, Q_2, \ldots, Q_5 \). Rubin’s Rules then combine these estimates while accounting for two sources of uncertainty:

  • Within-imputation variance: Uncertainty from complex sample design

  • Between-imputation variance: Uncertainty due to missing data

For a scalar quantity \( Q \), the pooled estimate and total variance are calculated as:

$$ \bar{Q} = \frac{1}{M} \sum_{m=1}^{M} Q_m $$

$$ \bar{U} = \frac{1}{M} \sum_{m=1}^{M} U_m $$

$$ B = \frac{1}{M - 1} \sum_{m=1}^{M} (Q_m - \bar{Q})^2 $$

$$ T = \bar{U} + \left(1 + \frac{1}{M} \right) B $$

Where:

  • \( M \) is the number of implicates (typically 5 for SCF)

  • \( Q_m \) is the estimate from implicate \( m \)

  • \( U_m \) is the sampling variance of \( Q_m \), accounting for replicate weights and design

The total variance \( T \) reflects both within-imputation uncertainty (sampling error) and between-imputation uncertainty (missing-data imputation).
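As a worked illustration of the scalar formulas, the following base R sketch computes each component by hand. The estimates and variances are made-up numbers, not SCF results, and the variable names are chosen for this example only.

```r
# Hypothetical implicate-level results for one scalar quantity (made-up numbers)
Q <- c(0.21, 0.23, 0.22, 0.24, 0.20)            # point estimates Q_m
U <- c(0.0004, 0.0005, 0.0004, 0.0006, 0.0005)  # sampling variances U_m

M     <- length(Q)
Q_bar <- mean(Q)                   # pooled point estimate
U_bar <- mean(U)                   # within-imputation variance
B     <- var(Q)                    # between-imputation variance (divisor M - 1)
T_tot <- U_bar + (1 + 1 / M) * B   # total variance
se    <- sqrt(T_tot)               # pooled standard error

c(estimate = Q_bar, within = U_bar, between = B, total = T_tot, se = se)
```

Note that var() already divides by M - 1, which matches the definition of \( B \).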

The standard error of the pooled estimate is \( \sqrt{T} \). Degrees of freedom are adjusted using the Barnard-Rubin method. With df.complete = Inf (the default), the adjustment reduces to Rubin’s large-sample formula:

$$ \nu = (M - 1) \left(1 + \frac{\bar{U}}{(1 + \frac{1}{M}) B} \right)^2 $$

The fraction of missing information (FMI) is also reported: it reflects the proportion of total variance attributable to imputation uncertainty.
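The degrees-of-freedom and missing-information calculations can be sketched in base R as below. This is a hedged illustration, not the package's internal code: barnard_rubin_df() and its argument names are hypothetical, the inputs are made up, and lambda is the simple variance-ratio form of the missing-information share (the value reported in missinfo may use a degrees-of-freedom-adjusted variant).

```r
# Hypothetical pooled components (made-up numbers, M = 5 implicates)
M     <- 5
U_bar <- 0.00048                     # within-imputation variance
B     <- 0.00025                     # between-imputation variance
T_tot <- U_bar + (1 + 1 / M) * B     # total variance

# Share of total variance attributable to imputation (simple ratio form)
lambda <- (1 + 1 / M) * B / T_tot

# Large-sample degrees of freedom, matching the formula above
nu_large <- (M - 1) * (1 + U_bar / ((1 + 1 / M) * B))^2

# Hypothetical helper: Barnard-Rubin adjustment for a finite complete-data df
barnard_rubin_df <- function(nu_large, lambda, df_complete = Inf) {
  if (is.infinite(df_complete)) return(nu_large)
  nu_obs <- (df_complete + 1) / (df_complete + 3) * df_complete * (1 - lambda)
  1 / (1 / nu_large + 1 / nu_obs)
}

nu_inf   <- barnard_rubin_df(nu_large, lambda)                     # large-sample case
nu_small <- barnard_rubin_df(nu_large, lambda, df_complete = 100)  # small-sample case
```

With a finite complete-data df, the adjusted value is always smaller than the large-sample value, which widens confidence intervals.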

References

Barnard J, Rubin DB. Small-sample degrees of freedom with multiple imputation. doi:10.1093/biomet/86.4.948.

Little RJA, Rubin DB. Statistical analysis with missing data. ISBN: 9780470526798.

U.S. Federal Reserve. Codebook for 2022 Survey of Consumer Finances. https://www.federalreserve.gov/econres/scfindex.htm

Examples

# Setup for this example only (copies the bundled mock data to a temp directory).
# In real analysis, use `scf_download()` and `scf_load()` instead.
td <- tempfile("MIcombine_")
dir.create(td)

src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)

# Example for real analysis: Pool simple survey mean for mock data
outlist <- lapply(scf2022$mi_design, function(d) survey::svymean(~I(age >= 65), d))
pooled  <- scf_MIcombine(outlist)     # vcov/coef extracted automatically
SE(pooled); coef(pooled)

unlink(td, recursive = TRUE, force = TRUE)