scf_percentile: Estimate Percentiles in SCF Microdata

Description

This function estimates a weighted percentile of a continuous variable in the Survey of Consumer Finances (SCF). It reproduces the procedure used in the Federal Reserve Board's published SCF Bulletin SAS macro for distributional statistics (Federal Reserve Board 2023c). That convention applies to SCF descriptive distributional statistics (quantiles, proportions) and differs from standard multiple-imputation pooling with Rubin's rules: the sampling variance is taken from the first implicate only rather than averaged across all implicates.

Usage

scf_percentile(scf, var, q = 0.5, by = NULL, verbose = FALSE)

Value

An object of class "scf_percentile" containing:

results: A data frame containing pooled percentile estimates, pooled standard errors, and implicate min/max values. One row per group if by is supplied; otherwise a single row.
imps: A list of implicate-level percentile estimates and standard errors.
aux: A list containing the variable name, optional group variable name, and the quantile requested.
verbose: Logical flag indicating whether implicate-level estimates should be printed by print() or summary().

Arguments

scf: An scf_mi_survey object created with scf_load(). Must contain a list of replicate-weighted designs, one per implicate, in scf$mi_design.
var: A one-sided formula naming the continuous variable to summarize (for example ~networth).
q: Numeric quantile between 0 and 1 (for example 0.75 for the 75th percentile). Default 0.5 (median).
by: Optional one-sided formula naming a categorical grouping variable. If supplied, the percentile is estimated separately within each group.
verbose: Logical. If TRUE, include implicate-level estimates in the returned object for inspection. Default FALSE.

Details

The estimates are computed as follows:

For each implicate, estimate the requested percentile using survey::svyquantile() with se = TRUE.
The reported point estimate is the mean of the M implicate-specific percentile estimates.
The standard error follows the SCF Bulletin SAS macro convention:
```
V_total = V1 + ((M + 1) / M) * B
```
where:
- M is the number of implicates (five in the SCF).
- V1 is the replicate-weight sampling variance of the percentile from the first implicate only.
- B is the between-implicate variance of the percentile estimates.
The reported standard error is sqrt(V_total).
If a grouping variable is supplied, the same logic is applied separately within each group.
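
The pooling arithmetic above can be sketched in base R. This is a minimal illustration, not the package's internal code: the helper name pool_scf_percentile() and the input numbers are hypothetical, and it assumes B is the usual sample variance (M - 1 denominator) of the implicate estimates.

```r
# Hypothetical helper illustrating the SCF Bulletin pooling convention.
# est:      numeric vector of implicate-level percentile estimates
# se_first: replicate-weight standard error from the first implicate only
pool_scf_percentile <- function(est, se_first) {
  M <- length(est)                   # number of implicates
  B <- stats::var(est)               # between-implicate variance (assumed M - 1 denominator)
  V1 <- se_first^2                   # sampling variance from implicate 1
  V_total <- V1 + ((M + 1) / M) * B  # formula from the Details section
  list(
    estimate = mean(est),            # pooled point estimate: mean across implicates
    se = sqrt(V_total),
    V_total = V_total,
    B = B
  )
}

# Made-up implicate estimates (not SCF output), five implicates
pooled <- pool_scf_percentile(est = c(120, 125, 118, 122, 121), se_first = 6)
pooled$estimate
pooled$se
```

Under Rubin's rules, the first term would instead be the mean of the M within-implicate sampling variances; here only the first implicate's variance enters.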

References

Federal Reserve Board. 2023c. "SAS Macro: Variable Definitions." https://www.federalreserve.gov/econres/files/bulletin.macro.txt

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()` for actual SCF data
td <- tempfile("percentile_")
dir.create(td)

src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)

# Estimate the 75th percentile of net worth
scf_percentile(scf2022, ~networth, q = 0.75)

# Estimate the median net worth by ownership group
scf_percentile(scf2022, ~networth, q = 0.5, by = ~own)

# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)
rm(scf2022)
