Learn R Programming

SCF: An R Package for Analyzing the Survey of Consumer Finances

Overview

The scf R package provides a structured, reproducible, and pedagogically-conscious toolkit for analyzing the U.S. Federal Reserve’s Survey of Consumer Finances (SCF), one of the highest-quality data sources for information on U.S. households’ balance sheets and income statements.

It wraps replicate-weighted, multiply-imputed SCF data into through a custom data object (scf_mi_survey) with which users can implement custom easy-to-use functions for generating proper population estimates for descriptive statistics, hypothesis testing, regression modeling, and high-quality visualizations.

Table of Contents

Features

Data Preparation

  • scf_download(): Downloads and preprocesses SCF microdata, including all five implicates and 999 replicate weights.
  • scf_load(): Loads .rds files into structured scf_mi_survey objects ready for analysis.
  • scf_update(): Adds or transforms variables across implicates.
  • scf_subset(): Subsets the data consistently across all implicates.

Descriptive Statistics

  • scf_freq(): Weighted frequency tables for categorical variables.
  • scf_xtab(): Cross-tabulations by row, column, or cell percentages.
  • scf_mean(), scf_median(), scf_percentile(): Computes groupwise or overall statistics using Rubin’s Rules or a commensurate methodology.
  • scf_corr(): Weighted Pearson correlations.

Statistical Inference

  • scf_ttest(): One-sample and two-sample t-tests for continuous variables.
  • scf_prop_test(): One-sample and two-sample proportion tests for binary variables.
  • scf_MIcombine(): Combines estimates across imputations using Rubin’s Rules (internal to most functions).

Regression Modeling

  • scf_ols(): Linear regression with pooled estimates and implicate diagnostics.
  • scf_glm(): Generalized linear models (e.g., logistic, Poisson).
  • scf_logit(): Wrapper for logistic regression with optional odds ratio output.

All model functions (scf_ols, scf_glm, scf_logit) return objects of class scf_model_result, with methods for coef(), vcov(), predict(), AIC(), residuals(), and summary().

Visualization

  • scf_plot_dist(): Kernel density plots for visualizing and comparing distributions by group.
  • scf_plot_dbar(): Bar plots of categorical variable distributions.
  • scf_plot_bbar(): Stacked bar plots for two categorical variables.
  • scf_plot_cbar(): Bar plots for continuous variable summaries by group.
  • scf_plot_smooth(): Smoothed line plots for continuous distributions.
  • scf_plot_hist(): Weighted histograms of continuous variables.
  • scf_plot_hex(): Weighted hexbin plots for bivariate continuous data.

Diagnostics and Output

  • print(), summary(): Custom methods for clean, interpretable output in analysis and teaching.

Installation

Install the latest version of the package through CRAN:

install.packages("scf")

The package requires R ≥ 3.6 and the following packages:

  • survey (for replicate-weighted designs)
  • ggplot2 (for plotting)
  • httr, haven (for downloading and reading SCF data)
  • mitools, stats, utils, methods, and others (loaded automatically)

Use install.packages() to install any missing dependencies manually if needed.

Getting Started

Download and Load Data

# Download SCF data for 2022:
scf_download(2022)

# Load the data into a survey design object:
scf2022 <- scf_load(2022)
# This document will use mock data for CRAN compliance
# use the above method to download and load data in your analysis instead of:
scf2022 <- readRDS(system.file("extdata", "mock_scf2022.rds", package = "scf"))
# NOTE: Mock data for demonstration only. 
# Use `scf_download()` and `scf_load()` for full SCF datasets.

Explore and Summarize

Univariate Distributions

# Frequency of education categories
scf_freq(scf2022, ~edcl)

# Median household net worth
scf_median(scf2022, ~networth)

# 90th percentile of income
scf_percentile(scf2022, ~income, q = 0.9)

# Histogram of net worth distribution
scf_plot_hist(scf2022, ~networth)

# Smoothed density plot of income
scf_plot_smooth(scf2022, ~income)

Bivariate Relationships

# Cross-tabulation of education and homeownership
scf_xtab(scf2022, ~edcl, ~own)

# Stacked bar chart: homeownership by education
scf_plot_bbar(scf2022, ~edcl, ~own)

# Weighted bar chart: mean net worth by education
scf_plot_cbar(scf2022, ~networth, ~edcl, stat = "mean")

# Grouped median income by race
scf_median(scf2022, ~income, by = ~racecl)

# Correlation between income and net worth
scf_corr(scf2022, ~income, ~networth)

# Hexbin plot: income vs. net worth
scf_plot_hex(scf2022, ~income, ~networth)

Statistical Testing

# One-sample proportion test: Is more than 10% of households rich?
scf_prop_test(scf2022, ~I(networth > 1e6), p = 0.10, alternative = "greater")

# Two-sample proportion test: Are women less likely to be rich?
scf_prop_test(scf2022, ~I(networth > 1e6), ~factor(hhsex, labels = c("Male", "Female")), alternative = "less")

# One-sample t-test: Is mean income different from $75,000?
scf_ttest(scf2022, ~income, mu = 75000)

# Two-sample t-test: Are older households wealthier?
scf_ttest(scf2022, ~networth, ~I(age > 50), alternative = "greater")

Regression Modeling

# Linear regression: Predict net worth from income and education
scf_ols(scf2022, networth ~ income + factor(edcl))

# Generalized linear model: Predict borrowing with logistic regression
scf_glm(scf2022, hborrff ~ income + age + factor(edcl), family = binomial())

# Logit wrapper: Predict probability of owning stocks
scf_logit(scf2022, ~I(owns_stocks == 1) ~ age + income + factor(edcl))

Plotting and Visualization


# Bar chart of a single categorical variable
scf_plot_dbar(scf2022, ~edcl)

# Stacked bar chart comparing education by race
scf_plot_bbar(scf2022, ~edcl, ~racecl, scale = "percent", percent_by = "row")

# Smoothed line plot of net worth distribution
scf_plot_smooth(scf2022, ~networth, xlim = c(0, 2e6), method = "loess")

# Histogram of income distribution
scf_plot_hist(scf2022, ~income, bins = 40, xlim = c(0, 300000))

# Bar chart of mean net worth by education level
scf_plot_cbar(scf2022, ~networth, ~edcl, stat = "mean")

# Hexbin plot: net worth vs. income
scf_plot_hex(scf2022, ~income, ~networth, bins = 60)

Wrangling and Transformation

# Create new variables across all implicates
scf2022 <- scf_update(scf2022,
  rich = networth > 1e6,
  senior = age >= 65,
  log_income = log(income + 1)
)

# Subset to working-age households with positive net worth
scf_sub <- scf_subset(scf2022, age >= 25 & age < 65 & networth > 0)

# Extract implicate-level estimates from a frequency table
freq <- scf_freq(scf_sub, ~own)
scf_implicates(freq, long = TRUE)

Documentation:

For detailed examples, function documentation, and usage guides, consult the package vignettes and reference manual.

Note on Mock Data

This package includes a small mock dataset (mock_scf2022.rds) for testing purposes.
It includes only 75 rows and select variables. It is structurally valid,
but not suitable for analytical use or inference. Mock data objects carry a "mock" = TRUE attribute and may trigger warnings in functions to discourage interpretive use.

Citation

If you use scf in published work, please cite it as:

Joseph N. Cohen (2025). scf: Tools for Analyzing the Survey of Consumer Finances. R package. ver. 1.0.5. https://github.com/jncohen/scf

Use citation("scf") in R for formatted references.

Author

Joseph N. Cohen
Department of Sociology & Program in Data Analytics
Queens College, City University of New York
joseph.cohen@qc.cuny.edu https://jncohen.commons.gc.cuny.edu

Copy Link

Version

Install

install.packages('scf')

Monthly Downloads

191

Version

1.0.5

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Joseph Cohen

Last Published

November 20th, 2025

Functions in scf (1.0.5)

scf_freq

Estimate the Frequencies of a Discrete Variable from SCF Microdata
scf_load

Load SCF Data as Multiply-Imputed Survey Designs
scf_median

Estimate the Population Median of a Continuous SCF Variable
scf_implicates

Extract Implicate-Level Estimates from SCF Results
scf_download

Download and Prepare SCF Microdata for Local Analysis
scf_logit

Estimate Logistic Regression Model using SCF Microdata
scf_glm

Estimate Generalized Linear Model from SCF Microdata
scf_percentile

Estimate Percentiles in SCF Microdata
scf_plot_bbar

Stacked Bar Chart of Two Discrete Variables in SCF Data
scf_ols

Estimate an Ordinary Least Squares Regression on SCF Microdata
scf_prop_test

Test a Proportion in SCF Data
scf_plot_hex

Hexbin Plot of Two Continuous SCF Variables
scf_plot_smooth

Smoothed Distribution Plot of a Continuous Variable in SCF Data
scf_plot_dist

Plot a Univariate Distribution of an SCF Variable
scf_plot_cbar

Bar Plot of Summary Statistics by Grouping Variable in SCF Data
scf_plot_hist

Histogram of a Continuous Variable in Multiply-Imputed SCF Data
scf_plot_dbar

Plot Bar Chart of a Discrete Variable from SCF Data
scf_subset

Subset an scf_mi_survey Object
scf_update

Create or Alter SCF Variables
scf_update_by_implicate

Modify Each Implicate Individually in SCF Data
scf_regtable

Format and Display Regression Results from Multiply-Imputed SCF Models
scf_theme

Default Plot Theme for SCF Visualizations
scf_xtab

Cross-Tabulate Two Discrete Variables in Multiply-Imputed SCF Data
scf_ttest

T-Test of Means using SCF Microdata
vcov.scf_model_result

Generic S3 Method: vcov.scf_model_result
predict.scf_model_result

Generic S3 Method: predict.scf_model_result
residuals.scf_model_result

Generic S3 Method: residuals.scf_model_result
formula.scf_model_result

Generic S3 Method: formula.scf_model_result
scf

Analyzing Survey of Consumer Finances Public-Use Microdata
coef.scf_model_result

Generic S3 Method: coef.scf_model_result
AIC.scf_model_result

Generic S3 Method: AIC.scf_model_result
scf_MIcombine

Combine Estimates Across SCF Implicates Using Rubin's Rules
SE

Extract Standard Errors from Pooled SCF Model Results
scf_activate_theme

Activate SCF Plot Theme
scf_corr

Estimate Correlation Between Two Continuous Variables in SCF Microdata
scf_imports

Internal Import Declarations
scf_design

Construct SCF Core Data Object
scf_mean

Estimate Mean in Multiply-Imputed SCF Data