SCF: An R Package for Analyzing the Survey of Consumer Finances

Overview

The scf R package provides a structured, reproducible, and pedagogically-aware toolkit for analyzing the U.S. Federal Reserve’s Survey of Consumer Finances (SCF), one of the highest-quality data sources for information on U.S. households’ balance sheets and income statements.

It wraps replicate-weighted, multiply-imputed SCF data into a consistent object class (scf_mi_survey) and offers end-to-end support for weighted descriptive statistics, hypothesis testing, regression modeling, and high-quality visualizations, while transparently incorporating Rubin’s Rules and the complex sample design.

Features

Data Preparation

  • scf_download(): Downloads and preprocesses SCF microdata, including all five implicates and 999 replicate weights.
  • scf_load(): Loads .rds files into structured scf_mi_survey objects ready for analysis.
  • scf_update(): Adds or transforms variables across implicates.
  • scf_subset(): Subsets the data consistently across all implicates.
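
To illustrate why variables are added and subsets taken across all implicates at once, here is a minimal base-R sketch. It assumes the implicates are held as a plain list of data frames; the toy values and object names are hypothetical, and this is not the package's internal representation.

```r
# Toy stand-in for five implicates: a list of data frames with identical
# columns (hypothetical values; imputed cells differ slightly by implicate).
set.seed(1)
implicates <- lapply(1:5, function(i) {
  data.frame(
    age      = c(30, 45, 52, 67, 71),
    networth = c(5e4, 2e5, 1.5e6, 8e5, 3e6) + rnorm(5, sd = 1e3)
  )
})

# Add derived variables to every implicate using the same expressions.
implicates <- lapply(implicates, function(d) {
  d$rich   <- d$networth > 1e6
  d$senior <- d$age >= 65
  d
})

# Apply one subsetting rule to every implicate.
working_age <- lapply(implicates, function(d) d[d$age >= 25 & d$age < 65, ])
```

Applying identical code to each implicate keeps the five data sets structurally aligned, which later pooling depends on. Note that dropping rows from raw data frames is generally not equivalent to subsetting a survey design object when variances are estimated, so scf_subset() is the safer route for real analyses.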

Descriptive Statistics

  • scf_freq(): Weighted frequency tables for categorical variables.
  • scf_xtab(): Cross-tabulations by row, column, or cell percentages.
  • scf_mean(), scf_median(), scf_percentile(): Computes groupwise or overall statistics using Rubin’s Rules.
  • scf_corr(): Weighted Pearson correlations.
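
The idea behind the pooled point estimates can be sketched in base R: compute the weighted statistic within each implicate, then average across implicates. The helper weighted_quantile() and the toy data below are hypothetical, the quantile rule shown (inverse of the weighted empirical CDF) is only one of several definitions, and the package functions may compute these quantities differently.

```r
# Weighted quantile: smallest value whose cumulative weight share reaches q.
weighted_quantile <- function(x, w, q) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)
  x[which(cum_share >= q)[1]]
}

# Three toy implicates (the SCF has five); values are made up.
implicates <- list(
  data.frame(income = c(20, 50, 90, 400), wgt = c(3, 4, 2, 1)),
  data.frame(income = c(22, 50, 95, 380), wgt = c(3, 4, 2, 1)),
  data.frame(income = c(18, 56, 90, 410), wgt = c(3, 4, 2, 1))
)

# Statistic within each implicate.
means <- vapply(implicates,
                function(d) weighted.mean(d$income, d$wgt), numeric(1))
medians <- vapply(implicates,
                  function(d) weighted_quantile(d$income, d$wgt, 0.5), numeric(1))

# Pooled point estimate: the average of the implicate-level estimates.
pooled_mean   <- mean(means)
pooled_median <- mean(medians)
```

This covers point estimates only; standard errors additionally require the replicate weights and the between-implicate variance.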

Statistical Inference

  • scf_ttest(): One-sample and two-sample t-tests for continuous variables.
  • scf_prop_test(): One-sample and two-sample proportion tests for binary variables.
  • scf_MIcombine(): Combines estimates across imputations using Rubin’s Rules; most functions call it internally.
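
For readers new to Rubin's Rules, the following base-R sketch shows the textbook form of the combination step. The function name pool_rubin() and the input values are hypothetical, and scf_MIcombine() may differ in its details (for example, in how the within-implicate variance is obtained from the replicate weights).

```r
# Combine m implicate-level estimates and their sampling variances.
pool_rubin <- function(estimates, variances) {
  m     <- length(estimates)
  q_bar <- mean(estimates)            # pooled point estimate
  u_bar <- mean(variances)            # average within-implicate variance
  b     <- var(estimates)             # between-implicate variance
  total <- u_bar + (1 + 1 / m) * b    # total variance
  # Rubin's degrees of freedom (infinite when b is zero)
  df    <- (m - 1) * (1 + u_bar / ((1 + 1 / m) * b))^2
  list(estimate = q_bar, se = sqrt(total), df = df)
}

# Hypothetical estimates from five implicates, each with sampling variance 4.
pooled <- pool_rubin(estimates = c(10, 12, 11, 13, 9),
                     variances = rep(4, 5))
```

The total variance inflates the average sampling variance by the disagreement between implicates, so ignoring imputation uncertainty would understate the standard error.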

Regression Modeling

  • scf_ols(): Linear regression with pooled estimates and implicate diagnostics.
  • scf_glm(): Generalized linear models (e.g., logistic, Poisson).
  • scf_logit(): Wrapper for logistic regression with optional odds ratio output.
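
As a rough illustration of pooled regression estimates and odds-ratio output, the sketch below fits the same logistic model to each of five simulated implicates with base R's glm(), averages the coefficients, and exponentiates them. The data are simulated and the variable names are hypothetical. The sketch deliberately ignores survey weights and replicate-weight variances, which the package functions are described as handling.

```r
set.seed(42)
n   <- 200
age <- round(runif(n, 20, 80))

# Five simulated implicates sharing age but differing in the outcome draw.
implicates <- lapply(1:5, function(i) {
  p <- plogis(-3 + 0.04 * age)
  data.frame(age = age, owns = rbinom(n, 1, p))
})

# Fit the same model within each implicate.
fits <- lapply(implicates, function(d) {
  glm(owns ~ age, family = binomial(), data = d)
})

# Coefficients as a 2 x 5 matrix (rows: intercept and age slope).
coefs <- vapply(fits, coef, numeric(2))

# Pooled log-odds coefficients and the corresponding odds ratios.
pooled_log_odds <- rowMeans(coefs)
odds_ratios     <- exp(pooled_log_odds)
```

Coefficients are averaged on the log-odds scale and exponentiated afterwards; averaging the odds ratios directly would give a different result.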

Visualization

  • scf_plot_dist(): Kernel density plots for visualizing and comparing distributions by group.
  • scf_plot_dbar(): Bar plots of categorical variable distributions.
  • scf_plot_bbar(): Stacked bar plots for two categorical variables.
  • scf_plot_cbar(): Bar plots for continuous variable summaries by group.
  • scf_plot_smooth(): Smoothed line plots for continuous distributions.
  • scf_plot_hist(): Weighted histograms of continuous variables.
  • scf_plot_hex(): Weighted hexbin plots for bivariate continuous data.

Diagnostics and Output

  • scf_implicates(): Extracts implicate-level results from SCF objects.
  • print(), summary(): Custom methods for clean, interpretable output in analysis and teaching.

Installation

The released version of scf can be installed from CRAN with install.packages("scf"). To install the development version from GitHub:

# Install devtools if you don't already have it
install.packages("devtools")

# Install the SCF package from GitHub
devtools::install_github("jncohen/scf")

The package requires R ≥ 3.6 and the following packages:

  • survey (for replicate-weighted designs)
  • ggplot2 (for plotting)
  • httr, haven (for downloading and reading SCF data)
  • mitools, stats, utils, methods, and others (loaded automatically)

Use install.packages() to install any missing dependencies manually if needed.
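
One way to check which of the dependencies named above are missing is sketched below in base R. The object names are illustrative, and the install call is left commented out so that nothing is installed without your say-so.

```r
# Dependencies named in the list above.
needed <- c("survey", "ggplot2", "httr", "haven", "mitools")

# TRUE for each package whose namespace can be loaded.
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them
}
```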

Getting Started

Download and Load Data

# Download SCF data for 2022:
scf_download(2022)

# Load the data into a survey design object:
scf2022 <- scf_load(2022)

# Alternatively, load the bundled mock data (used here for CRAN compliance).
# NOTE: The mock data are for demonstration only.
# Use `scf_download()` and `scf_load()` for full SCF datasets.
scf2022 <- readRDS(system.file("extdata", "mock_scf2022.rds", package = "scf"))

Explore and Summarize

Univariate Distributions

# Frequency of education categories
scf_freq(scf2022, ~edcl)

# Median household net worth
scf_median(scf2022, ~networth)

# 90th percentile of income
scf_percentile(scf2022, ~income, q = 0.9)

# Histogram of net worth distribution
scf_plot_hist(scf2022, ~networth)

# Smoothed density plot of income
scf_plot_smooth(scf2022, ~income)

Bivariate Relationships

# Cross-tabulation of education and homeownership
scf_xtab(scf2022, ~edcl, ~own)

# Stacked bar chart: homeownership by education
scf_plot_bbar(scf2022, ~edcl, ~own)

# Weighted bar chart: mean net worth by education
scf_plot_cbar(scf2022, ~networth, ~edcl, stat = "mean")

# Grouped median income by race
scf_median(scf2022, ~income, by = ~racecl)

# Correlation between income and net worth
scf_corr(scf2022, ~income, ~networth)

# Hexbin plot: income vs. net worth
scf_plot_hex(scf2022, ~income, ~networth)

Statistical Testing

# One-sample proportion test: Is more than 10% of households rich?
scf_prop_test(scf2022, ~I(networth > 1e6), p = 0.10, alternative = "greater")

# Two-sample proportion test: Are women less likely to be rich?
scf_prop_test(scf2022, ~I(networth > 1e6), ~factor(hhsex, labels = c("Male", "Female")), alternative = "less")

# One-sample t-test: Is mean income different from $75,000?
scf_ttest(scf2022, ~income, mu = 75000)

# Two-sample t-test: Are older households wealthier?
scf_ttest(scf2022, ~networth, ~I(age > 50), alternative = "greater")

Regression Modeling

# Linear regression: Predict net worth from income and education
scf_ols(scf2022, networth ~ income + factor(edcl))

# Generalized linear model: Predict borrowing with logistic regression
scf_glm(scf2022, hborrff ~ income + age + factor(edcl), family = binomial())

# Logit wrapper: Predict probability of owning stocks
scf_logit(scf2022, I(owns_stocks == 1) ~ age + income + factor(edcl))

Plotting and Visualization


# Bar chart of a single categorical variable
scf_plot_dbar(scf2022, ~edcl)

# Stacked bar chart comparing education by race
scf_plot_bbar(scf2022, ~edcl, ~racecl, scale = "percent", percent_by = "row")

# Smoothed line plot of net worth distribution
scf_plot_smooth(scf2022, ~networth, xlim = c(0, 2e6), method = "loess")

# Histogram of income distribution
scf_plot_hist(scf2022, ~income, bins = 40, xlim = c(0, 300000))

# Bar chart of mean net worth by education level
scf_plot_cbar(scf2022, ~networth, ~edcl, stat = "mean")

# Hexbin plot: net worth vs. income
scf_plot_hex(scf2022, ~income, ~networth, bins = 60)

Wrangling and Transformation

# Create new variables across all implicates
scf2022 <- scf_update(scf2022,
  rich = networth > 1e6,
  senior = age >= 65,
  log_income = log(income + 1)
)

# Subset to working-age households with positive net worth
scf_sub <- scf_subset(scf2022, age >= 25 & age < 65 & networth > 0)

# Extract implicate-level estimates from a frequency table
freq <- scf_freq(scf_sub, ~own)
scf_implicates(freq, long = TRUE)

Documentation

For detailed examples, function documentation, and usage guides, consult the package vignettes and reference manual.

Note on Mock Data

This package includes a small mock dataset (mock_scf2022.rds) for testing purposes.
It includes only 75 rows and select variables. It is structurally valid,
but not suitable for analytical use or inference.

Citation

If you use scf in published work, please cite it as:

Joseph N. Cohen (2025). scf: Tools for Analyzing the Survey of Consumer Finances. R package. ver. 1.0.3. https://github.com/jncohen/scf

Use citation("scf") in R for formatted references.

Author

Joseph N. Cohen
Department of Sociology & Program in Data Analytics
Queens College, City University of New York
joseph.cohen@qc.cuny.edu https://jncohen.commons.gc.cuny.edu

Install

install.packages('scf')

Monthly Downloads

447

Version

1.0.4

License

MIT + file LICENSE

Maintainer

Joseph Cohen

Last Published

October 22nd, 2025

Functions in scf (1.0.4)

  • scf_mean: Estimate Mean in Multiply-Imputed SCF Data
  • scf_percentile: Estimate Percentile in a Continuous Variable in SCF Microdata
  • scf_plot_cbar: Bar Plot of Summary Statistics by Grouping Variable in SCF Data
  • scf_ols: Estimate an Ordinary Least Squares Regression on SCF Microdata
  • scf_median: Estimate the Population Median of a Continuous SCF Variable
  • scf_plot_dbar: Plot Bar Chart of a Discrete Variable from SCF Data
  • scf_prop_test: Test a Proportion in SCF Data
  • scf_plot_hist: Histogram of a Continuous Variable in Multiply-Imputed SCF Data
  • scf_plot_hex: Hexbin Plot of Two Continuous SCF Variables
  • scf_regtable: Format and Display Regression Results from Multiply-Imputed SCF Models
  • scf_plot_dist: Plot a Univariate Distribution of an SCF Variable
  • scf_ttest: T-Test of Means using SCF Microdata
  • scf_plot_smooth: Smoothed Distribution Plot of a Continuous Variable in SCF Data
  • scf_update: Create or Alter SCF Variables
  • scf_subset: Subset an scf_mi_survey Object
  • scf_theme: Default Plot Theme for SCF Visualizations
  • scf_xtab: Cross-Tabulate Two Discrete Variables in Multiply-Imputed SCF Data
  • scf_update_by_implicate: Modify Each Implicate Individually in SCF Data
  • SE: Extract Standard Errors from MIresult Object
  • scf_glm: Estimate Generalized Linear Model from SCF Microdata
  • scf_corr: Estimate Correlation Between Two Continuous Variables in SCF Microdata
  • scf: Analyzing Survey of Consumer Finances Public-Use Microdata
  • scf_activate_theme: Activate SCF Plot Theme
  • scf_MIcombine: Combine Estimates Across SCF Implicates Using Rubin's Rules
  • scf_implicates: Extract Implicate-Level Estimates from SCF Results
  • scf_download: Download and Prepare SCF Microdata for Local Analysis
  • scf_freq: Tabulate a Discrete Variable from SCF Microdata
  • scf_design: Construct SCF Core Data Object
  • scf_logit: Estimate Logistic Regression Model using SCF Microdata
  • scf_imports: Internal Import Declarations
  • scf_load: Load SCF Data as Multiply-Imputed Survey Designs
  • scf_plot_bbar: Stacked Bar Chart of Two Discrete Variables in SCF Data