surveycore

surveycore is the foundation of the surveyverse ecosystem — a modern, tidyverse-compatible replacement for the survey and srvyr packages in R.

It provides S7-based survey design objects with:

  • A tidy-select interface (ids = c(psu, ssu), no formula syntax)
  • Automatic preservation of haven-style variable labels and value labels
  • Exact variance estimation (Taylor linearization, replicate weights, two-phase designs)
  • Seamless conversion to and from survey::svydesign and srvyr::tbl_svy

For a side-by-side comparison with survey and srvyr, see vignette("surveycore-vs-survey").

Installation

# From CRAN:
install.packages("surveycore")

# Development version from GitHub:
# install.packages("pak")
pak::pak("JDenn0514/surveycore")

What surveycore provides

  • S7 survey objects: survey_taylor, survey_replicate, survey_twophase, survey_nonprob
  • Constructors: as_survey(), as_survey_replicate(), as_survey_twophase(), as_survey_nonprob()
  • Metadata system: set_var_label(), set_val_labels(), extract_var_label(), extract_val_labels() — with automatic haven attribute import
  • Analysis functions: get_freqs(), get_means(), get_totals(), get_corr(), get_quantiles(), get_ratios(), get_diffs()
  • Regression: survey_glm() for survey-weighted GLMs with clean() for tidy coefficient tables
  • Design utilities: update_design(), as_svydesign(), from_svydesign(), as_tbl_svy(), from_tbl_svy()

Who is this for?

surveycore is intended for:

  • Survey researchers and methodologists who analyse complex probability samples and need design-consistent variance estimates (stratified, clustered, replicate-weight, and two-phase designs).
  • Social scientists, epidemiologists, and public health researchers working with population surveys such as NHANES, ACS, GSS, or custom organizational surveys.
  • R users who want a tidyverse-compatible interface for the survey analysis workflows currently served by survey and srvyr.

The software is designed to analyse rectangular survey microdata: one row per respondent, numeric or categorical outcome variables, and either explicit survey weights or a design specification (ids, strata, FPC). It supports:

  • Data frames, tibbles, and data.table objects as input.
  • Variables with haven-style variable labels and value labels (e.g. from .xpt or .sav files read with haven).
  • Grouped analyses (via surveytidy::group_by()).

Each analysis function accepts specific types of outcome variables:

Basic usage

library(surveycore)

# ── Stratified cluster design ─────────────────────────────────────────────────
set.seed(42)
df <- data.frame(
  psu = rep(1:10, each = 10),
  strata = rep(c("A", "B"), each = 50),
  weight = runif(100, 0.5, 2),
  income = rnorm(100, 50000, 10000),
  age = sample(18:80, 100, replace = TRUE)
)

d <- as_survey(df, ids = psu, weights = weight, strata = strata, nest = TRUE)
d
#> 
#> ── Survey Design ───────────────────────────────────────────────────────────────
#> <survey_taylor> (Taylor series linearization)
#> Sample size: 100
#> 
#> # A tibble: 100 × 5
#>      psu strata weight income   age
#>    <int> <chr>   <dbl>  <dbl> <int>
#>  1     1 A       1.87  53219.    42
#>  2     1 A       1.91  42162.    33
#>  3     1 A       0.929 65757.    71
#>  4     1 A       1.75  56429.    41
#>  5     1 A       1.46  50898.    50
#>  6     1 A       1.28  52766.    78
#>  7     1 A       1.60  56793.    55
#>  8     1 A       0.702 50898.    60
#>  9     1 A       1.49  20069.    58
#> 10     1 A       1.56  52849.    39
#> # ℹ 90 more rows

# ── Weighted mean and total ────────────────────────────────────────────────────
get_means(d, income)
#> # A tibble: 1 × 4
#>     mean ci_low ci_high     n
#>    <dbl>  <dbl>   <dbl> <int>
#> 1 50206. 47921.  52490.   100
get_totals(d, income)
#> # A tibble: 1 × 4
#>      total   ci_low  ci_high     n
#>      <dbl>    <dbl>    <dbl> <int>
#> 1 6460063. 5906356. 7013770.   100
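
To make the estimates above less of a black box, here is a minimal base R sketch of what a weighted mean, a weighted total, and a Taylor-linearized standard error involve for a stratified cluster design. It assumes the standard with-replacement linearization estimator; whether surveycore applies exactly this form (for example, its degrees-of-freedom or FPC handling) is an assumption, and the names w, y, z, psu_df, and se_mean are illustrative only.

```r
# Rebuild the example data so this sketch runs on its own.
set.seed(42)
df <- data.frame(
  psu = rep(1:10, each = 10),
  strata = rep(c("A", "B"), each = 50),
  weight = runif(100, 0.5, 2),
  income = rnorm(100, 50000, 10000),
  age = sample(18:80, 100, replace = TRUE)
)

w <- df$weight
y <- df$income

# Point estimates: weighted total and weighted (ratio) mean.
total_hat <- sum(w * y)
mean_hat <- total_hat / sum(w)

# Linearized variable for the ratio mean: each row's weighted
# deviation from the estimated mean, scaled by the sum of weights.
z <- w * (y - mean_hat) / sum(w)

# Sum the linearized values within each PSU (one row per stratum/PSU pair).
psu_df <- unique(df[, c("strata", "psu")])
psu_df$z_tot <- vapply(
  seq_len(nrow(psu_df)),
  function(i) sum(z[df$strata == psu_df$strata[i] & df$psu == psu_df$psu[i]]),
  numeric(1)
)

# With-replacement variance: within each stratum, n_h / (n_h - 1) times the
# sum of squared deviations of PSU totals from their stratum mean.
var_by_stratum <- tapply(psu_df$z_tot, psu_df$strata, function(t) {
  n_h <- length(t)
  n_h / (n_h - 1) * sum((t - mean(t))^2)
})
se_mean <- sqrt(sum(var_by_stratum))

c(mean = mean_hat, total = total_hat, se_mean = se_mean)
```

The point estimates depend only on the weights; the PSU and strata columns matter for the standard error, which is why the design specification has to travel with the data.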

Complex survey designs

# ── Replicate weights (BRR) ───────────────────────────────────────────────────
df_rep <- data.frame(
  y = rnorm(20),
  wt = runif(20, 1, 3),
  rep1 = runif(20, 0.5, 2),
  rep2 = runif(20, 0.5, 2),
  rep3 = runif(20, 0.5, 2),
  rep4 = runif(20, 0.5, 2)
)

d_rep <- as_survey_replicate(
  df_rep,
  weights = wt,
  repweights = starts_with("rep"),
  type = "BRR"
)
d_rep
#> 
#> ── Survey Design ───────────────────────────────────────────────────────────────
#> <survey_replicate> (BRR, 4 replicates)
#> Sample size: 20
#> 
#> # A tibble: 20 × 6
#>         y    wt  rep1  rep2  rep3  rep4
#>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 -2.00   2.30 1.09  0.849 0.705 1.71 
#>  2  0.334  2.84 0.619 1.37  0.766 1.90 
#>  3  1.17   1.73 1.74  1.76  1.28  1.75 
#>  4  2.06   2.71 0.609 0.698 1.72  0.691
#>  5 -1.38   1.60 0.672 1.84  0.673 1.47 
#>  6 -1.15   1.93 1.46  1.18  1.84  1.54 
#>  7 -0.706  1.29 0.981 1.84  1.36  0.548
#>  8 -1.05   2.62 0.783 0.873 0.720 1.88 
#>  9 -0.646  2.33 1.09  0.626 1.85  1.22 
#> 10 -0.185  1.12 1.79  0.573 0.880 0.900
#> # ℹ 10 more rows
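
As a rough illustration of how replicate weights produce a variance estimate, the base R sketch below re-estimates a weighted mean once per replicate column and averages the squared deviations from the full-sample estimate, which is the textbook BRR form. This is an assumption about the general technique, not a description of surveycore internals; some implementations centre on the mean of the replicate estimates instead. The toy data use their own seed, so the values differ from the printout above.

```r
# Toy data in the same shape as df_rep above (random weights, illustration only).
set.seed(1)
df_rep <- data.frame(
  y = rnorm(20),
  wt = runif(20, 1, 3),
  rep1 = runif(20, 0.5, 2),
  rep2 = runif(20, 0.5, 2),
  rep3 = runif(20, 0.5, 2),
  rep4 = runif(20, 0.5, 2)
)

rep_cols <- grep("^rep", names(df_rep), value = TRUE)

# Full-sample estimate using the main weight.
theta_hat <- weighted.mean(df_rep$y, df_rep$wt)

# One estimate per replicate weight column.
theta_rep <- vapply(
  rep_cols,
  function(col) weighted.mean(df_rep$y, df_rep[[col]]),
  numeric(1)
)

# BRR variance: average squared deviation of the replicate estimates
# from the full-sample estimate (scale factor 1 / R).
var_brr <- mean((theta_rep - theta_hat)^2)
se_brr <- sqrt(var_brr)

c(estimate = theta_hat, se = se_brr)
```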

Variable labels

surveycore preserves haven-style labels automatically when the data come from .xpt or .sav files read with haven. You can also set labels manually:

d2 <- set_var_label(d, income = "Annual household income (USD)")
d2 <- set_var_label(d2, age = "Respondent age in years")

extract_var_label(d2, income)
#>                          income 
#> "Annual household income (USD)"
extract_var_label(d2, age)
#>                       age 
#> "Respondent age in years"
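
For context, haven stores a variable label in a column's "label" attribute and value labels in its "labels" attribute. The base R sketch below sets and reads those attributes directly on a toy data frame; the column names and values are made up for illustration. haven additionally gives labelled columns the haven_labelled class, which this sketch does not do.

```r
# Toy data frame; values are illustrative only.
df_lab <- data.frame(
  income = c(52000, 48000, 61000),
  sex = c(1, 2, 2)
)

# Variable labels live in the "label" attribute of each column.
attr(df_lab$income, "label") <- "Annual household income (USD)"
attr(df_lab$sex, "label") <- "Respondent sex"

# Value labels live in the "labels" attribute: a named vector of codes.
attr(df_lab$sex, "labels") <- c(Male = 1, Female = 2)

# Collect every variable label into a named character vector,
# using NA for columns without one.
var_labels <- vapply(
  df_lab,
  function(col) {
    lab <- attr(col, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else lab
  },
  character(1)
)
var_labels
```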

Conversion to/from survey and srvyr

# To survey::svydesign
svy <- as_svydesign(d)
class(svy)
#> [1] "survey.design2" "survey.design"

# Back to surveycore
d_rt <- from_svydesign(svy)
d_rt
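
For comparison, the same design can be declared directly with survey's formula interface. The sketch below assumes the survey package is installed (it skips itself otherwise) and rebuilds the example data so it runs on its own. svy_direct and est are illustrative names, and the surveycore call appears only as a comment.

```r
# Rebuild the example data so this sketch runs on its own.
set.seed(42)
df <- data.frame(
  psu = rep(1:10, each = 10),
  strata = rep(c("A", "B"), each = 50),
  weight = runif(100, 0.5, 2),
  income = rnorm(100, 50000, 10000),
  age = sample(18:80, 100, replace = TRUE)
)

if (requireNamespace("survey", quietly = TRUE)) {
  # survey uses one-sided formulas (~psu) where the surveycore call shown
  # earlier uses bare column names:
  #   as_survey(df, ids = psu, weights = weight, strata = strata, nest = TRUE)
  svy_direct <- survey::svydesign(
    ids = ~psu,
    strata = ~strata,
    weights = ~weight,
    data = df,
    nest = TRUE
  )

  # Design-based mean of income with its standard error.
  est <- survey::svymean(~income, svy_direct)
  print(est)
  print(class(svy_direct))
}
```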

The surveyverse ecosystem

surveycore is the foundation of the surveyverse — a family of packages built around it:

  • surveytidy — dplyr verbs (filter(), select(), mutate(), group_by()) that respect survey design structure, so grouped summaries and subsetting always propagate weights and strata correctly.
  • surveywts — calibration and post-stratification for survey weights. Coming soon.

Development status

The package API is stable. The core classes, constructors, and analysis functions (get_freqs() through get_diffs()) are not expected to change in breaking ways. New analysis functions may be added in future releases. See NEWS.md for the full changelog.

Code of Conduct

Please note that the surveycore project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

GPL-3. Variance estimation code is vendored from the survey package (Thomas Lumley, GPL-2/GPL-3); see VENDORED.md for full attribution.

References

Lumley T (2004). “Analysis of Complex Survey Samples.” Journal of Statistical Software, 9(1), 1–19.

Lumley T (2010). Complex Surveys: A Guide to Analysis Using R. John Wiley and Sons.

Version

0.8.3

License

GPL (>= 3)

Maintainer

Jacob Dennen

Last Published

May 5th, 2026

Functions in surveycore (0.8.3)

  • extract_val_labels - Extract Value Labels
  • .get_design_vars_flat - Get design variable column names
  • clean - Tidy a Survey GLM Fit
  • extract_sata - Extract SATA (Select-All-That-Apply) Flags
  • extract_universe - Extract Universe Descriptions
  • extract_metadata - Extract All Metadata for Variables
  • extract_missing_codes - Extract Missing Value Codes
  • as_tbl_svy - Convert a surveycore Design Object to an srvyr tbl_svy
  • classify_question_type - Classify Variable Question Types
  • get_means - Weighted Mean for a Survey Design
  • get_covariance - Design-Based Population Covariance for a Survey Design
  • extract_var_label - Extract Variable Labels
  • get_freqs - Weighted Frequency Tables for Categorical Survey Variables
  • from_svydesign - Convert a survey Package Design to a surveycore Design Object
  • from_tbl_svy - Convert an srvyr tbl_svy to a surveycore Design Object
  • get_corr - Survey-Weighted Correlation (Pearson, Polychoric, Polyserial)
  • get_diffs - Treatment Effect Estimation for Survey Designs
  • extract_var_note - Extract Analyst Notes
  • get_anova - Design-Based Analysis of Variance for Survey GLM Fits
  • nhanes_2017 - NHANES 2017-2018: Demographics and Blood Pressure
  • meta - Extract Metadata from a Survey Result
  • get_pairwise - All-Pairs Pairwise T-Tests for Survey Designs
  • get_quantiles - Survey-Weighted Quantiles
  • get_ratios - Survey-Weighted Ratio Estimation
  • infer_question_prefaces - Infer Question Prefaces from Variable Labels
  • get_t_test - Design-Based Two-Sample T-Test for Survey Designs
  • gss_2024 - GSS 2024: General Social Survey
  • get_variance - Design-Based Population Variance for a Survey Design
  • get_totals - Weighted Total for a Survey Design
  • set_collection_if_missing_var - Set the Missing-Variable Behaviour on a survey_collection
  • set_collection_id - Set the Identifier Column on a survey_collection
  • print.survey_result - Print a Survey Result Object
  • set_question_preface - Set Question Preface(s)
  • remove_survey - Remove Surveys from a survey_collection
  • set_missing_codes - Set Missing Code(s)
  • pew_npors_2025 - Pew NPORS 2025: National Public Opinion Reference Survey
  • print.survey_diffs - Print a Survey Diffs Result
  • ns_wave1 - Nationscape Wave 1: July 18, 2019
  • pew_jewish_2020 - Pew Jewish Americans 2020
  • set_universe - Set Universe Description(s)
  • survey_glm_fit - Survey-Weighted GLM Fit Object
  • survey_collection - Multi-Survey Container
  • survey_data - Access the Data Component of a Survey Design Object
  • survey_glm - Fit a Survey-Weighted Generalised Linear Model
  • set_val_labels - Set Value Labels
  • set_var_label - Set Variable Label(s)
  • survey_base - Abstract Base Survey Design Class
  • set_var_note - Set Analyst Note(s)
  • set_sata - Set SATA (Select-All-That-Apply) Flag
  • update_design - Update Design Variables on an Existing Survey Object
  • survey_replicate - Replicate Weights Survey Design
  • survey_nonprob - Calibrated / Non-Probability Survey Design
  • survey_twophase - Two-Phase Survey Design
  • survey_taylor - Taylor Series Linearization Survey Design
  • survey_metadata - Survey Metadata Container
  • survey_weighting_history - Extract the Weighting History from a Survey Object
  • as_survey_twophase - Create a Two-Phase Survey Design
  • as_survey - Create a Taylor Series Linearization Survey Design
  • as_survey_collection - Create a Collection of Survey Designs
  • SURVEYCORE_DOMAIN_COL - Internal Domain Column Name Constant
  • as_svydesign - Convert a surveycore Design Object to a survey Package Design
  • add_survey - Add Surveys to a survey_collection
  • as_survey_replicate - Create a Replicate Weights Survey Design
  • as_survey_nonprob - Create a Calibrated / Non-Probability Survey Design
  • acs_pums_wy - ACS PUMS 2022 1-Year: Wyoming Persons
  • anes_2024 - ANES 2024: American National Election Studies Time Series
  • extract_question_preface - Extract Question Prefaces