Creates a survey design object using replicate weights for variance estimation. Supports all common replicate methods: jackknife (JK1, JK2, JKn), balanced repeated replication (BRR, Fay), bootstrap, ACS, successive-difference, and user-defined types. Uses a tidy-select interface for weight and replicate-weight columns.
as_survey_replicate(
data,
weights,
repweights,
type = c("JK1", "JK2", "JKn", "BRR", "Fay", "bootstrap", "ACS",
"successive-difference", "other"),
scale = NULL,
rscales = NULL,
fpc = NULL,
fpctype = c("fraction", "correction"),
mse = TRUE
)
Returns a survey_replicate object.
data
A data.frame containing the survey responses. Must have at
least one row and unique column names.
weights
<tidy-select> Sampling weight
column (a single column; values must be strictly greater than 0). Required.
repweights
<tidy-select> Replicate weight
columns. Must select at least one column. Supports tidy-select helpers
(e.g., starts_with("repwt")). Required.
type
Character. Replicate weight method. One of "JK1" (delete-1
jackknife), "JK2" (delete-1 jackknife, stratified), "JKn" (delete-1
jackknife with varying replication counts), "BRR" (balanced repeated
replication), "Fay" (Fay's method, a modified BRR), "bootstrap",
"ACS" (used in the American Community Survey), "successive-difference",
or "other" (user-specified scale). Case-sensitive.
scale
Numeric. Scaling factor applied to the replicate variance
formula. If NULL (default), it is computed automatically from type and
the number of replicates R: (R-1)/R for jackknife methods, 1/4 for
BRR/Fay, 1/R for bootstrap/ACS, 2/R for successive-difference, and
1 for other.
rscales
Numeric vector of replicate-specific scaling factors, or
NULL. If provided, must have the same length as the number of
replicate weight columns selected by repweights.
fpc
<tidy-select> Finite population
correction column (a single column). Used by some replicate methods to
adjust the variance estimator. NULL (default) means no finite population
correction is applied.
fpctype
Character. How fpc is interpreted: "fraction" (sampling
fraction, 0–1) or "correction" (multiplier for the replicate variance).
Default "fraction". Case-sensitive.
mse
Logical. If TRUE (default), use mean-squared-error estimates:
replicate estimates are centred on the full-sample estimate rather than on
the mean of the replicate estimates when computing variance. Recommended
for most designs.
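To make the roles of scale, rscales, and mse concrete, here is a minimal base-R sketch. It assumes the estimator takes the usual replicate-variance form, scale * sum(rscales * (theta_r - centre)^2), and that a NULL rscales behaves like a vector of ones. Both helpers (default_scale() and rep_variance()) are hypothetical illustrations and are not functions exported by the package.

```r
# Hypothetical helper: default `scale` as described in the argument docs.
default_scale <- function(type, R) {
  switch(
    type,
    JK1 = , JK2 = , JKn = (R - 1) / R,
    BRR = , Fay = 1 / 4,
    bootstrap = , ACS = 1 / R,
    "successive-difference" = 2 / R,
    other = 1,
    stop("unknown type: ", type)
  )
}

# Hypothetical helper: replicate variance of a scalar estimate.
# Assumption: NULL rscales is treated as a vector of ones.
rep_variance <- function(theta_full, theta_reps, scale,
                         rscales = NULL, mse = TRUE) {
  R <- length(theta_reps)
  if (is.null(rscales)) rscales <- rep(1, R)
  stopifnot(length(rscales) == R)
  # mse = TRUE centres on the full-sample estimate,
  # mse = FALSE centres on the mean of the replicate estimates.
  centre <- if (mse) theta_full else mean(theta_reps)
  scale * sum(rscales * (theta_reps - centre)^2)
}

theta_full <- 10                 # made-up full-sample estimate
theta_reps <- c(9, 11, 10, 12)   # made-up replicate estimates
s <- default_scale("JK1", length(theta_reps))

v_mse  <- rep_variance(theta_full, theta_reps, s, mse = TRUE)
v_mean <- rep_variance(theta_full, theta_reps, s, mse = FALSE)
c(mse = v_mse, mean_centred = v_mean)
```

Centring on the full-sample estimate can never give a smaller value than centring on the replicate mean, which is why the mse = TRUE option is the more conservative choice.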
Both weights and repweights support tidy-select syntax:
# Bare name for weights
as_survey_replicate(
df, weights = wt, repweights = starts_with("repwt"), type = "BRR"
)
# c() for explicit replicate columns
as_survey_replicate(
df, weights = wt, repweights = c(rep1, rep2, rep3), type = "JK1"
)
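The calls above assume a data frame df with a weight column wt and replicate columns sharing a common prefix. A small base-R sketch of such a data frame follows; the column names and values are invented for illustration. The last lines approximate what starts_with("repwt") would select, using base startsWith(), which is case-sensitive (tidy-select's starts_with() ignores case by default).

```r
set.seed(1)
n <- 6

# Toy survey data: an id, one outcome, and a constant sampling weight.
df <- data.frame(
  id = seq_len(n),
  y  = rnorm(n),
  wt = rep(10, n)
)

# Three made-up replicate weight columns: repwt1, repwt2, repwt3.
for (r in 1:3) {
  df[[paste0("repwt", r)]] <- df$wt * runif(n, min = 0.5, max = 1.5)
}

# Base-R approximation of the columns starts_with("repwt") would match.
rep_cols <- names(df)[startsWith(names(df), "repwt")]
rep_cols
```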
The replicate weight matrix is not stored in the object. Only the
column names are stored in @variables$repweights. Variance estimation
computes the matrix on demand:
as.matrix(design@data[, design@variables$repweights]).
Each call to an estimation function (e.g., get_means(), get_totals())
materialises the full replicate weight matrix from the data frame. For large
designs (e.g., ACS PUMS with 500k+ rows × 80 replicates), this is roughly
nrow * n_replicates * 8 bytes per call (~363 MB for ACS Wyoming × 80).
If you are estimating many variables, this is repeated for each call.
This behaviour matches the survey package reference implementation.
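The size estimate above can be checked with a few lines of base R. This is only a back-of-the-envelope sketch: rep_matrix_bytes() is a hypothetical helper, and the row counts are illustrative rather than taken from any specific dataset.

```r
# Hypothetical helper: bytes needed for a double-precision replicate matrix.
rep_matrix_bytes <- function(n_rows, n_replicates) {
  n_rows * n_replicates * 8
}

# 500,000 rows x 80 replicates, expressed in megabytes (1 MB = 1e6 bytes).
mb_500k <- rep_matrix_bytes(500000, 80) / 1e6
mb_500k

# Compare the formula with an actual materialised matrix on a small example.
toy <- as.data.frame(matrix(runif(1000 * 80), nrow = 1000))
m <- as.matrix(toy)
actual   <- as.numeric(object.size(m))
expected <- rep_matrix_bytes(nrow(m), ncol(m))
c(expected = expected, actual = actual)  # actual adds a small header overhead
```

If several estimates are needed from the same large design, one option is to restrict the data frame to the required variables and replicate columns before constructing the design, so that less data is carried around between calls.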
Judkins, D.R. (1990) Fay's method for variance estimation. Journal of Official Statistics 6(3), 223–239.
Canty, A.J. and Davison, A.C. (1999) Resampling-based variance estimation for labour force surveys. The Statistician 48(3), 379–391.
Shao, J. and Tu, D. (1995) The Jackknife and Bootstrap. Springer.
as_survey() for Taylor series designs,
as_survey_twophase() for two-phase designs,
set_var_label() to add variable labels
Other constructors:
as_survey(),
as_survey_nonprob(),
as_survey_twophase(),
survey_data(),
survey_glm(),
survey_glm_fit(),
survey_nonprob(),
survey_replicate(),
survey_taylor(),
survey_twophase()
# ACS PUMS Wyoming: 80 successive-difference replicate weights
d_acs <- as_survey_replicate(
acs_pums_wy,
weights = pwgtp,
repweights = pwgtp1:pwgtp80,
type = "successive-difference"
)
# Explicit replicate columns using c()
d_sub <- as_survey_replicate(
acs_pums_wy,
weights = pwgtp,
repweights = c(pwgtp1, pwgtp2, pwgtp3, pwgtp4),
type = "JK1"
)