
This help page describes variance estimators which are commonly used for survey samples. These variance estimators can be used as the basis of the generalized replication methods implemented with the functions as_fays_gen_rep_design(), as_gen_boot_design(), make_fays_gen_rep_factors(), or make_gen_boot_factors().
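Let $s$ denote the selected sample of size $n$, with elements $i = 1, \dots, n$. Element $i$ in the sample had probability $\pi_i$ of being included in the sample, and the pair of elements $i$ and $j$ was sampled with probability $\pi_{ij}$ (with $\pi_{ii} = \pi_i$).
The population total for a variable is denoted $Y = \sum_{i \in U} y_i$, where $U$ is the population, and its Horvitz-Thompson estimator is $\hat{Y} = \sum_{i \in s} y_i / \pi_i$. For convenience, we write $\breve{y}_i = y_i / \pi_i$.
The true sampling variance of $\hat{Y}$ is denoted $V(\hat{Y})$, while an estimator of this sampling variance is denoted $v(\hat{Y})$.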
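The Horvitz-Thompson variance estimator:
$$ v(\hat{Y}) = \sum_{i \in s} \sum_{j \in s} \left(1 - \frac{\pi_i \pi_j}{\pi_{ij}}\right) \frac{y_i}{\pi_i} \frac{y_j}{\pi_j} $$
The Yates-Grundy variance estimator:
$$ v(\hat{Y}) = -\frac{1}{2} \sum_{i \in s} \sum_{j \in s} \left(1 - \frac{\pi_i \pi_j}{\pi_{ij}}\right) \left(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\right)^2 $$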
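The following is a minimal, hypothetical illustration in plain Python. It does not use or reproduce the 'svrep' package. It evaluates the Horvitz-Thompson and Yates-Grundy estimators as double sums over the sample. The inputs are first-order probabilities `pi` and a matrix `pi_joint` of joint inclusion probabilities, whose diagonal is assumed to hold $\pi_i$. The example assumes simple random sampling without replacement (SRSWOR). Because that is a fixed-size design, both estimators should coincide with the textbook formula $N^2(1 - n/N)s^2/n$.

```python
# Hypothetical helpers (not part of 'svrep'): HT and YG variance estimators
# computed as double sums over the sample. pi_joint[i][j] holds pi_ij and
# its diagonal holds pi_i.


def ht_variance(y, pi, pi_joint):
    """Sum over i, j of (1 - pi_i pi_j / pi_ij) * (y_i / pi_i) * (y_j / pi_j)."""
    n = len(y)
    y_breve = [y[i] / pi[i] for i in range(n)]  # weighted values y_i / pi_i
    return sum(
        (1 - pi[i] * pi[j] / pi_joint[i][j]) * y_breve[i] * y_breve[j]
        for i in range(n)
        for j in range(n)
    )


def yg_variance(y, pi, pi_joint):
    """-1/2 * sum over i, j of (1 - pi_i pi_j / pi_ij) * (y_i/pi_i - y_j/pi_j)^2."""
    n = len(y)
    y_breve = [y[i] / pi[i] for i in range(n)]
    return -0.5 * sum(
        (1 - pi[i] * pi[j] / pi_joint[i][j]) * (y_breve[i] - y_breve[j]) ** 2
        for i in range(n)
        for j in range(n)
    )


# Example: SRSWOR of n = 3 units from N = 5, so pi_i = n/N and
# pi_ij = n(n - 1) / (N(N - 1)) for i != j.
N, n = 5, 3
y = [2.0, 5.0, 11.0]
pi = [n / N] * n
pi_joint = [
    [n / N if i == j else n * (n - 1) / (N * (N - 1)) for j in range(n)]
    for i in range(n)
]

v_ht = ht_variance(y, pi, pi_joint)
v_yg = yg_variance(y, pi, pi_joint)

# Textbook SRSWOR variance estimator for a total: N^2 (1 - n/N) s^2 / n
mean_y = sum(y) / n
s2 = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
v_srs = N**2 * (1 - n / N) * s2 / n

print(v_ht, v_yg, v_srs)  # each is approximately 70
```

For designs with a random sample size, such as Poisson sampling, the two double sums generally differ.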
The Poisson Horvitz-Thompson variance estimator is simply the Horvitz-Thompson variance estimator, but where $\pi_{ij} = \pi_i \pi_j$ for $i \neq j$, which is the case for Poisson sampling.
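Setting $\pi_{ij} = \pi_i \pi_j$ for $i \neq j$ makes every off-diagonal term of the Horvitz-Thompson double sum vanish. The estimator therefore reduces to the single sum $\sum_{i \in s} (1 - \pi_i)(y_i/\pi_i)^2$. The hypothetical Python sketch below checks that reduction numerically against the general double sum. The inclusion probabilities are made up for illustration.

```python
# Hypothetical sketch: the Poisson HT variance estimator. With pi_ij = pi_i * pi_j
# for i != j, every off-diagonal term of the HT double sum is zero, and the
# diagonal terms (pi_ii = pi_i) reduce to sum_i (1 - pi_i) * (y_i / pi_i)^2.


def poisson_ht_variance(y, pi):
    return sum((1 - p) * (yi / p) ** 2 for yi, p in zip(y, pi))


def ht_variance(y, pi, pi_joint):
    """General HT double sum, used here only as a cross-check."""
    n = len(y)
    return sum(
        (1 - pi[i] * pi[j] / pi_joint[i][j]) * (y[i] / pi[i]) * (y[j] / pi[j])
        for i in range(n)
        for j in range(n)
    )


y = [4.0, 6.0, 9.0]
pi = [0.5, 0.25, 0.75]
# Joint inclusion probabilities under Poisson sampling (independent selections).
pi_joint = [[pi[i] if i == j else pi[i] * pi[j] for j in range(3)] for i in range(3)]

v_closed = poisson_ht_variance(y, pi)
v_general = ht_variance(y, pi, pi_joint)
print(v_closed, v_general)  # 500.0 500.0
```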
The Stratified Multistage SRS variance estimator is the recursive variance estimator proposed by Bellhouse (1985) and used in the 'survey' package's function svyrecvar. In the case of simple random sampling without replacement (with one or more stages), this estimator exactly matches the Horvitz-Thompson estimator.
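The estimator can be used for any number of sampling stages. For illustration, we describe its use for two sampling stages:
$$ v(\hat{Y}) = \hat{V}_1 + \hat{V}_2 $$
where
$$ \hat{V}_1 = \sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} \left(y_{hi\cdot} - \bar{y}_{h\cdot\cdot}\right)^2 $$
and
$$ \hat{V}_2 = \sum_{h=1}^{H} \frac{n_h}{N_h} \sum_{i=1}^{n_h} v_{hi}(y_{hi\cdot}) $$
The symbols are defined as follows:
- $H$ is the number of first-stage strata.
- $n_h$ and $N_h$ are the numbers of sampled and population clusters in stratum $h$.
- $y_{hi\cdot}$ is the weighted total of cluster $i$ in stratum $h$.
- $\bar{y}_{h\cdot\cdot}$ is the average of these weighted cluster totals in stratum $h$.
- $v_{hi}(y_{hi\cdot})$ is the estimated second-stage sampling variance of $y_{hi\cdot}$.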
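The following is a minimal Python sketch of the two-stage case. It assumes that the weighted first-stage cluster totals and their estimated second-stage variances have already been computed; this sketch does not compute them. The helper names and the `use_fpc` flag are hypothetical. Keeping only the first-stage term, optionally without finite population corrections, corresponds to the ultimate cluster estimator.

```python
from collections import defaultdict

# Hypothetical helpers (not the 'survey' or 'svrep' implementation).


def group_by_stratum(values, strata):
    groups = defaultdict(list)
    for value, h in zip(values, strata):
        groups[h].append(value)
    return groups


def first_stage_variance(cluster_totals, strata, n_pop_clusters, use_fpc=True):
    """V1 = sum_h (1 - n_h/N_h) * n_h/(n_h - 1) * sum_i (y_hi. - ybar_h)^2."""
    v1 = 0.0
    for h, totals in group_by_stratum(cluster_totals, strata).items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least two sampled clusters")
        mean_h = sum(totals) / n_h
        ss = sum((t - mean_h) ** 2 for t in totals)
        fpc = 1 - n_h / n_pop_clusters[h] if use_fpc else 1.0
        v1 += fpc * n_h / (n_h - 1) * ss
    return v1


def second_stage_variance(cluster_variances, strata, n_pop_clusters):
    """V2 = sum_h (n_h/N_h) * sum_i v_hi(y_hi.)."""
    v2 = 0.0
    for h, variances in group_by_stratum(cluster_variances, strata).items():
        v2 += len(variances) / n_pop_clusters[h] * sum(variances)
    return v2


strata = ["A", "A", "A", "A", "B", "B"]
cluster_totals = [100.0, 120.0, 90.0, 110.0, 40.0, 60.0]  # weighted cluster totals
cluster_variances = [30.0, 50.0, 20.0, 40.0, 10.0, 30.0]  # their second-stage variances
n_pop_clusters = {"A": 10, "B": 5}  # N_h, population clusters per stratum

v1 = first_stage_variance(cluster_totals, strata, n_pop_clusters)
v2 = second_stage_variance(cluster_variances, strata, n_pop_clusters)
v1_no_fpc = first_stage_variance(cluster_totals, strata, n_pop_clusters, use_fpc=False)

print("two-stage:", v1 + v2)
print("first stage only:", v1)
print("first stage only, no FPC:", v1_no_fpc)
```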
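The Ultimate Cluster variance estimator is simply the stratified multistage SRS variance estimator, but ignoring variances from later stages of sampling. That is, it keeps only the first-stage (between-cluster) component $\hat{V}_1$:
$$ v(\hat{Y}) = \hat{V}_1 $$
This is the variance estimator used in the 'survey' package when the user specifies options(survey.ultimate.cluster = TRUE) or uses svyrecvar(..., one.stage = TRUE). When the first-stage sampling fractions $n_h/N_h$ are small, analysts often omit the finite population corrections $(1 - n_h/N_h)$ when using the ultimate cluster estimator.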
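The SD1 and SD2 variance estimators are "successive difference" estimators sometimes used for systematic sampling designs. Ash (2014) describes each estimator as follows:
$$ v_{SD1}(\hat{Y}) = \left(1 - \frac{n}{N}\right) \frac{n}{2(n-1)} \sum_{k=2}^{n} \left(\breve{y}_k - \breve{y}_{k-1}\right)^2 $$
$$ v_{SD2}(\hat{Y}) = \left(1 - \frac{n}{N}\right) \frac{1}{2} \left[\sum_{k=2}^{n} \left(\breve{y}_k - \breve{y}_{k-1}\right)^2 + \left(\breve{y}_n - \breve{y}_1\right)^2\right] $$
Here $\breve{y}_k = y_k/\pi_k$ is the weighted value of the unit in position $k$ of the sample's sort order, and $n$ and $N$ are the sample and population sizes. SD2 adds the "wrap-around" difference between the last and first units.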
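The following hypothetical Python sketch computes SD1 and SD2 for a single stratum. It assumes the weighted values $y_k/\pi_k$ are already arranged in the sort order used for systematic selection, and that one finite population correction $1 - n/N$ applies to the whole stratum. Only a single stage is handled; the multistage scaling is not shown.

```python
# Hypothetical sketch of the SD1 and SD2 successive difference estimators
# for one stratum; y_breve must already be in the systematic sort order.


def successive_sum_of_squares(y_breve):
    """Sum over k = 2..n of (y_k - y_{k-1})^2."""
    return sum((y_breve[k] - y_breve[k - 1]) ** 2 for k in range(1, len(y_breve)))


def sd1_variance(y_breve, pop_size):
    n = len(y_breve)
    return (1 - n / pop_size) * n / (2 * (n - 1)) * successive_sum_of_squares(y_breve)


def sd2_variance(y_breve, pop_size):
    n = len(y_breve)
    wrap_around = (y_breve[-1] - y_breve[0]) ** 2  # last unit vs. first unit
    return (1 - n / pop_size) * 0.5 * (successive_sum_of_squares(y_breve) + wrap_around)


y_breve = [10.0, 12.0, 15.0, 13.0]  # made-up weighted values, in sort order
N = 40                              # made-up population size

sd1 = sd1_variance(y_breve, N)
sd2 = sd2_variance(y_breve, N)
print(sd1, sd2)  # approximately 10.2 and 11.7
```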
For multistage samples, SD1 and SD2 are applied to the clusters at each stage, separately by stratum.
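For later stages of sampling, the variance estimate from a stratum is multiplied by the product of sampling fractions from earlier stages of sampling. For example, at a third stage of sampling, the variance estimate from a third-stage stratum is multiplied by $\frac{n_1}{N_1} \times \frac{n_2}{N_2}$. This is the product of the sampling fractions in the first-stage stratum and the second-stage stratum.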
The "Beaumont-Emond" variance estimator was proposed by Beaumont and Emond (2022),
intended for designs that use fixed-size, unequal-probability random sampling without replacement.
The variance estimator is simply the Horvitz-Thompson
variance estimator with the following approximation for the joint inclusion
probabilities.
For multistage samples, this approximation is applied to the clusters at each stage, separately by stratum.
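For later stages of sampling, the variance estimate from a stratum is multiplied by the product of sampling probabilities from earlier stages of sampling. For example, at a third stage of sampling, the variance estimate from a third-stage stratum is multiplied by $\pi_1 \times \pi_{(2|1)}$. Here $\pi_1$ is the sampling probability of the first-stage unit, and $\pi_{(2|1)}$ is the sampling probability of the second-stage unit within that first-stage unit.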
The "Deville-1" and "Deville-2" variance estimators are clearly described in Matei and Tillé (2005), and are intended for designs that use fixed-size, unequal-probability random sampling without replacement. These variance estimators have been shown to be effective for designs that use a fixed sample size with a high-entropy sampling method. This includes most PPSWOR sampling methods, but unequal-probability systematic sampling is an important exception.
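These variance estimators take the following form:
$$ v(\hat{Y}) = \sum_{i \in s} c_i \left(\breve{y}_i - \frac{\sum_{j \in s} c_j \breve{y}_j}{\sum_{j \in s} c_j}\right)^2 $$
Here $\breve{y}_i = y_i/\pi_i$, and the coefficients $c_i$ differ between the two estimators:
"Deville-1": $c_i = \left(1 - \pi_i\right) \frac{n}{n-1}$
"Deville-2": $c_i = \left(1 - \pi_i\right) \left[1 - \sum_{j \in s} \left(\frac{1 - \pi_j}{\sum_{k \in s} (1 - \pi_k)}\right)^2\right]^{-1}$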
In the case of simple random sampling without replacement (SRSWOR), these estimators are both identical to the usual stratified multistage SRS estimator (which is itself a special case of the Horvitz-Thompson estimator).
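The following hypothetical sketch implements "Deville-1" and "Deville-2" in plain Python. Each is computed as a $c_i$-weighted sum of squared deviations of $y_i/\pi_i$ from their $c_i$-weighted mean. The sketch then checks the SRSWOR claim numerically on made-up data: both estimators should match the SRS formula $N^2(1 - n/N)s^2/n$, which equals 70 for these data. The function name and the `method` strings are illustrative assumptions, not the 'svrep' API.

```python
# Hypothetical sketch (not the 'svrep' implementation) of the Deville-1 and
# Deville-2 variance estimators for one stratum.


def deville_variance(y, pi, method="Deville-1"):
    n = len(y)
    y_breve = [yi / p for yi, p in zip(y, pi)]  # weighted values y_i / pi_i
    one_minus_pi = [1 - p for p in pi]
    if method == "Deville-1":
        c = [a * n / (n - 1) for a in one_minus_pi]
    elif method == "Deville-2":
        total = sum(one_minus_pi)
        scale = 1 / (1 - sum((a / total) ** 2 for a in one_minus_pi))
        c = [a * scale for a in one_minus_pi]
    else:
        raise ValueError("method must be 'Deville-1' or 'Deville-2'")
    # c-weighted mean of the weighted values, then c-weighted squared deviations
    center = sum(ck * yk for ck, yk in zip(c, y_breve)) / sum(c)
    return sum(ck * (yk - center) ** 2 for ck, yk in zip(c, y_breve))


# SRSWOR check: n = 3 units from N = 5, so every pi_i = 0.6.
N, n = 5, 3
y_srs = [2.0, 5.0, 11.0]
pi_srs = [n / N] * n
v_dev1_srs = deville_variance(y_srs, pi_srs, "Deville-1")
v_dev2_srs = deville_variance(y_srs, pi_srs, "Deville-2")

# Unequal-probability example with made-up values: the two estimators now differ.
y = [3.0, 8.0, 6.0, 12.0]
pi = [0.2, 0.5, 0.4, 0.9]
v_dev1 = deville_variance(y, pi, "Deville-1")
v_dev2 = deville_variance(y, pi, "Deville-2")
print(v_dev1_srs, v_dev2_srs, v_dev1, v_dev2)
```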
For multistage samples, "Deville-1" and "Deville-2" are applied to the clusters at each stage, separately by stratum.
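For later stages of sampling, the variance estimate from a stratum is multiplied by the product of sampling probabilities from earlier stages of sampling. For example, at a third stage of sampling, the variance estimate from a third-stage stratum is multiplied by $\pi_1 \times \pi_{(2|1)}$. Here $\pi_1$ is the sampling probability of the first-stage unit, and $\pi_{(2|1)}$ is the sampling probability of the second-stage unit within that first-stage unit.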
This kernel-based variance estimator was proposed by Breidt, Opsomer, and Sanchez-Borrego (2016), for use with samples selected using systematic sampling or where only a single sampling unit is selected from each stratum (sometimes referred to as "fine stratification").
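Suppose there are $H$ strata, ordered $h = 1, \dots, H$, with exactly one unit sampled from each stratum. Let $\breve{y}_h$ denote the weighted value of the unit sampled from stratum $h$.
The variance estimator has the following form:
$$ v(\hat{Y}) = \sum_{h=1}^{H} \frac{\left(\breve{y}_h - \sum_{l=1}^{H} d_{hl} \breve{y}_l\right)^2}{1 - 2 d_{hh} + \sum_{l=1}^{H} d_{hl}^2} $$
The terms $d_{hl}$ are kernel weights:
$$ d_{hl} = \frac{K\left(\frac{h - l}{\lambda}\right)}{\sum_{m=1}^{H} K\left(\frac{h - m}{\lambda}\right)} $$
where $K$ is a symmetric, bounded kernel function and $\lambda$ is a bandwidth parameter.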
For most functions in the 'svrep' package, the kernel function $K$ is the Epanechnikov kernel, and the bandwidth $\lambda$ is automatically selected to yield the smallest possible nonempty kernel window, as recommended by Breidt, Opsomer, and Sanchez-Borrego (2016). That is the case for the functions as_fays_gen_rep_design(), as_gen_boot_design(), make_quad_form_matrix(), etc. However, users can construct the quadratic form matrix of this variance estimator with a different kernel and a different bandwidth by working directly with the function make_kernel_var_matrix().
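The following is a hypothetical plain-Python sketch of this kernel estimator for $H$ ordered strata with one sampled unit each. It makes three assumptions:
- The ordering variable is simply the stratum index.
- The kernel is Epanechnikov.
- The bandwidth defaults to 2 on the index scale, so each window includes only the immediately adjacent strata.

Whether that default matches the bandwidth 'svrep' selects automatically is not confirmed here. If a bandwidth leaves only stratum $h$ in its own window, the denominator is zero, so the sketch raises an error in that case.

```python
# Hypothetical sketch (not the 'svrep' implementation) of the kernel-based
# variance estimator of Breidt, Opsomer, and Sanchez-Borrego (2016).


def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def kernel_variance(y_breve, bandwidth=2.0, kernel=epanechnikov):
    """y_breve[h] is the weighted value of the single unit from stratum h."""
    H = len(y_breve)
    total = 0.0
    for h in range(H):
        raw = [kernel((h - l) / bandwidth) for l in range(H)]
        norm = sum(raw)
        d = [r / norm for r in raw]  # kernel weights d_hl; they sum to 1 over l
        fitted = sum(d_l * y_l for d_l, y_l in zip(d, y_breve))
        correction = 1 - 2 * d[h] + sum(d_l * d_l for d_l in d)
        if correction <= 0:
            raise ValueError("kernel window contains only stratum h; increase the bandwidth")
        total += (y_breve[h] - fitted) ** 2 / correction
    return total


y_breve = [10.0, 12.0, 9.0, 14.0, 11.0, 13.0]  # made-up values, in stratum order
v_kernel = kernel_variance(y_breve)
v_const = kernel_variance([5.0] * 6)  # constant values leave zero residuals

try:
    kernel_variance(y_breve, bandwidth=1.0)  # window shrinks to stratum h alone
    raised = False
except ValueError:
    raised = True

print(v_kernel, v_const, raised)
```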
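The Deville-Tillé variance estimator is intended for balanced sampling designs. See Section 6.8 of Tillé (2020) for more detail on this estimator, including an explanation of its quadratic form. See Deville and Tillé (2005) for the results of a simulation study comparing this and other alternative estimators for balanced sampling.
The estimator can be written as follows:
$$ v(\hat{Y}) = \sum_{k \in s} \frac{c_k}{\pi_k^2} \left(y_k - \hat{y}_k^*\right)^2 $$
where
$$ \hat{y}_k^* = \mathbf{z}_k^\top \left(\sum_{l \in s} c_l \frac{\mathbf{z}_l \mathbf{z}_l^\top}{\pi_l^2}\right)^{-1} \sum_{l \in s} c_l \frac{\mathbf{z}_l y_l}{\pi_l^2} $$
Here $\mathbf{z}_k$ is the vector of $q$ auxiliary (balancing) variables for sampled unit $k$, and $c_k = \frac{n}{n-q}(1 - \pi_k)$.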
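The following hypothetical sketch computes this estimator in plain Python. It solves the $q \times q$ weighted normal equations with Gaussian elimination; any package implementation may differ numerically.

As a sanity check, suppose the only auxiliary variable is $\pi_k$ itself ($q = 1$). Then each fitted value is $\pi_k$ times a $c_k$-weighted mean of $y_l/\pi_l$, and $n/(n-q) = n/(n-1)$, so the result should equal the "Deville-1" estimator. The data and the balancing variable `x` are made up.

```python
# Hypothetical sketch (not the 'svrep' implementation) of the Deville-Tille
# estimator for balanced sampling, using only the standard library.


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, m + 1):
                aug[r][j] -= factor * aug[col][j]
    x = [0.0] * m
    for r in reversed(range(m)):
        x[r] = (aug[r][m] - sum(aug[r][j] * x[j] for j in range(r + 1, m))) / aug[r][r]
    return x


def deville_tille_variance(y, pi, z):
    """z[k] is the list of q auxiliary (balancing) variables for sampled unit k."""
    n, q = len(y), len(z[0])
    c = [n / (n - q) * (1 - p) for p in pi]
    # Normal equations: (sum_l c_l z_l z_l' / pi_l^2) beta = sum_l c_l z_l y_l / pi_l^2
    a = [[sum(c[l] * z[l][i] * z[l][j] / pi[l] ** 2 for l in range(n)) for j in range(q)]
         for i in range(q)]
    b = [sum(c[l] * z[l][i] * y[l] / pi[l] ** 2 for l in range(n)) for i in range(q)]
    beta = solve_linear_system(a, b)
    total = 0.0
    for k in range(n):
        y_star = sum(z[k][i] * beta[i] for i in range(q))  # fitted value y*_k
        total += c[k] / pi[k] ** 2 * (y[k] - y_star) ** 2
    return total


def deville1_variance(y, pi):
    """Deville-1 estimator, used below only as a cross-check."""
    n = len(y)
    c = [(1 - p) * n / (n - 1) for p in pi]
    y_breve = [yi / p for yi, p in zip(y, pi)]
    center = sum(ck * yk for ck, yk in zip(c, y_breve)) / sum(c)
    return sum(ck * (yk - center) ** 2 for ck, yk in zip(c, y_breve))


y = [3.0, 8.0, 6.0, 12.0, 7.0]
pi = [0.2, 0.5, 0.4, 0.9, 0.6]
x = [1.0, 4.0, 2.0, 5.0, 3.0]  # hypothetical second balancing variable

# With z_k = (pi_k) only (q = 1), the estimator reduces to Deville-1.
v_dt_pi = deville_tille_variance(y, pi, [[p] for p in pi])
v_dev1 = deville1_variance(y, pi)

# Balancing on both pi_k and x_k (q = 2).
v_dt_2 = deville_tille_variance(y, pi, [[p, xk] for p, xk in zip(pi, x)])
print(v_dt_pi, v_dev1, v_dt_2)
```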
Ash, S. (2014). "Using successive difference replication for estimating variances." Survey Methodology, Statistics Canada, 40(1), 47–59.
Beaumont, J.-F., and Émond, N. (2022). "A Bootstrap Variance Estimation Method for Multistage Sampling and Two-Phase Sampling When Poisson Sampling Is Used at the Second Phase." Stats, 5, 339–357. https://doi.org/10.3390/stats5020019
Bellhouse, D. R. (1985). "Computing Methods for Variance Estimation in Complex Surveys." Journal of Official Statistics, 1(3).
Breidt, F. J., Opsomer, J. D., and Sanchez-Borrego, I. (2016). "Nonparametric Variance Estimation Under Fine Stratification: An Alternative to Collapsed Strata." Journal of the American Statistical Association, 111(514), 822–833. https://doi.org/10.1080/01621459.2015.1058264
Deville, J.-C., and Tillé, Y. (2005). "Variance approximation under balanced sampling." Journal of Statistical Planning and Inference, 128, 569–591.
Matei, A., and Tillé, Y. (2005). "Evaluation of Variance Approximations and Estimators in Maximum Entropy Sampling with Unequal Probability and Fixed Sample Size." Journal of Official Statistics, 21(4), 543–570.
Tillé, Y. (2020). "Sampling and estimation from finite populations." (I. Hekimi, Trans.). Wiley.