get_design_quad_form: Determine the quadratic form matrix of a variance estimator for a survey design object

Description

Determines the quadratic form matrix of a specified variance estimator, by parsing the information stored in a survey design object created using the 'survey' package.

Usage

get_design_quad_form(
  design,
  variance_estimator,
  ensure_psd = FALSE,
  aux_var_names = NULL
)

Value

A matrix representing the quadratic form of a specified variance estimator, based on extracting information about clustering, stratification, and selection probabilities from the survey design object.
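
As a minimal sketch of what "quadratic form" means here, the base R snippet below builds such a matrix by hand for the textbook case of simple random sampling without replacement, rather than calling get_design_quad_form(). The sample values, the population size N, and all object names are hypothetical. The variance estimate for a total is t(y_wtd) %*% Q %*% y_wtd, where y_wtd holds the weighted values.

```r
# Hypothetical simple random sample without replacement: n = 5 units from N = 50
y <- c(3, 7, 4, 9, 6)
N <- 50
n <- length(y)
f <- n / N                      # sampling fraction
wts <- rep(N / n, n)            # sampling weights
y_wtd <- y * wts                # weighted values

# Quadratic form matrix of the usual SRS variance estimator of a total
Q_srs <- (1 - f) * (n / (n - 1)) * (diag(n) - matrix(1 / n, n, n))

# Variance estimate written as a quadratic form in the weighted values
var_quad <- drop(t(y_wtd) %*% Q_srs %*% y_wtd)

# Same estimate from the textbook formula N^2 * (1 - f) * s^2 / n
var_textbook <- N^2 * (1 - f) * var(y) / n

all.equal(var_quad, var_textbook)
```

The matrix returned by get_design_quad_form() is used the same way, as in Example 1 below, with rows and columns in the same order as the rows of the design object's data.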

Arguments

design

A survey design object created using the 'survey' (or 'srvyr') package, with class 'survey.design' or 'svyimputationList'. Also accepts two-phase design objects with class 'twophase2'; see the section below titled "Two-Phase Designs" for more information about handling of two-phase designs.

variance_estimator

The name of the variance estimator whose quadratic form matrix should be created.
See the section "Variance Estimators" below. Options include:

"Yates-Grundy":
The Yates-Grundy variance estimator based on first-order and second-order inclusion probabilities.
"Horvitz-Thompson":
The Horvitz-Thompson variance estimator based on first-order and second-order inclusion probabilities.
"Poisson Horvitz-Thompson":
The Horvitz-Thompson variance estimator based on assuming Poisson sampling, with first-order inclusion probabilities inferred from the sampling probabilities of the survey design object.
"Stratified Multistage SRS":
The usual stratified multistage variance estimator based on estimating the variance of cluster totals within strata at each stage.
"Ultimate Cluster":
The usual variance estimator based on estimating the variance of first-stage cluster totals within first-stage strata.
"Deville-1":
A variance estimator for unequal-probability sampling without replacement, described in Matei and Tillé (2005) as "Deville 1".
"Deville-2":
A variance estimator for unequal-probability sampling without replacement, described in Matei and Tillé (2005) as "Deville 2".
"Deville-Tille":
A variance estimator useful for balanced sampling designs, proposed by Deville and Tillé (2005).
"SD1":
The non-circular successive-differences variance estimator described by Ash (2014), sometimes used for variance estimation for systematic sampling.
"SD2":
The circular successive-differences variance estimator described by Ash (2014). This estimator is the basis of the "successive-differences replication" estimator commonly used for variance estimation for systematic sampling.
"Beaumont-Emond":
The variance estimator of Beaumont and Emond (2022) for multistage unequal-probability sampling without replacement.
"BOSB":
The kernel-based variance estimator proposed by Breidt, Opsomer, and Sanchez-Borrego (2016) for use with systematic samples or other finely stratified designs. Uses the Epanechnikov kernel with the bandwidth automatically chosen to result in the smallest possible nonempty kernel window.
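
To make the difference between "SD1" and "SD2" concrete, the following base R sketch builds both quadratic forms by hand for a single stratum. It assumes the usual successive-difference formulas, in which SD1 scales the sum of squared adjacent differences by n / (2 * (n - 1)) and SD2 adds a wrap-around difference between the last and first units and scales by 1 / 2. The data, the sampling fraction f, and all object names are hypothetical, and the sketch is not the package's implementation.

```r
# Hypothetical weighted values, listed in sampling sort order
y_wtd <- c(12, 15, 11, 18, 14, 16)
n <- length(y_wtd)
f <- 0.1                        # assumed sampling fraction

# Non-circular differences: row k of D1 computes y[k + 1] - y[k]
D1 <- diff(diag(n))             # (n - 1) x n matrix
# Circular differences: append the wrap-around difference y[n] - y[1]
D2 <- rbind(D1, diag(n)[n, ] - diag(n)[1, ])

# Quadratic form matrices under the assumed SD1 and SD2 formulas
Q_sd1 <- (1 - f) * n / (2 * (n - 1)) * crossprod(D1)
Q_sd2 <- (1 - f) / 2 * crossprod(D2)

# Variance estimates as quadratic forms in the weighted values
v_sd1 <- drop(t(y_wtd) %*% Q_sd1 %*% y_wtd)
v_sd2 <- drop(t(y_wtd) %*% Q_sd2 %*% y_wtd)
```

Because each matrix here is a scaled cross-product of a difference matrix, both are positive semidefinite by construction.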

ensure_psd

If TRUE (not the default; the default is FALSE), ensures that the result is a positive semidefinite matrix. This is necessary if the quadratic form is used as an input for replication methods such as the generalized bootstrap. For mathematical details, please see the documentation for the function get_nearest_psd_matrix(). The approximation method is discussed by Beaumont and Patak (2012) in the context of forming replicate weights for two-phase samples. The authors argue that this approximation should lead to only a small overestimation of variance.
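
The base R sketch below illustrates one common way to approximate a symmetric matrix by a positive semidefinite one: take its eigendecomposition and replace negative eigenvalues with zero. This is an assumption about the general idea, not a copy of the package's code; the documentation for get_nearest_psd_matrix() is the authoritative description. The matrix Q and the vector y are hypothetical.

```r
# Hypothetical symmetric matrix that is not positive semidefinite
Q <- matrix(c( 1.0, 0.9, -0.9,
               0.9, 1.0,  0.9,
              -0.9, 0.9,  1.0), nrow = 3, byrow = TRUE)

eig <- eigen(Q, symmetric = TRUE)
has_negative <- min(eig$values) < 0   # TRUE when Q is not PSD

# Replace negative eigenvalues with zero and rebuild the matrix
Q_psd <- eig$vectors %*% diag(pmax(eig$values, 0)) %*% t(eig$vectors)

# The adjusted quadratic form is never smaller than the original one
y <- c(1, -1, 1)
v_original <- drop(t(y) %*% Q %*% y)
v_adjusted <- drop(t(y) %*% Q_psd %*% y)
```

Truncating negative eigenvalues can only increase the value of the quadratic form, which is consistent with the remark above that the approximation tends to overestimate variance.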

aux_var_names

Only required if variance_estimator = "Deville-Tille" or if variance_estimator = "BOSB". For the Deville-Tillé estimator, this should be a character vector of variable names for auxiliary variables to be used in the variance estimator. For the BOSB estimator, this should be a string giving a single variable name to use as an auxiliary variable in the kernel-based variance estimator of Breidt, Opsomer, and Sanchez-Borrego (2016).

Variance Estimators

See variance-estimators for a description of each variance estimator.

Two-Phase Designs

For a two-phase design, variance_estimator should be a list of two variance estimator names, the first for the first phase and the second for the second phase, such as list('Ultimate Cluster', 'Poisson Horvitz-Thompson'). In two-phase designs, only the following estimators may be used for the second phase:

"Ultimate Cluster"
"Stratified Multistage SRS"
"Poisson Horvitz-Thompson"

For statistical details on the handling of two-phase designs, see the documentation for make_twophase_quad_form.

References

- Ash, S. (2014). "Using successive difference replication for estimating variances." Survey Methodology, Statistics Canada, 40(1), 47–59.

- Beaumont, Jean-François, and Zdenek Patak. (2012). "On the Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling: Generalized Bootstrap for Sample Surveys." International Statistical Review 80 (1): 127–48.

- Bellhouse, D.R. (1985). "Computing Methods for Variance Estimation in Complex Surveys." Journal of Official Statistics, Vol.1, No.3.

- Breidt, F. J., Opsomer, J. D., & Sanchez-Borrego, I. (2016). "Nonparametric Variance Estimation Under Fine Stratification: An Alternative to Collapsed Strata." Journal of the American Statistical Association, 111(514), 822–833. https://doi.org/10.1080/01621459.2015.1058264

- Deville, J.‐C., and Tillé, Y. (2005). "Variance approximation under balanced sampling." Journal of Statistical Planning and Inference, 128, 569–591.

- Särndal, C.-E., Swensson, B., & Wretman, J. (1992). "Model Assisted Survey Sampling." Springer New York.

Examples


if (FALSE) {
# Load the packages that provide the functions used in these examples
library(survey)
library(svrep)

# Example 1: Quadratic form for successive-difference variance estimator ----

   data('library_stsys_sample', package = 'svrep')

   ## First, ensure data are sorted in same order as was used in sampling
   library_stsys_sample <- library_stsys_sample[
     order(library_stsys_sample$SAMPLING_SORT_ORDER),
   ]

   ## Create a survey design object
   design_obj <- svydesign(
     data = library_stsys_sample,
     strata = ~ SAMPLING_STRATUM,
     ids = ~ 1,
     fpc = ~ STRATUM_POP_SIZE
   )

   ## Obtain quadratic form
   quad_form_matrix <- get_design_quad_form(
     design = design_obj,
     variance_estimator = "SD2"
   )

   ## Estimate variance of estimated population total
   y <- design_obj$variables$LIBRARIA
   wts <- weights(design_obj, type = 'sampling')
   y_wtd <- as.matrix(y) * wts
   y_wtd[is.na(y_wtd)] <- 0

   pop_total <- sum(y_wtd)

   var_est <- t(y_wtd) %*% quad_form_matrix %*% y_wtd
   std_error <- sqrt(var_est)

   print(pop_total); print(std_error)

   # Compare to the estimate that assumes stratified simple random sampling
   svytotal(x = ~ LIBRARIA, na.rm = TRUE,
            design = design_obj)
            
# Example 2: Kernel-based variance estimator ----

   Q_BOSB <- get_design_quad_form(
     design             = design_obj,
     variance_estimator = "BOSB",
     aux_var_names      = "SAMPLING_SORT_ORDER"
   )
   
   var_est <- t(y_wtd) %*% Q_BOSB %*% y_wtd
   std_error <- sqrt(var_est)
   
   print(pop_total); print(std_error)

# Example 3: Two-phase design (second phase is nonresponse) ----

  ## Estimate response propensities, separately by stratum
  library_stsys_sample[['RESPONSE_PROB']] <- svyglm(
    design = design_obj,
    formula = I(RESPONSE_STATUS == "Survey Respondent") ~ SAMPLING_STRATUM,
    family = quasibinomial('logistic')
  ) |> predict(type = 'response')

  ## Create a survey design object,
  ## where nonresponse is treated as a second phase of sampling
  twophase_design <- twophase(
    data = library_stsys_sample,
    strata = list(~ SAMPLING_STRATUM, NULL),
    id = list(~ 1, ~ 1),
    fpc = list(~ STRATUM_POP_SIZE, NULL),
    probs = list(NULL, ~ RESPONSE_PROB),
    subset = ~ I(RESPONSE_STATUS == "Survey Respondent")
  )

  ## Obtain quadratic form for the two-phase variance estimator,
  ## where the first-phase variance contribution is estimated
  ## using the successive-differences estimator
  ## and the second-phase variance contribution is estimated
  ## using the Horvitz-Thompson estimator
  ## (with joint probabilities based on an assumption of Poisson sampling)
  get_design_quad_form(
    design = twophase_design,
    variance_estimator = list(
      "SD2",
      "Poisson Horvitz-Thompson"
    )
  )
}
