make_doubled_half_bootstrap_weights: Weights for "Doubled Half Bootstrap" of Antal and Tillé (2014)

Description

Creates bootstrap replicate weights using the method of Antal and Tillé (2014). This method is applicable to single-stage sample designs, potentially with stratification and clustering. It can be used for designs that use simple random sampling without replacement or unequal probability sampling without replacement. One advantage of this method is that it yields integer replicate factors of 0, 1, 2, or 3.

Usage

make_doubled_half_bootstrap_weights(
  num_replicates = 100,
  samp_unit_ids,
  strata_ids,
  samp_unit_sel_probs,
  output = "weights"
)

Value

A matrix with the same number of rows as the length of samp_unit_ids and with the number of columns equal to the value of the argument num_replicates. Specifying output = "factors" returns a matrix of replicate adjustment factors which can later be multiplied by the full-sample weights to produce a matrix of replicate weights. Specifying output = "weights" returns the matrix of replicate weights, where the full-sample weights are inferred using samp_unit_sel_probs.
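The relationship between the two output types can be illustrated with a small base R sketch. The factor matrix below is made up for illustration rather than produced by the function, and the sketch assumes that the full-sample weights are the inverses of the selection probabilities, which is the usual convention but is not spelled out on this page.

```r
# Hypothetical selection probabilities for three sampling units
sel_probs <- c(0.2, 0.5, 0.25)

# Assumption: full-sample weights are inverse selection probabilities
full_sample_weights <- 1 / sel_probs

# Made-up replicate factors (rows = units, columns = replicates);
# a real matrix would come from output = "factors"
replicate_factors <- matrix(
  c(0, 2, 1,
    1, 1, 1,
    3, 0, 0),
  nrow = 3
)

# Multiplying a matrix by a vector of length nrow() scales each row,
# so every replicate column is multiplied by the full-sample weights
replicate_weights <- replicate_factors * full_sample_weights
print(replicate_weights)
```

Under that assumption, the result should correspond to what output = "weights" would return for the same factors.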

Arguments

num_replicates: Positive integer giving the number of bootstrap replicates to create.
samp_unit_ids: Vector of sampling unit IDs.
strata_ids: Vector of strata IDs for each sampling unit at each stage of sampling.
samp_unit_sel_probs: Vector of selection probabilities for each sampling unit.
output: Either "weights" (the default) or "factors". Specifying output = "factors" returns a matrix of replicate adjustment factors which can later be multiplied by the full-sample weights to produce a matrix of replicate weights. Specifying output = "weights" returns the matrix of replicate weights, where the full-sample weights are inferred using samp_unit_sel_probs.

Details

For stratified sampling, the replicate factors are generated independently in each stratum. For cluster sampling at a given stage, the replicate factors are generated at the cluster level and then the cluster's replicate factors are applied to all units in the cluster.
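As a rough illustration of the cluster-level step, the base R sketch below copies one replicate's cluster factors to every unit in the corresponding cluster. The cluster IDs and factor values are hypothetical, and the sketch does not reproduce how the function itself generates the factors.

```r
# Hypothetical cluster membership for six sampled units
unit_cluster_ids <- c("A", "A", "B", "C", "C", "C")

# Made-up factors for one replicate, one value per cluster
cluster_factors <- c(A = 2, B = 0, C = 1)

# Look up each unit's cluster so that all units in a cluster
# share that cluster's replicate factor
unit_factors <- unname(cluster_factors[unit_cluster_ids])
print(unit_factors)
```

In a stratified design, the same lookup would apply within each stratum, since the factors are generated independently by stratum.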

In the case of unequal probability sampling, this bootstrap method is only recommended for high entropy sampling methods (i.e., most methods other than systematic sampling).

See Section 7 of Antal and Tillé (2014) for a clear description of how the replicates are formed. The paper presents two options for the resampling probabilities used in replication; this function uses the option referred to in the paper as "the π-bootstrap."

References

Antal, E. and Tillé, Y. (2014). "A new resampling method for sampling designs without replacement: The doubled half bootstrap." Computational Statistics, 29(5), 1345-1363. https://doi.org/10.1007/s00180-014-0495-0

Examples

# \donttest{
 
 # Load the packages used in the examples below
   library(survey)
   library(svrep)

 # Example 1: A cluster sample
 
   data('library_multistage_sample', package = 'svrep')
  
   replicate_factors <- make_doubled_half_bootstrap_weights(
     num_replicates      = 5,
     samp_unit_ids       = library_multistage_sample$PSU_ID,
     strata_ids          = rep(1, times = nrow(library_multistage_sample)),
     samp_unit_sel_probs = library_multistage_sample$PSU_SAMPLING_PROB,
     output              = "factors"
   )

 # Example 2: A single-stage sample selected with unequal probabilities, without replacement

   ## Load an example dataset of U.S. counties with 2004 Presidential vote counts
   data("election", package = 'survey')
   pps_wor_design <- svydesign(data = election_pps,
                               pps = "overton",
                               fpc = ~ p, # Inclusion probabilities
                               ids = ~ 1)

   ## Create bootstrap replicate weights
   set.seed(2022)
   bootstrap_replicate_weights <- make_doubled_half_bootstrap_weights(
     num_replicates      = 5000,
     samp_unit_ids       = pps_wor_design$cluster[,1],
     strata_ids          = pps_wor_design$strata[,1],
     samp_unit_sel_probs = pps_wor_design$prob
   )

   ## Create a replicate design object with the survey package
   bootstrap_rep_design <- svrepdesign(
     data       = pps_wor_design$variables,
     repweights = bootstrap_replicate_weights,
     weights    = weights(pps_wor_design, type = "sampling"),
     type       = "bootstrap"
   )

   ## Compare std. error estimates from bootstrap versus linearization
   data.frame(
     'Statistic' = c('total', 'mean'),
     'SE (bootstrap)' = c(SE(svytotal(x = ~ Bush, design = bootstrap_rep_design)),
                          SE(svymean(x = ~ I(Bush/votes),
                                     design = bootstrap_rep_design))),
     'SE (Overton\'s PPS approximation)' = c(SE(svytotal(x = ~ Bush,
                                                         design = pps_wor_design)),
                                             SE(svymean(x = ~ I(Bush/votes),
                                                        design = pps_wor_design))),
     check.names = FALSE
   )
# }
