simhelpers (version 0.3.1)

extrapolate_coverage: Extrapolate coverage and width using sub-sampled bootstrap confidence intervals.

Description

Given a set of bootstrap confidence intervals calculated across sub-samples with different numbers of replications, extrapolates the coverage and width of the intervals to a specified (larger) number of bootstraps. The function also calculates the associated Monte Carlo standard errors. The nominal confidence level is not an argument of this function; it is determined by how the lower and upper bounds in CI_subsamples were calculated.
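
The extrapolation idea can be sketched in a few lines of base R. This sketch assumes that a performance measure is approximately linear in 1/B, in the spirit of the method cited in the References. The helper name extrapolate_linear is hypothetical (it is not part of simhelpers), the coverage rates are made up for illustration, and the package's own computation may differ in detail.

```r
# Hypothetical helper: extrapolate a performance measure to B_target
# bootstraps, ASSUMING the measure is roughly linear in 1/B.
extrapolate_linear <- function(B, value, B_target = Inf) {
  fit <- lm(value ~ x, data = data.frame(x = 1 / B, value = value))
  # 1 / Inf is 0, so B_target = Inf returns the fitted intercept
  unname(predict(fit, newdata = data.frame(x = 1 / B_target)))
}

# Illustrative (made-up) coverage rates at four small numbers of bootstraps,
# constructed to be exactly linear in 1/B with a limiting value of 0.95.
B_vals <- c(49, 99, 149, 199)
cover_B <- 0.95 - 0.5 / B_vals

extrapolate_linear(B_vals, cover_B)                  # limit as B -> Inf
extrapolate_linear(B_vals, cover_B, B_target = 999)  # finite target
```

Because the toy data are exactly linear, the first call recovers the limiting value of 0.95. With real simulation output the fit would be noisy, which is why the function reports Monte Carlo standard errors alongside the extrapolated estimates.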

Usage

extrapolate_coverage(
  data,
  CI_subsamples,
  true_param,
  B_target = Inf,
  criteria = c("coverage", "width"),
  winz = Inf,
  nested = FALSE,
  format = "wide",
  width_trim = 0,
  cover_na_val = NA,
  width_na_val = NA
)

Value

A tibble containing the number of simulation iterations, the performance criteria estimate(s), and the associated Monte Carlo standard errors (MCSE).

Arguments

data

data frame or tibble containing the simulation results.

CI_subsamples

list or name of column from data containing list of confidence intervals calculated based on sub-samples with different numbers of replications.

true_param

vector or name of column from data containing corresponding true parameters.

B_target

number of bootstrap replications to which the criteria should be extrapolated, with a default of B_target = Inf.

criteria

character or character vector indicating the performance criteria to be calculated, with possible options "coverage" and "width".

winz

numeric value for winsorization constant. If set to a finite value, estimates will be winsorized at the constant multiple of the inter-quartile range below the 25th percentile or above the 75th percentile of the distribution. For instance, setting winz = 3 will truncate estimates that fall below P25 - 3 * IQR or above P75 + 3 * IQR.
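
The winsorization rule described for winz can be illustrated with a small base R sketch. The helper name winsorize_iqr is hypothetical, the data are made up, and the sketch assumes R's default quantile definition; the package's internal implementation may differ.

```r
# Hypothetical helper illustrating the winsorization rule described above.
winsorize_iqr <- function(x, winz = Inf) {
  if (!is.finite(winz)) return(x)  # winz = Inf leaves x unchanged
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - winz * iqr       # P25 - winz * IQR
  upper <- q[2] + winz * iqr       # P75 + winz * IQR
  pmin(pmax(x, lower), upper)      # pull extreme values back to the fences
}

# Made-up estimates with one extreme value
x <- c(1, 2, 3, 4, 5, 100)
winsorize_iqr(x, winz = 3)
```

Here P25 = 2.25, P75 = 4.75, and IQR = 2.5, so the upper fence is 12.25 and the value 100 is replaced by 12.25 while the other values are unchanged.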

nested

logical value controlling the format of the output. If FALSE (the default), then the results will be returned as a data frame with rows for each distinct number of bootstraps. If TRUE, then the results will be returned as a data frame with a single row, with each performance criterion containing a nested data frame.

format

character string controlling the format of the output when CI_subsamples has results for more than one type of confidence interval. If "wide" (the default), then each performance criterion will have a separate column for each CI type. If "long", then each performance criterion will be a single variable, with separate rows for each CI type.

width_trim

numeric value specifying the trimming percentage to use when summarizing CI widths across replications from a single set of bootstraps, with a default of 0.0 (i.e., use the regular arithmetic mean).
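
The effect of trimming can be seen with base R's mean(). This is only an illustration with made-up widths, and it assumes that width_trim is a proportion that behaves like the trim argument of mean(); check the package source before relying on that.

```r
# Made-up CI widths from several replications at one number of bootstraps;
# one replication produced an unusually wide interval.
widths <- c(1.8, 2.0, 2.1, 2.2, 9.5)

mean(widths)              # regular arithmetic mean (no trimming)
mean(widths, trim = 0.2)  # drops the lowest and highest 20% before averaging
```

With five values and trim = 0.2, one observation is dropped from each end, so the single very wide interval no longer dominates the summary.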

cover_na_val

numeric value to use for calculating coverage if bootstrap CI end-points are missing. Default is NA.

width_na_val

numeric value to use for calculating width if bootstrap CI end-points are missing. Default is NA.
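
The role of cover_na_val and width_na_val can be sketched as follows. The end-points are made up, the helper fill_na is hypothetical, and the sketch only assumes that the supplied value is substituted for the coverage indicator or width of an interval whose end-points are missing.

```r
# Made-up interval end-points; the third interval could not be computed.
lower <- c(1.2, 2.3, NA, 1.5)
upper <- c(2.8, 3.1, NA, 2.6)
true_param <- 2

covered <- as.numeric(lower <= true_param & true_param <= upper)
width <- upper - lower

# Hypothetical helper: substitute a fixed value for missing results.
fill_na <- function(x, na_val = NA) {
  x[is.na(x)] <- na_val
  x
}

mean(fill_na(covered))              # NA: the missing interval propagates
mean(fill_na(covered, na_val = 0))  # missing interval counted as not covering
mean(fill_na(width, na_val = Inf))  # missing interval counted as infinitely wide
```

Leaving the defaults at NA makes missing intervals visible in the summaries, whereas a choice such as 0 for coverage treats a failed interval as a miss.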

References

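Boos, D. D., & Zhang, J. (2000). Monte Carlo evaluation of resampling-based hypothesis tests. Journal of the American Statistical Association, 95(450), 486-492.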

Examples

library(simhelpers)

# data-generating process: location-shifted t-distributed sample
dgp <- function(N, mu, nu) {
  mu + rt(N, df = nu)
}

estimator <- function(
    dat,
    B_vals = c(49, 59, 89, 99),
    m = 4,
    trim = 0.1
) {

  # compute estimate and standard error
  N <- length(dat)
  est <- mean(dat, trim = trim)
  se <- sd(dat) / sqrt(N)

  # draw bootstrap replicates of the trimmed mean and its standard error
  booties <- replicate(max(B_vals), {
    x <- sample(dat, size = N, replace = TRUE)
    data.frame(
      M = mean(x, trim = trim),
      SE = sd(x) / sqrt(N)
    )
  }, simplify = FALSE) |>
    dplyr::bind_rows()

  # confidence intervals for each B_vals
  CIs <- bootstrap_CIs(
    boot_est = booties$M,
    boot_se = booties$SE,
    est = est,
    se = se,
    CI_type = c("normal","basic","student","percentile"),
    B_vals = B_vals,
    reps = m,
    format = "wide-list"
  )

  res <- data.frame(
    est = est,
    se = se
  )
  res$CIs <- CIs

  res
}

# build a simulation driver function
simulate_bootCIs <- bundle_sim(
  f_generate = dgp,
  f_analyze = estimator
)

boot_results <- simulate_bootCIs(
  reps = 50, N = 20, mu = 2, nu = 3,
  B_vals = seq(49, 199, 50)
)

extrapolate_coverage(
  data = boot_results,
  CI_subsamples = CIs,
  true_param = 2
)

extrapolate_coverage(
  data = boot_results,
  CI_subsamples = CIs,
  true_param = 2,
  B_target = 999,
  format = "long"
)