boot_ci: Standard error and confidence intervals for bootstrapped estimates

Description

Compute nonparametric bootstrap estimate, standard error, confidence intervals and p-value for a vector of bootstrap replicate estimates.

Usage

boot_ci(data, ..., method = c("dist", "quantile"), ci.lvl = 0.95)
boot_se(data, ...)
boot_p(data, ...)
boot_est(data, ...)

Arguments

data

A data frame that containts the vector with bootstrapped estimates, or directly the vector (see 'Examples').

...

Optional, unquoted names of variables with bootstrapped estimates. Required, if either data is a data frame (and no vector), and only selected variables from data should be processed. You may also use functions like : or tidyselect's select_helpers.

method

Character vector, indicating if confidence intervals should be based on bootstrap standard error, multiplied by the value of the quantile function of the t-distribution (default), or on sample quantiles of the bootstrapped values. See 'Details' in boot_ci(). May be abbreviated.

ci.lvl

Numeric, the level of the confidence intervals.

Value

A tibble with either bootstrap estimate, standard error, the lower and upper confidence intervals or the p-value for all bootstrapped estimates.

Details

The methods require one or more vectors of bootstrap replicate estimates as input.

boot_est() returns the bootstrapped estimate, simply by computing the mean value of all bootstrap estimates.
boot_se() computes the nonparametric bootstrap standard error by calculating the standard deviation of the input vector.
The mean value of the input vector and its standard error is used by boot_ci() to calculate the lower and upper confidence interval, assuming a t-distribution of bootstrap estimate replicates (for method = "dist", the default, which is mean(x) +/- qt(.975, df = length(x) - 1) * sd(x)); for method = "quantile", 95% sample quantiles are used to compute the confidence intervals (quantile(x, probs = c(.025, .975))). Use ci.lvl to change the level for the confidence interval.
P-values from boot_p() are also based on t-statistics, assuming normal distribution.

References

Carpenter J, Bithell J. Bootstrap confdence intervals: when, which, what? A practical guide for medical statisticians. Statist. Med. 2000; 19:1141-1164

Examples

Run this code

# NOT RUN {
library(dplyr)
library(purrr)
data(efc)
bs <- bootstrap(efc, 100)

# now run models for each bootstrapped sample
bs$models <- map(bs$strap, ~lm(neg_c_7 ~ e42dep + c161sex, data = .x))

# extract coefficient "dependency" and "gender" from each model
bs$dependency <- map_dbl(bs$models, ~coef(.x)[2])
bs$gender <- map_dbl(bs$models, ~coef(.x)[3])

# get bootstrapped confidence intervals
boot_ci(bs$dependency)

# compare with model fit
fit <- lm(neg_c_7 ~ e42dep + c161sex, data = efc)
confint(fit)[2, ]

# alternative function calls.
boot_ci(bs$dependency)
boot_ci(bs, dependency)
boot_ci(bs, dependency, gender)
boot_ci(bs, dependency, gender, method = "q")


# compare coefficients
mean(bs$dependency)
boot_est(bs$dependency)
coef(fit)[2]


# bootstrap() and boot_ci() work fine within pipe-chains
efc %>%
  bootstrap(100) %>%
  mutate(
    models = map(strap, ~lm(neg_c_7 ~ e42dep + c161sex, data = .x)),
    dependency = map_dbl(models, ~coef(.x)[2])
  ) %>%
  boot_ci(dependency)

# check p-value
boot_p(bs$gender)
summary(fit)$coefficients[3, ]

# }
# NOT RUN {
# 'spread_coef()' from the 'sjmisc'-package makes it easy to generate
# bootstrapped statistics like confidence intervals or p-values
library(dplyr)
library(sjmisc)
efc %>%
  # generate bootstrap replicates
  bootstrap(100) %>%
  # apply lm to all bootstrapped data sets
  mutate(
    models = map(strap, ~lm(neg_c_7 ~ e42dep + c161sex + c172code, data = .x))
  ) %>%
  # spread model coefficient for all 100 models
  spread_coef(models) %>%
  # compute the CI for all bootstrapped model coefficients
  boot_ci(e42dep, c161sex, c172code)

# or...
efc %>%
  # generate bootstrap replicates
  bootstrap(100) %>%
  # apply lm to all bootstrapped data sets
  mutate(
    models = map(strap, ~lm(neg_c_7 ~ e42dep + c161sex + c172code, data = .x))
  ) %>%
  # spread model coefficient for all 100 models
  spread_coef(models, append = FALSE) %>%
  # compute the CI for all bootstrapped model coefficients
  boot_ci()
# }
# NOT RUN {
# }

Run the code above in your browser using DataLab