Calculate bootstrap confidence intervals using various methods.
int_pctl(.data, statistics, alpha = 0.05)
int_t(.data, statistics, alpha = 0.05)
int_bca(.data, statistics, alpha = 0.05, .fn, ...)
.data: A data frame containing the bootstrap resamples created using bootstraps(). For t- and BCa-intervals, the apparent argument should be set to TRUE. Even if the apparent argument is set to TRUE for the percentile method, the apparent data is never used in calculating the percentile confidence interval.
statistics: An unquoted column name or dplyr selector that identifies a single column in the data set that contains the individual bootstrap estimates. This can be a list column of tidy tibbles (containing the columns term and estimate) or a simple numeric column. For t-intervals, a standard tidy column (usually called std.err) is required. See the examples below.
alpha: Level of significance.
.fn: A function to calculate the statistic of interest. The function should take an rsplit as its first argument, and the ... argument is required.
...: Arguments to pass to .fn.
Each function returns a tibble with columns .lower, .estimate, .upper, .alpha, .method, and term. .method is the type of interval (e.g. "percentile", "student-t", or "BCa"). term is the name of the estimate. Note that the .estimate returned from int_pctl() is the mean of the estimates from the bootstrap resamples and not the estimate from the apparent model.
Percentile intervals are the standard method of obtaining confidence intervals but require thousands of resamples to be accurate. T-intervals may need fewer resamples but require a corresponding variance estimate. Bias-corrected and accelerated intervals require the original function that was used to create the statistics of interest and are computationally taxing.
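To make the percentile method concrete, here is a minimal base-R sketch that computes a percentile interval by hand for the mean of mtcars$mpg. It illustrates the idea only and is not rsample's implementation: the object names (boot_est, limits, pctl), the choice of statistic, the resample count, and the default quantile() type are all assumptions made for this example.

```r
# Base-R sketch of a percentile bootstrap interval for the mean of mtcars$mpg.
# All object names here are hypothetical and chosen for illustration.
set.seed(1)
x <- mtcars$mpg
n_boot <- 2000
alpha <- 0.05

# Each bootstrap estimate is the statistic recomputed on a resample drawn
# with replacement from the original data.
boot_est <- replicate(n_boot, mean(sample(x, replace = TRUE)))

# The percentile interval takes the alpha/2 and 1 - alpha/2 quantiles
# of the bootstrap estimates.
limits <- quantile(boot_est, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)

pctl <- data.frame(
  lower = limits[1],
  estimate = mean(boot_est),  # mean of the bootstrap estimates
  upper = limits[2],
  apparent = mean(x)          # estimate from the original data, for comparison
)
pctl
```

The estimate column is the mean of the bootstrap estimates, which is how the .estimate from int_pctl() is described above. The apparent column is included only to show that the two values can differ slightly.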
Davison, A., & Hinkley, D. (1997). Bootstrap Methods and their Application. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511802843
https://rsample.tidymodels.org/articles/Applications/Intervals.html
# NOT RUN {
library(rsample)  # bootstraps(), analysis(), int_pctl(), int_t(), int_bca()
library(broom)
library(dplyr)
library(purrr)
library(tibble)

# Fit a linear model on the analysis set of a split and return tidy results.
# The `...` is required for functions passed to int_bca() via .fn.
lm_est <- function(split, ...) {
  lm(mpg ~ disp + hp, data = analysis(split)) %>%
    tidy()
}

set.seed(52156)
car_rs <-
  bootstraps(mtcars, 500, apparent = TRUE) %>%
  mutate(results = map(splits, lm_est))

int_pctl(car_rs, results)
int_t(car_rs, results)
int_bca(car_rs, results, .fn = lm_est)

# putting results into a tidy format
rank_corr <- function(split) {
  dat <- analysis(split)
  tibble(
    term = "corr",
    estimate = cor(dat$sqft, dat$price, method = "spearman"),
    # don't know the analytical std.err so no t-intervals
    std.err = NA_real_
  )
}

set.seed(69325)
data(Sacramento, package = "modeldata")
bootstraps(Sacramento, 1000, apparent = TRUE) %>%
  mutate(correlations = map(splits, rank_corr)) %>%
  int_pctl(correlations)
# }