datadr (version 0.8.4)

drBLB: Bag of Little Bootstraps Transformation Method

Description

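Applies the bag of little bootstraps (BLB) to a single subset of a ddf. It is intended to be used as a transformation on random replicate divisions.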

Usage

drBLB(x, statistic, metric, R, n)

Arguments

x
a subset of a ddf (distributed data frame)
statistic
a function to apply to the subset, specifying the statistic to compute. Must have arguments 'data' and 'weights' (see Details). Must return a vector, where each element is a statistic of interest.
metric
a function specifying the metric to be applied to the R bootstrap samples of each statistic returned by statistic. Expects an input vector and should output a vector.
R
the number of bootstrap samples
n
the total number of observations in the data
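
To illustrate how the arguments fit together, here is a minimal base R sketch of the kind of computation they describe for one subset. It is not datadr's implementation: the subset `sub`, the model, and the helper names are assumptions chosen for illustration, and the resampling weights are drawn directly with rmultinom.

```r
# Illustrative sketch only; does not call drBLB or any datadr function.
set.seed(42)
sub <- mtcars[1:16, ]   # stand-in for one subset of a larger data set
b <- nrow(sub)          # rows in the subset
n <- nrow(mtcars)       # total observations in the full data
R <- 20                 # number of bootstrap samples

# statistic: takes the subset and per-row weights, returns a vector
statistic <- function(data, weights)
  coef(lm(mpg ~ wt, data = data, weights = weights))

# metric: summarizes the R bootstrap values of one statistic
metric <- function(x) quantile(x, c(0.05, 0.95))

# each column of w holds the counts for one resample of size n
w <- rmultinom(R, size = n, prob = rep(1 / b, b))

# one column of statistics per bootstrap sample
boot <- sapply(seq_len(R), function(i) statistic(sub, w[, i]))

# apply the metric to the R values of each statistic
res <- apply(boot, 1, metric)
```

Here `res` has one column per coefficient and one row per quantile, which is the per-subset summary that a recombination step would then average across subsets.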

Details

The statistic function must accept a weights argument because BLB draws every resample at the full data size n, even though each subset holds far fewer rows. Materializing n rows per resample is impractical for very large n, so each resample is instead represented by per-row weights on the subset (see the reference for details). Consequently, only methods that provide a weights option can legitimately be used here.
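
The equivalence between weights and an explicit resample can be checked with a small base R sketch. The variable names and the choice of a weighted mean are assumptions for illustration; nothing here comes from datadr itself.

```r
# Illustrative sketch only: weights stand in for a size-n resample.
set.seed(1)
b <- 50       # rows in one subset
n <- 10000    # total observations in the full data
x <- rnorm(b) # values held by the subset

# one resample of size n drawn from b rows, stored as b counts
counts <- rmultinom(1, size = n, prob = rep(1 / b, b))[, 1]

# a weight-aware statistic: the weighted mean
stat <- function(data, weights) weighted.mean(data, weights)

# computed from the b values plus their counts
weighted <- stat(x, counts)

# computed by materializing all n resampled values
explicit <- mean(rep(x, times = counts))
```

Both `weighted` and `explicit` give the same value up to floating-point error, but the weighted form only ever touches b values rather than n.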

References

Kleiner, Ariel, et al. "A scalable bootstrap for massive data." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76.4 (2014): 795-816.

See Also

divide, recombine

Examples

# BLB is meant to run on random replicate divisions
rrAdult <- divide(adult, by = rrDiv(1000), update = TRUE)

adultBlb <- rrAdult %>% addTransform(function(x) {
  drBLB(x,
    statistic = function(x, weights)
      coef(glm(incomebin ~ educationnum + hoursperweek + sex,
        data = x, weights = weights, family = binomial())),
    metric = function(x)
      quantile(x, c(0.05, 0.95)),
    R = 100,
    n = nrow(rrAdult)
  )
})

# compute the mean of the resulting CI limits
# (this will take a little bit of time because of resampling)
coefs <- recombine(adultBlb, combMean)
matrix(coefs, ncol = 2, byrow = TRUE)
