# boot

##### Bootstrap Resampling

Generate `R` bootstrap replicates of a statistic applied to data. Both
parametric and nonparametric resampling are possible. For the nonparametric
bootstrap, possible resampling methods are the ordinary bootstrap, the
balanced bootstrap, antithetic resampling, and permutation.
For nonparametric multi-sample problems, stratified resampling is used.
This is specified by including a vector of strata in the call to `boot`.
Importance resampling weights may be specified.
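As a rough illustration of what "bootstrap replicates of a statistic" means, the ordinary nonparametric bootstrap can be sketched by hand in base R. This is only a conceptual sketch and not how `boot` is implemented; the data vector and the names `stat`, `t0`, `t_star` and `se_hat` are made up for the example.

```r
# Hand-rolled ordinary nonparametric bootstrap (illustrative only;
# this is NOT boot's internal code).
set.seed(1)
x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9)  # made-up data
R <- 200                                        # number of replicates
n <- length(x)

# Hypothetical statistic taking the data and a vector of indices
stat <- function(d, i) mean(d[i])

# Observed value: the statistic on the original sample
t0 <- stat(x, seq_len(n))

# Each replicate resamples n indices with replacement
t_star <- vapply(seq_len(R), function(r) {
  idx <- sample.int(n, n, replace = TRUE)
  stat(x, idx)
}, numeric(1))

# Bootstrap estimate of the standard error of the statistic
se_hat <- sd(t_star)
```

`boot` packages the same idea behind one call and adds the other simulation types, strata and importance weights.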

- Keywords
- htest, nonparametric

##### Usage

```
boot(data, statistic, R, sim="ordinary", stype="i",
     strata=rep(1,n), L=NULL, m=0, weights=NULL,
     ran.gen=function(d, p) d, mle=NULL, ...)
```

##### Arguments

- data
- The data as a vector, matrix or dataframe. If it is a matrix or dataframe then each row is considered as one multivariate observation.
- statistic
- A function which when applied to data returns a vector containing the statistic(s) of interest. When `sim="parametric"`, the first argument to `statistic` must be the data. For each replicate a simulated dataset returned by`r`

##### Details

The statistic to be bootstrapped can be as simple or complicated as desired
as long as its arguments correspond to the dataset and (for a nonparametric
bootstrap) a vector of indices, frequencies or weights. `statistic` is treated
as a black box by the `boot` function and is not checked to ensure that these
conditions are met.
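The three forms of the second argument can be sketched with the same statistic (a sample mean) written once for each `stype`. The data and the function names `mean_i`, `mean_f` and `mean_w` are invented for this illustration. Resetting the seed before each call is intended to make the runs comparable, but that the replicates then match is an assumption about how `boot` draws its resamples.

```r
library(boot)

x <- c(4.1, 5.3, 2.8, 6.0, 3.7, 5.1, 4.4, 3.9)  # made-up data

# stype = "i": second argument is a vector of resampled indices
mean_i <- function(d, i) mean(d[i])

# stype = "f": second argument counts how often each case was drawn
mean_f <- function(d, f) sum(d * f) / sum(f)

# stype = "w": second argument holds weights for each case;
# dividing by sum(w) keeps the result valid however they are scaled
mean_w <- function(d, w) sum(d * w) / sum(w)

set.seed(42); b_i <- boot(x, mean_i, R = 99, stype = "i")
set.seed(42); b_f <- boot(x, mean_f, R = 99, stype = "f")
set.seed(42); b_w <- boot(x, mean_w, R = 99, stype = "w")

# All three agree on the observed statistic
c(b_i$t0, b_f$t0, b_w$t0)
```

Because `boot` does not check the statistic, pairing a function written for indices with `stype="f"` would run without error and return wrong answers.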

The first-order balanced bootstrap is described in Davison, Hinkley and Schechtman (1986). The antithetic bootstrap is described by Hall (1989) and is experimental, particularly when used with strata. The other nonparametric simulation types are the ordinary bootstrap (possibly with unequal probabilities) and permutation, which returns random permutations of cases. All of these methods work independently within strata if that argument is supplied.
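First-order balance means that every case appears exactly `R` times across the whole set of resamples. One common way to get this, sketched below in base R, is to pool `R` copies of the indices, permute the pool, and cut it into `R` resamples of size `n`. This is an illustration of the idea only and is not claimed to be the algorithm `boot` uses; `pool`, `idx` and `counts` are names chosen for the example.

```r
# Sketch of first-order balanced resampling indices (illustrative;
# not necessarily the algorithm used inside boot).
set.seed(7)
n <- 5   # sample size
R <- 4   # number of bootstrap replicates

# Pool R copies of every index, then shuffle the whole pool
pool <- rep(seq_len(n), times = R)
idx <- matrix(sample(pool), nrow = R, ncol = n)  # one resample per row

# Every case is used exactly R times over all resamples
counts <- tabulate(as.vector(idx), nbins = n)
counts
```

An individual row can still repeat or omit cases; the balance holds only over the full set of `R` resamples.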

For the parametric bootstrap it is necessary for the user to specify how the
resampling is to be conducted. The best way of accomplishing this is to
specify the function `ran.gen`, which will return a simulated data set from the
observed data set and a set of parameter estimates specified in `mle`.
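A minimal sketch of a parametric bootstrap follows, assuming a normal model for a made-up sample. The generator `norm_gen` and the statistic `med_stat` are hypothetical names; the only parts taken from `boot` are the `sim`, `ran.gen` and `mle` arguments shown in Usage.

```r
library(boot)

set.seed(3)
y <- c(9.8, 10.4, 11.1, 9.5, 10.9, 10.2, 9.9, 10.6)  # made-up data

# Generator: receives the observed data and the parameter estimates,
# and returns a simulated data set of the same shape
norm_gen <- function(d, p) rnorm(length(d), mean = p[1], sd = p[2])

# With sim = "parametric" the statistic takes the data as its first argument
med_stat <- function(d) median(d)

# Parameter estimates handed to the generator through mle
fit <- c(mean(y), sd(y))

pb <- boot(y, med_stat, R = 199, sim = "parametric",
           ran.gen = norm_gen, mle = fit)

# Spread of the simulated medians under the fitted normal model
sd(pb$t[, 1])
```

Keeping the model in `ran.gen` and the estimates in `mle` lets the same statistic be reused under a different fitted model by swapping only those two arguments.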

##### Value

The returned value is an object of class `"boot"`, containing the following components:

- t0
- The observed value of `statistic` applied to `data`.
- t
- A matrix with `R` rows, each of which is a bootstrap replicate of `statistic`.
- R
- The value of `R` as passed to `boot`.
- data
- The `data` as passed to `boot`.
- seed
- The value of `.Random.seed` when `boot` was called.
- statistic
- The function `statistic` as passed to `boot`.
- sim
- Simulation type used.
- stype
- Statistic type as passed to `boot`.
- call
- The original call to `boot`.
- strata
- The strata used. This is the vector passed to `boot`, if it was supplied, or a vector of ones if there were no strata. It is not returned if `sim` is `"parametric"`.
- weights
- The importance sampling weights as passed to `boot`, or the empirical distribution function weights if no importance sampling weights were specified. It is omitted if `sim` is not one of `"ordinary"` or `"balanced"`.
- pred.i
- If predictions are required (`m>0`), this is the matrix of indices at which predictions were calculated as they were passed to `statistic`. Omitted if `m` is `0` or `sim` is not `"ordinary"`.
- L
- The influence values used when `sim` is `"antithetic"`. If no such values were specified and `stype` is not `"w"`, then `L` is returned as consecutive integers corresponding to the assumption that data is ordered by influence values. This component is omitted when `sim` is not `"antithetic"`.
- ran.gen
- The random generator function used if `sim` is `"parametric"`. This component is omitted for any other value of `sim`.
- mle
- The parameter estimates passed to `boot` when `sim` is `"parametric"`. It is omitted for all other values of `sim`.
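The components `t0` and `t` are usually the starting point for further work. As a small sketch, the standard bootstrap estimates of bias and standard error can be computed directly from them; the data and the names `b`, `bias_hat` and `se_hat` are made up for the example.

```r
library(boot)

set.seed(11)
x <- c(12.3, 9.8, 11.4, 10.1, 13.0, 8.7, 10.9, 11.8)  # made-up data

b <- boot(x, function(d, i) mean(d[i]), R = 299)

# Bootstrap bias estimate: mean of the replicates minus the observed value
bias_hat <- mean(b$t[, 1]) - b$t0

# Bootstrap standard error: standard deviation of the replicates
se_hat <- sd(b$t[, 1])

# Components present in the returned object for this call
names(b)
```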

##### References

There are many references explaining the bootstrap and its variations. Among them are:

Booth, J.G., Hall, P. and Wood, A.T.A. (1993) Balanced importance resampling
for the bootstrap. *Annals of Statistics*, **21**, 286-298.

Davison, A.C. and Hinkley, D.V. (1997)
*Bootstrap Methods and Their Application*. Cambridge University Press.

Davison, A.C., Hinkley, D.V. and Schechtman, E. (1986) Efficient bootstrap
simulation. *Biometrika*, **73**, 555-566.

Efron, B. and Tibshirani, R. (1993) *An Introduction to the Bootstrap*.
Chapman & Hall.

Gleason, J.R. (1988) Algorithms for balanced bootstrap simulations.
*American Statistician*, **42**, 263-266.

Hall, P. (1989) Antithetic resampling for the bootstrap. *Biometrika*,
**76**, 713-724.

Hinkley, D.V. (1988) Bootstrap methods (with Discussion).
*Journal of the Royal Statistical Society, B*, **50**, 312-337, 355-370.

Hinkley, D.V. and Shi, S. (1989) Importance sampling and the nested bootstrap.
*Biometrika*, **76**, 435-446.

Johns, M.V. (1988) Importance sampling for bootstrap confidence intervals.
*Journal of the American Statistical Association*, **83**, 709-714.

Noreen, E.W. (1989) *Computer Intensive Methods for Testing Hypotheses*.
John Wiley & Sons.

##### See Also

`boot.array`, `boot.ci`, `boot.object`, `censboot`, `empinf`, `jack.after.boot`, `tilt.boot`, `tsboot`

##### Examples

```
library(boot)
# usual bootstrap of the ratio of means using the city data
data(city)
ratio <- function(d, w)
    sum(d$x * w)/sum(d$u * w)
boot(city, ratio, R=999, stype="w")
# Stratified resampling for the difference of means. In this
# example we will look at the difference of means between the final
# two series in the gravity data.
data(gravity)
diff.means <- function(d, f)
{
    n <- nrow(d)
    gp1 <- 1:table(as.numeric(d$series))[1]
    m1 <- sum(d[gp1,1] * f[gp1])/sum(f[gp1])
    m2 <- sum(d[-gp1,1] * f[-gp1])/sum(f[-gp1])
    ss1 <- sum(d[gp1,1]^2 * f[gp1]) -
        (m1 * m1 * sum(f[gp1]))
    ss2 <- sum(d[-gp1,1]^2 * f[-gp1]) -
        (m2 * m2 * sum(f[-gp1]))
    c(m1-m2, (ss1+ss2)/(sum(f)-2))
}
grav1 <- gravity[as.numeric(gravity[,2])>=7,]
boot(grav1, diff.means, R=999, stype="f", strata=grav1[,2])
# In this example we show the use of boot in a prediction from
# regression based on the nuclear data. This example is taken
# from Example 6.8 of Davison and Hinkley (1997). Notice also
# that two extra arguments to statistic are passed through boot.
data(nuclear)
nuke <- nuclear[,c(1,2,5,7,8,10,11)]
nuke.lm <- glm(log(cost)~date+log(cap)+ne+ ct+log(cum.n)+pt, data=nuke)
nuke.diag <- glm.diag(nuke.lm)
nuke.res <- nuke.diag$res*nuke.diag$sd
nuke.res <- nuke.res-mean(nuke.res)
# We set up a new dataframe with the data, the standardized
# residuals and the fitted values for use in the bootstrap.
nuke.data <- data.frame(nuke,resid=nuke.res,fit=fitted(nuke.lm))
# Now we want a prediction of plant number 32 but at date 73.00
new.data <- data.frame(cost=1, date=73.00, cap=886, ne=0,
ct=0, cum.n=11, pt=1)
new.fit <- predict(nuke.lm, new.data)
nuke.fun <- function(dat, inds, i.pred, fit.pred, x.pred)
{
    assign(".inds", inds, envir=.GlobalEnv)
    lm.b <- glm(fit+resid[.inds] ~date+log(cap)+ne+ct+
                log(cum.n)+pt, data=dat)
    pred.b <- predict(lm.b,x.pred)
    remove(".inds", envir=.GlobalEnv)
    c(coef(lm.b), pred.b-(fit.pred+dat$resid[i.pred]))
}
nuke.boot <- boot(nuke.data, nuke.fun, R=999, m=1,
fit.pred=new.fit, x.pred=new.data)
# The bootstrap prediction error would then be found by
mean(nuke.boot$t[,8]^2)
# Basic bootstrap prediction limits would be
new.fit-sort(nuke.boot$t[,8])[c(975,25)]
# Finally a parametric bootstrap. For this example we shall look
# at the air-conditioning data. In this example our aim is to test
# the hypothesis that the true value of the index is 1 (i.e. that
# the data come from an exponential distribution) against the
# alternative that the data come from a gamma distribution with
# index not equal to 1.
air.fun <- function(data)
{
    ybar <- mean(data$hours)
    para <- c(log(ybar),mean(log(data$hours)))
    ll <- function(k) {
        if (k <= 0) out <- 1e200 # not NA
        else out <- lgamma(k)-k*(log(k)-1-para[1]+para[2])
        out
    }
    khat <- nlm(ll,ybar^2/var(data$hours))$estimate
    c(ybar, khat)
}
air.rg <- function(data, mle)
# Function to generate random exponential variates. mle will contain
# the mean of the original data
{
    out <- data
    out$hours <- rexp(nrow(out), 1/mle)
    out
}
data(aircondit)
air.boot <- boot(aircondit, air.fun, R=999, sim="parametric",
ran.gen=air.rg, mle=mean(aircondit$hours))
# The bootstrap p-value can then be approximated by
sum(abs(air.boot$t[,2]-1) > abs(air.boot$t0[2]-1))/(1+air.boot$R)
```

*Documentation reproduced from package boot, version 1.1-3, License: Unlimited distribution.*