Arguments
data
array of data. If a 2D array, each row is treated as
one multivariate observation.
statistic
function. When sim is set to parametric, the first argument
to statistic must be the data; for each replicate, a
simulated dataset returned by ran.gen is passed. In all
other cases, statistic must take at least two arguments.
The first argument passed is always the original data. The
second is a vector of indices, frequencies, or weights that
defines the bootstrap sample.
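
The two-argument contract for statistic can be illustrated with a minimal
pure-Python sketch. It is not this library's implementation: the driver
bootstrap, the helper mean_stat, and the seed argument are hypothetical names
introduced only to show how the original data and a vector of indices are
passed on each replicate.

```python
import random
import statistics

def mean_stat(data, indices):
    """Statistic in two-argument form: original data plus resample indices."""
    return statistics.fmean(data[i] for i in indices)

def bootstrap(data, statistic, R, seed=0):
    """Hypothetical driver: calls statistic(data, indices) once per replicate."""
    rng = random.Random(seed)
    n = len(data)
    # Observed value: the identity index vector reproduces the original sample.
    t0 = statistic(data, range(n))
    # Each replicate draws n indices with replacement (ordinary simulation).
    t = [statistic(data, [rng.randrange(n) for _ in range(n)]) for _ in range(R)]
    return t0, t

data = [2.0, 4.0, 6.0, 8.0]
t0, t = bootstrap(data, mean_stat, R=200)
```

Note that statistic always receives the original data unchanged; only the
index vector varies between replicates.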
R
number of bootstrap replicates
sim
string, indicates the type of simulation. The default value is
ordinary. Other possible values are parametric, balanced,
permutation, and antithetic. Importance resampling is
requested by supplying importance weights (see weights); the
type of simulation must still be specified, but it may only be
ordinary or balanced in this case.
stype
string, indicates what the second argument of statistic
represents. The default value is i for indices. Other
possible values are f for frequencies and w for weights.
It is not used when sim is set to parametric.
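
The three representations describe the same bootstrap sample. The sketch below
shows one way to convert between them, assuming that weights are frequencies
normalized to sum to one; the helper names are hypothetical and are not part
of this interface.

```python
from collections import Counter

def indices_to_frequencies(indices, n):
    """Count how often each of the n observations appears in the resample."""
    counts = Counter(indices)
    return [counts.get(i, 0) for i in range(n)]

def frequencies_to_weights(freqs):
    """Normalize frequencies so they sum to one (assumed meaning of weights)."""
    total = sum(freqs)
    return [f / total for f in freqs]

data = [2.0, 4.0, 6.0, 8.0]
indices = [0, 0, 3, 2]                                # stype i
freqs = indices_to_frequencies(indices, len(data))    # stype f
weights = frequencies_to_weights(freqs)               # stype w

# The same statistic (a mean) written against each representation.
mean_i = sum(data[i] for i in indices) / len(indices)
mean_f = sum(x * f for x, f in zip(data, freqs)) / sum(freqs)
mean_w = sum(x * w for x, w in zip(data, weights))
```

All three means agree, so the choice of stype mainly depends on which form
makes statistic easiest to write.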
strata
vector of integers, specifies the strata for multi-sample
problems. This may be specified for any simulation, but is
ignored when sim is set to parametric. When strata is
supplied for a nonparametric bootstrap, the simulations are
done within the specified strata.
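
Resampling within strata can be sketched as follows. This is an illustrative
pure-Python version under the assumption that each observation is replaced by
a draw from its own stratum; stratified_indices is a hypothetical helper, not
a function of this library.

```python
import random

def stratified_indices(strata, rng):
    """Resample indices with replacement separately within each stratum."""
    groups = {}
    for i, s in enumerate(strata):
        groups.setdefault(s, []).append(i)
    out = [None] * len(strata)
    for members in groups.values():
        # Every position in a stratum is filled from that stratum only.
        for pos in members:
            out[pos] = rng.choice(members)
    return out

strata = [1, 1, 1, 2, 2]
idx = stratified_indices(strata, random.Random(1))
```

Each stratum keeps its original size in every replicate, which is what
distinguishes this from resampling the pooled data.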
L
vector of influence values evaluated at the observations. This is
used only when sim is set to antithetic. If not supplied, the
values are calculated through a call to empinf, which uses the
infinitesimal jackknife if stype is set to w and the usual
jackknife otherwise.
m
the number of predictions which are to be made at each bootstrap
replicate. This is most useful for (generalized) linear models.
This can only be used when sim is ordinary. m will usually be
a single integer but, if there are strata, it may be a vector
with length equal to the number of strata, specifying how many
of the errors for prediction should come from each stratum. The
actual predictions should be returned as the final part of the
output of statistic, which should also take an argument giving
the vector of indices of the errors to be used for the
predictions.
weights
array of importance weights. If a vector then it should
have as many elements as there are observations in the
input data. When simulation from more than one set of
weights is required, weights should be a matrix where each
row of the matrix is one set of importance weights. If
weights is a matrix then the number of bootstrap replicates
R must be a vector of length nrow(weights). This parameter
is ignored if sim is not set to ordinary or balanced.
ran.gen
function, used only when sim is set to parametric. It
describes how random values are to be generated. It should
be a function of two arguments. The first argument should
be the observed data and the second argument consists of
any other information needed (e.g. parameter estimates).
The second argument may be a list, allowing any number of
items to be passed to ran.gen. The returned value should be
a simulated data set of the same form as the observed data
which will be passed to statistic to get a bootstrap
replicate. It is important that the returned value be of
the same shape and type as the original dataset. If ran.gen
is not specified, the default is a function which returns
the original input data, in which case all simulation should
be included as part of statistic. Setting sim to
parametric and using a suitable ran.gen allows the user
to implement any type of nonparametric resampling that
is not supported directly.
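
The interplay of ran.gen, mle, and statistic in the parametric case can be
sketched in pure Python. The normal model, the module-level generator rng,
and the driver parametric_bootstrap are assumptions made for illustration;
only the two-argument form of ran_gen (data, then extra information) follows
the description above.

```python
import random
import statistics

rng = random.Random(42)  # shared generator so ran_gen keeps two arguments

def ran_gen(data, mle):
    """Simulate a dataset of the same length from a fitted normal model."""
    return [rng.gauss(mle["mean"], mle["sd"]) for _ in data]

def statistic(data):
    """Parametric case: the only required argument is the (simulated) data."""
    return statistics.fmean(data)

def parametric_bootstrap(data, statistic, R, ran_gen, mle):
    """Hypothetical driver: simulate R datasets and evaluate statistic on each."""
    t0 = statistic(data)
    t = [statistic(ran_gen(data, mle)) for _ in range(R)]
    return t0, t

data = [4.1, 5.3, 4.8, 5.9, 5.0, 4.4]
# Maximum likelihood estimates, computed once from the original data.
mle = {"mean": statistics.fmean(data), "sd": statistics.pstdev(data)}
t0, t = parametric_bootstrap(data, statistic, 500, ran_gen, mle)
```

Because mle is computed once and passed to every call, ran_gen does not need
to re-estimate parameters for each replicate.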
mle
second argument to ran.gen. Typically this holds the maximum
likelihood estimates of the parameters. For efficiency, mle is
often a list containing all of the objects needed by ran.gen
that can be calculated from the original data set alone.
simple
boolean, can only be set to TRUE if sim is set to
ordinary, stype is set to i, and m is set to 0. Otherwise
it is ignored and a warning is generated. By default an n by R
index array is created, which can be large. If simple is set
to TRUE, this is avoided by sampling separately for each
replication, which is slower but uses less memory.
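
The memory trade-off can be illustrated with a small pure-Python sketch. It
only contrasts the two strategies and does not reproduce the library's
internals or its timing; index_matrix and index_rows are hypothetical names.

```python
import random

def index_matrix(n, R, rng):
    """Default-style strategy: materialize the full index array up front."""
    return [[rng.randrange(n) for _ in range(n)] for _ in range(R)]

def index_rows(n, R, rng):
    """simple-style strategy: generate one replicate's indices at a time."""
    for _ in range(R):
        yield [rng.randrange(n) for _ in range(n)]

n, R = 5, 1000
full = index_matrix(n, R, random.Random(0))   # holds n * R indices at once
lazy = index_rows(n, R, random.Random(0))     # holds n indices at a time
first_row = next(lazy)
```

With the same seed both strategies produce the same index vectors; only the
amount held in memory at any moment differs.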
...
other named arguments for statistic which are passed unchanged
each time.