bootstrap: Bootstrap estimation and errors

Description

Function to calculate bootstrap statistics for transfer function models, such as bootstrap estimates, model RMSEP, sample-specific errors for predictions, and summary statistics such as bias and $R^2$ between observed and estimated environment.

Usage

bootstrap(object, ...)
## S3 method for class 'default':
bootstrap(object, ...)
## S3 method for class 'mat':
bootstrap(object, newdata, newenv, k,
          weighted = FALSE, n.boot = 1000, ...)

Arguments

object

an R object for which bootstrap statistics are to be generated. Only objects of class "mat" are currently supported.

newdata

a data frame containing samples for which bootstrap predictions and sample-specific errors are to be generated. May be missing; see Details. "newdata" must have the same number of columns as the training set data.

newenv

a vector containing environmental data for samples in "newdata". Used to calculate the full suite of errors for new data, such as a test set with known environmental values. May be missing; see Details. "newenv" must have the same number of rows as "newdata".

k

numeric; how many modern analogues to use to generate the bootstrap statistics and, if requested, the predictions.

weighted

logical; should the weighted mean of the environment for the "k" modern analogues be used instead of the mean?

n.boot

Number of bootstrap samples to take.

...

arguments passed to other methods.

Value

A large object is returned with some or all of the following depending on whether newdata and newenv are supplied or not.
observed

vector of observed environmental values.

model

a list containing the apparent or non-bootstrapped estimates for the training set, with the following components:

    estimated: estimated values for "y", the environment.
    residuals: model residuals.
    r.squared: apparent $R^2$ between observed and estimated values of "y".
    avg.bias: average bias of the model residuals.
    max.bias: maximum bias of the model residuals.
    rmse: apparent error (RMSE) for the model.
    k: numeric; indicating the size of model used in estimates and predictions.

bootstrap

a list containing the bootstrap estimates for the training set, with the following components:

    estimated: bootstrap estimates for "y".
    residuals: bootstrap residuals for "y".
    r.squared: bootstrap-derived $R^2$ between observed and estimated values of "y".
    avg.bias: average bias of the bootstrap-derived model residuals.
    max.bias: maximum bias of the bootstrap-derived model residuals.
    rmsep: bootstrap-derived RMSEP for the model.
    s1: bootstrap-derived S1 error component for the model.
    s2: bootstrap-derived S2 error component for the model.
    k: numeric; indicating the size of model used in estimates and predictions.

sample.errors

a list containing the bootstrap-derived sample-specific errors for the training set, with the following components:

    rmsep: bootstrap-derived RMSEP for the training set samples.
    s1: bootstrap-derived S1 error component for training set samples.
    s2: bootstrap-derived S2 error component for training set samples.

weighted

logical; whether the weighted mean was used instead of the mean of the environment for k-closest analogues.

auto

logical; whether "k" was chosen automatically or user-selected.

n.boot

numeric; the number of bootstrap samples taken.

call

the matched call.

type

model type.

predictions

a list containing the apparent and bootstrap-derived estimates for the new data, with the following components:

    observed: the observed values for the new samples (only if newenv is provided).
    model: a list containing the apparent or non-bootstrapped estimates for the new samples, with the same components as model, above.
    bootstrap: a list containing the bootstrap estimates for the new samples, with some or all of the same components as bootstrap, above.
    sample.errors: a list containing the bootstrap-derived sample-specific errors for the new samples, with some or all of the same components as sample.errors, above.
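
The relationship between the rmsep, s1 and s2 components can be illustrated with a small base-R sketch. This is not the package's implementation. It assumes the decomposition commonly used for bootstrap transfer-function errors, in which S1 is the standard deviation of a sample's out-of-bag predictions, S2 is the root mean squared difference between the mean out-of-bag estimate and the observed value, and RMSEP = sqrt(S1^2 + S2^2). The toy data and the helper names sq_chord and knn_predict are hypothetical.

```r
## Sketch only: bootstrap errors for a k-nearest-analogue model in base R.
## All object and function names here are hypothetical.
set.seed(42)

## toy training set: 30 samples, 5 "taxa" as proportions, one environment
n <- 30
spp <- matrix(runif(n * 5), nrow = n)
spp <- spp / rowSums(spp)
env <- 5 + 2 * spp[, 1] + rnorm(n, sd = 0.1)

## squared chord distance between each row of x and each row of y
sq_chord <- function(x, y) {
  outer(rowSums(x), rowSums(y), "+") - 2 * tcrossprod(sqrt(x), sqrt(y))
}

## unweighted mean of the environment of the k closest training samples
knn_predict <- function(train, train_env, new, k) {
  d <- sq_chord(new, train)
  apply(d, 1, function(di) mean(train_env[order(di)[seq_len(k)]]))
}

k <- 3
n_boot <- 200

## out-of-bag predictions: one row per sample, one column per bootstrap
oob <- matrix(NA_real_, nrow = n, ncol = n_boot)
for (b in seq_len(n_boot)) {
  idx <- sample.int(n, replace = TRUE)   # bootstrap training set
  out <- setdiff(seq_len(n), idx)        # samples left out this round
  if (length(out) == 0L) next
  oob[out, b] <- knn_predict(spp[idx, , drop = FALSE], env[idx],
                             spp[out, , drop = FALSE], k)
}

## bootstrap estimate and residual for each training sample
boot_est   <- rowMeans(oob, na.rm = TRUE)
boot_resid <- boot_est - env

## sample-specific S1, then model-level S1, S2 and RMSEP (assumed definitions)
s1_i  <- apply(oob, 1, sd, na.rm = TRUE)
s1    <- sqrt(mean(s1_i^2))
s2    <- sqrt(mean(boot_resid^2))
rmsep <- sqrt(s1^2 + s2^2)

## sample-specific RMSEP combines each sample's S1 with the model S2
rmsep_i <- sqrt(s1_i^2 + s2^2)

round(c(s1 = s1, s2 = s2, rmsep = rmsep), 3)
```

Under these assumptions the model RMSEP can never be smaller than S2, and each sample-specific RMSEP differs from the others only through that sample's S1.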

Details

bootstrap is a fairly flexible function, and can be called with or without arguments newdata and newenv.

If called with only object specified, then bootstrap estimates for the training set data are returned. In this case, the returned object will not include component predictions.

If called with both object and newdata, then in addition to the above, bootstrap estimates for the new samples are also calculated and returned. In this case, component predictions will contain the apparent and bootstrap derived predictions and sample-specific errors for the new samples.

If called with object, newdata and newenv, then the full bootstrap object is returned (as described in the Value section above). With environmental data now available for the new samples, residuals, RMSE(P), $R^2$ and bias statistics can also be calculated for them.

The individual components of predictions are the same as those described for the corresponding training set components. For example, returned.object$predictions$bootstrap contains the same components as returned.object$bootstrap.

It is not usual for environmental data to be available for the new samples for which predictions are required. In normal palaeolimnological studies, newenv is unlikely to be available, because the new samples are sediment core samples from the past, for which no environmental data exist. However, if sufficient training set samples are available to justify producing a training and a test set, then newenv will be available, and bootstrap can accommodate this extra information and calculate apparent and bootstrap estimates for the test set, allowing an independent assessment of the RMSEP of the model to be performed.
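
The three ways of calling bootstrap described above might look as follows. This is a sketch, not a tested recipe: it assumes the analogue package (the source of mat, bootstrap, swapdiat and swappH) is installed, and the random train/test split and the object names test.id, train, test, boot.train, boot.new and boot.test are hypothetical.

```r
## Sketch only: assumes the 'analogue' package is installed.
library(analogue)
data(swapdiat)
data(swappH)

set.seed(1)
## hypothetical split: hold back roughly one third of samples as a test set
test.id <- sample(nrow(swapdiat), size = floor(nrow(swapdiat) / 3))
train <- swapdiat[-test.id, ]
test  <- swapdiat[test.id, ]

## fit the MAT model to the training portion only
train.mat <- mat(train, swappH[-test.id], method = "SQchord")

## 1. object only: bootstrap estimates for the training set
boot.train <- bootstrap(train.mat, n.boot = 100)

## 2. object and newdata: adds predictions for the new samples
boot.new <- bootstrap(train.mat, newdata = test, n.boot = 100)

## 3. object, newdata and newenv: adds error statistics for the new samples
boot.test <- bootstrap(train.mat, newdata = test,
                       newenv = swappH[test.id], n.boot = 100)

## inspect which components were returned for the new samples
names(boot.test$predictions)
```

Because test is a row subset of swapdiat, it has the same columns as the training data, as newdata requires.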

References

Birks, H.J.B., Line, J.M., Juggins, S., Stevenson, A.C. and ter Braak, C.J.F. (1990). Diatoms and pH reconstruction. Philosophical Transactions of the Royal Society of London, Series B 327, 263-278.

Examples


## continue the RLGH and SWAP example from ?join
example(join)

## fit the MAT model using the squared chord distance measure
swap.mat <- mat(swapdiat, swappH, method = "SQchord")

## bootstrap training set
swap.boot <- bootstrap(swap.mat, n.boot = 100)
swap.boot
summary(swap.boot)
