MuMIn (version 1.47.5)

stackingWeights: Stacking model weights

Description

Computes model weights based on a cross-validation-like procedure.

Usage

stackingWeights(object, ..., data, R, p = 0.5)

Value

A matrix with two rows, "mean" and "median", containing model weights obtained from, respectively, the mean and the median of the weight distribution across the R replicates, re-scaled to sum to one.
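For instance, with wts as returned by the call shown under Examples, the two rows can be addressed by name (a minimal sketch):

rownames(wts)   # "mean" "median"
wts["mean", ]   # one weight per model, re-scaled to sum to one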

Arguments

object, ...

two or more fitted glm objects, a list of such objects, or an "averaging" object.

data

a data frame containing the variables in the model, used for fitting and prediction.

R

the number of replicates, i.e. random training/hold-out splits.

p

the proportion of the data to be used as the training set. Defaults to 0.5.
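For instance, a minimal call might look as follows (fm1, fm2 and fm3 are hypothetical glm objects fitted to dat; here 70% of the rows form each training set):

wts <- stackingWeights(fm1, fm2, fm3, data = dat, R = 25, p = 0.7)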

Author

Carsten Dormann, Kamil Bartoń

Details

Each model in the set is fitted to a training set: a random subset of p * N observations in data. The fitted models then produce predictions for the remaining part of data (the test, or hold-out, set), and the weights by which the models are combined are optimised so that the combined hold-out predictions best fit the hold-out observations. This process is repeated R times, yielding a distribution of weights for each model (which Smyth & Wolpert (1998) referred to as an ‘empirical Bayesian estimate of posterior model probability’). The mean or median of these weights is taken for each model and re-scaled to sum to one.
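For illustration, one replicate of this procedure could be sketched as below. This is a conceptual sketch only, not MuMIn's implementation: it assumes the response column is named y, uses a squared-error criterion, and keeps the weights non-negative and summing to one via a softmax reparametrisation (the names stack_once and loss are invented for this example).

stack_once <- function(models, data, p = 0.5) {
    n <- nrow(data)
    train <- sample.int(n, floor(p * n))
    tr <- data[train, , drop = FALSE]
    ho <- data[-train, , drop = FALSE]
    # refit every candidate model to the training subset
    fits <- lapply(models, function(m)
        glm(formula(m), family = family(m), data = tr))
    # hold-out predictions, one column per model
    preds <- sapply(fits, predict, newdata = ho, type = "response")
    obs <- ho$y  # assumes the response column is named "y"
    # squared error of the weighted combination of hold-out predictions;
    # a softmax keeps the weights non-negative and summing to one
    loss <- function(par) {
        w <- exp(par) / sum(exp(par))
        sum((obs - drop(preds %*% w))^2)
    }
    par <- optim(numeric(length(fits)), loss)$par
    exp(par) / sum(exp(par))
}

# repeated R times, this yields a weight distribution per model:
# w <- replicate(10, stack_once(models, dat))  # one column per replicate
# rowMeans(w)                                  # mean weight per model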

References

Wolpert, D. H. 1992. Stacked generalization. Neural Networks 5, 241–259.

Smyth, P. and Wolpert, D. 1998. An evaluation of linearly combining density estimators via stacking. Technical Report No. 98-25. Information and Computer Science Department, University of California, Irvine, CA.

Dormann, C. et al. 2018. Model averaging in ecology: a review of Bayesian, information-theoretic, and tactical approaches for predictive inference. Ecological Monographs 88, 485–504.

See Also

Weights, model.avg

Other model weights: BGWeights(), bootWeights(), cos2Weights(), jackknifeWeights()

Examples

library(MuMIn)

# simulate an enlarged version of the Cement dataset, to provide enough
# observations for the training/hold-out splits
fm0 <- glm(y ~ X1 + X2 + X3 + X4, data = Cement, na.action = na.fail)
dat <- as.data.frame(apply(Cement[, -1], 2, sample, 50, replace = TRUE))
dat$y <- rnorm(nrow(dat), predict(fm0, newdata = dat), sigma(fm0))

# global model fitted to the simulated data:
fm <- glm(y ~ X1 + X2 + X3 + X4, data = dat, na.action = na.fail)

# generate a list of *some* submodels of the global model
# (X1 always included, between 1 and 3 terms):
models <- lapply(dredge(fm, evaluate = FALSE, fixed = "X1", m.lim = c(1, 3)), eval)

# stacking weights from R = 10 random 50/50 training/hold-out splits
wts <- stackingWeights(models, data = dat, R = 10)

# model-average the set using the mean-based stacking weights
ma <- model.avg(models)
Weights(ma) <- wts["mean", ]

predict(ma)
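
# alternatively, use the median-based weights; with few replicates
# (here R = 10) the mean- and median-based weights may differ:
Weights(ma) <- wts["median", ]
predict(ma)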
