Usage

model.avg(m1, ..., beta = FALSE, method = c("0", "NA"), rank = NULL,
    rank.args = NULL, alpha = 0.05)

## S3 method for class 'averaging':
coef(object, ...)

## S3 method for class 'averaging':
predict(object, newdata, se.fit = NULL, interval = NULL, type = NULL, ...)
Arguments

m1 - a fitted model object, or a list of fitted model objects (e.g. as returned by dredge).
... - for model.avg, more fitted model objects; for predict, arguments to be passed to the respective predict method.
beta - logical, should standardized coefficients be returned?
method - how coefficients of terms absent from a component model are handled when averaging: "0" treats them as equal to zero, "NA" excludes them from the average (see Details).
rank - optionally, the rank function to be used instead of AICc, e.g. QAIC or BIC. May be omitted if m1 is a list returned by dredge.
rank.args - optional list of arguments for the rank function. If one is an expression, an x within it is substituted with the current model.
alpha - significance level for calculating the confidence intervals.

Value

model.avg returns an object of class averaging, which is a list whose elements include a data.frame with columns: coef - averaged coefficients, var - unconditional variance estimator, ase - adjusted standard error estimator, lci, uci - unconditional confidence intervals.

Details

rank is found by a call to match.fun and typically is specified as a function, a symbol (e.g. a backquoted name), or a character string specifying a function to be searched for from the environment of the call to lapply. The rank function must be able to accept the model as a first argument and must always return a scalar.
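For illustration only (not part of the original page), a minimal sketch of a rank function meeting these requirements; myRank is a made-up name, and top.models refers to the model list created later in the Examples:

# An information criterion with a penalty of 3 per parameter: it accepts the
# fitted model as its first argument and returns a scalar, so it can be
# supplied as 'rank'.
myRank <- function(x) AIC(x, k = 3)
# model.avg(top.models, rank = myRank)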
predict.averaging supports method = "NA" only for linear, fixed-effect models. In other cases (e.g. nonlinear or mixed models), the prediction is obtained by calling predict on each component model and taking a weighted average of the results, which is equivalent to assuming that missing coefficients equal zero (method = "0").
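As a rough sketch of what this fallback amounts to (illustrative only; component.models, w and nd stand for a list of fitted models, their normalized Akaike weights, and a data.frame of new predictor values, none of which are defined here):

# Predictions from each component model, one column per model:
pr <- sapply(component.models, predict, newdata = nd)
# Weighted average across models (weights w sum to one):
pr.avg <- drop(pr %*% w)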
Apart from predict and coef, other default methods, such as formula and residuals, may be used.
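For example (a small usage sketch; mav denotes any object returned by model.avg, such as the one fitted in the Examples below):

coef(mav)       # model-averaged coefficients
formula(mav)    # model formula
residuals(mav)  # residuals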
See Also

dredge, get.models. QAIC has examples of using a custom rank function.

Examples

# Example from Burnham and Anderson (2002), page 100:
data(Cement)
lm1 <- lm(y ~ ., data = Cement)
dd <- dredge(lm1)
dd
#models with delta.aicc < 4
model.avg(get.models(dd, subset = delta < 4)) # get averaged coefficients
# or as a 95% confidence set:
top.models <- get.models(dd, cumsum(weight) <= .95)
model.avg(top.models) # get averaged coefficients
#topmost model:
top.models[[1]]
# using BIC (Schwarz's Bayesian criterion) to rank the models
BIC <- function(x) AIC(x, k=log(length(residuals(x))))
mav <- model.avg(top.models, rank=BIC)
# Predicted values
nseq <- function(x, len=length(x)) seq(min(x, na.rm=TRUE), max(x, na.rm=TRUE),
length=len)
# New predictors: X1 along the range of original data, other variables held
# constant at their means
newdata <- as.data.frame(lapply(lapply(Cement[1:5], mean), rep, 25))
newdata$X1 <- nseq(Cement$X1, nrow(newdata))
# Predictions from each of the models in a set:
pred <- sapply(top.models, predict, newdata=newdata)
# Add predictions from the models averaged using two methods:
pred <- cbind(pred,
averaged.0=predict(model.avg(top.models, method="0"), newdata),
averaged.NA=predict(model.avg(top.models, method="NA"), newdata)
)
matplot(x=newdata$X1, y=pred, type="l", lwd=c(rep(1,ncol(pred)-2), 2, 2),
xlab="X1", ylab="y")
legend("topleft",
legend=c(lapply(top.models, formula),
paste("Averaged model (method=", c("0", "NA"), ")", sep="")),
col=1:6, lty=1:5, lwd=c(rep(1,ncol(pred)-2), 2, 2), cex = .75
)
# Example with gam models (based on "example(gam)")
require(mgcv)
dat <- gamSim(1, n = 500, dist="poisson", scale=0.1)
gam1 <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3) + (x1+x2+x3)^2,
family = poisson, data = dat, method = "REML")
cat(dQuote(getAllTerms(gam1)), "\n")
# include only models with smooth OR linear term (but not both) for each variable:
dd <- dredge(gam1, subset=(!`s(x1)` | !x1) & (!`s(x2)` | !x2) & (!`s(x3)` | !x3))
# ...this may take a while.
subset(dd, cumsum(weight) < .95)
top.models <- get.models(dd, cumsum(weight) <= .95)
newdata <- as.data.frame(lapply(lapply(dat, mean), rep, 50))
newdata$x1 <- nseq(dat$x1, nrow(newdata))
pred <- cbind(
sapply(top.models, predict, newdata=newdata),
averaged=predict(model.avg(top.models), newdata)
)
matplot(x=newdata$x1, y=pred, type="l", lwd=c(rep(1,ncol(pred)-2), 2, 2),
xlab="x1", ylab="y")