model.avg: Model averaging

Description

Model averaging based on an information criterion.

Usage

model.avg(m1, ..., beta = FALSE, method = c("0", "NA"), rank = NULL,
	rank.args = NULL, alpha = 0.05, revised.var = TRUE)
## S3 method for class 'averaging':
coef(object, ...)
## S3 method for class 'averaging':
predict(object, newdata, se.fit = NULL, interval = NULL,
	type=c("link", "response"), ...)

Arguments

A fitted model object or a list of such objects. See Details.

beta

Logical, should standardized coefficients be returned?

method

The method of averaging parameter estimators that are not common for all the models. Either "0" (default) or "NA". See Details.

rank

Optional, custom rank function (information criterion) to use instead of AICc, e.g. QAIC or BIC, may be omitted if m1 is a model list returned by get.models. See Details

rank.args

Optional list of arguments for the rank function. If one is an expression, an x within it is substituted with a current model.

alpha

Significance level for calculatinq confidence intervals.

object

An object returned by model.avg.

newdata

An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used.

se.fit

logical, indicates if standard errors should be returned. This has any effect only if the predict methods for each of the component models support it.

interval

Currently not used.

revised.var

Logical, indicating whether to use revised formula for standard errors. See par.avg.

type

Predictions on response scale are only possible if all component models use the same family. See predict.glm

...

for model.avg - more fitted model objects, for predict - arguments to be passed to respective predict method (e.g. level for lme model).

Value

An object of class averaging with following elements:
summarya data.frame with deviance, AICc, Delta and weights for the component models.
coefficients, variancematrices of component models' coefficients and their variances
variable.codesnames of the variables with numerical codes used in the summary
avg.modelthe averaged model summary, (data.frame containing averaged coefficients, unconditional standard error, adjusted SE, and confidence intervals)
importancethe relative importance of variables
beta(logical) were standardized coefficients used?
term.namescharacter vector giving names of all terms in the model
residualsthe residuals (response minus fitted values).
x, formulathe model matrix and formula analogical to those that would be used in a single model.
methodhow the missing terms were handled ("NA" or "0").
callthe matched call.

encoding

utf-8

Details

model.avg has been tested to work with the following model classes:

lm,glm
gam(mgcv)
lme,gls(nlme)
lmer(lme4)
rlm,glm.nb(MASS)
multinom(nnet)
sarlm,spautolm(spdep)
coxph(survival)

Other model types are also likely to be supported, in particular those inheriting from one of the above classes. See Details section of the Miscellaneous page to see how to provide support for other types of models.

With method = "0" (default) all predictors are averaged as if they were present in all models in the set, and the value of parameter estimate is taken to be 0 if it is not present in a particular model. If method = "NA", the predictors are averaged only over the models in which they appear.

rank is found by a call to match.fun and typically is specified as a function or a symbol (e.g. a back-quoted name) or a character string specifying a function to be searched for from the environment of the call to lapply. rank must be a function able to accept model as a first argument and must always return a scalar.

predict.averaging supports method = "NA" only for linear, fixed effect models. In other cases (e.g. nonlinear or mixed models), prediction is obtained using brute force, i.e. by calling predict on each component model and weighted averaging the results, which is equivalent to assuming that missing coefficients equal zero (method = "0").

Besides predict and coef, other generic methods such as formula, residuals and vcov are supported. logLik method returns a list of logLik objects for the component models.

References

Burnham, K. P. and Anderson, D. R (2002) Model selection and multimodel inference: a practical information-theoretic approach. 2nd ed.

Examples

Run this code

require(graphics)

# Example from Burnham and Anderson (2002), page 100:
data(Cement)
lm1 <- lm(y ~ ., data = Cement)
dd <- dredge(lm1)
dd

#models with delta.aicc < 4
model.avg(get.models(dd, subset = delta < 4)) # get averaged coefficients

#or as a 95\% confidence set:
top.models <- get.models(dd, cumsum(weight) <= .95)

model.avg(top.models) # get averaged coefficients

#topmost model:
top.models[[1]]

# using BIC (Schwarz's Bayesian criterion) to rank the models
BIC <- function(x) AIC(x, k=log(length(residuals(x))))
mav <- model.avg(top.models, rank=BIC)


# Predicted values
nseq <- function(x, len=length(x)) seq(min(x, na.rm=TRUE),
	max(x, na.rm=TRUE),	length=len)

# New predictors: X1 along the range of original data, other variables held
#	constant at their means
newdata <- as.data.frame(lapply(lapply(Cement[1:5], mean), rep, 25))
newdata$X1 <- nseq(Cement$X1, nrow(newdata))

# Predictions from each of the models in a set:
pred <- sapply(top.models, predict, newdata=newdata)
# Add predictions from the models averaged using two methods:
pred <- cbind(pred,
	averaged.0=predict(model.avg(top.models, method="0"), newdata),
	averaged.NA=predict(model.avg(top.models, method="NA"), newdata)
	)

matplot(x=newdata$X1, y=pred, type="l", lwd=c(rep(1,ncol(pred)-2), 2, 2),
	xlab="X1", ylab="y")

legend("topleft",
	legend=c(lapply(top.models, formula),
		paste("Averaged model (method=", c("0", "NA"), ")", sep="")),
	col=1:6, lty=1:5, lwd=c(rep(1,ncol(pred)-2), 2, 2), cex = .75
	)


# Example with gam models (based on "example(gam)")
require(mgcv)
dat <- gamSim(1, n = 500, dist="poisson", scale=0.1)

gam1 <- gam(y ~ s(x0) + s(x1) + s(x2) +  s(x3) + (x1+x2+x3)^2,
	family = poisson, data = dat, method = "REML")

cat(dQuote(getAllTerms(gam1)), "\n")

# include only models with smooth OR linear term (but not both)
# for each variable:
dd <- dredge(gam1, subset=xor(`s(x1)`, x1) & xor(`s(x2)`, x2) & xor(`s(x3)`, x3))
# ...this may take a while.

subset(dd, cumsum(weight) < .95)

top.models <- get.models(dd, cumsum(weight) <= .95)

newdata <- as.data.frame(lapply(lapply(dat, mean), rep, 50))
newdata$x1 <- nseq(dat$x1, nrow(newdata))
pred <- cbind(
	sapply(top.models, predict, newdata=newdata),
	averaged=predict(model.avg(top.models), newdata)
)

matplot(x=newdata$x1, y=pred, type="l", lwd=c(rep(1,ncol(pred)-2), 2, 2),
	xlab="x1", ylab="y")

Run the code above in your browser using DataLab