Learn R Programming

MuMIn (version 1.0.0)

model.avg: Model averaging

Description

Model averaging based on an information criterion.

Usage

model.avg(m1, ..., beta = FALSE, method = c("0", "NA"), rank = NULL,
	rank.args = NULL, alpha = 0.05, revised.var = TRUE)

## S3 method for class 'averaging': coef(object, ...)

## S3 method for class 'averaging': predict(object, newdata, se.fit = NULL, interval = NULL, type=c("link", "response"), ...)

Arguments

m1
A fitted model object or a list of such objects. See Details.
beta
Logical, should standardized coefficients be returned?
method
The method of averaging parameter estimators that are not common for all the models. Either "0" (default) or "NA". See Details.
rank
Optional, custom rank function (information criterion) to use instead of AICc, e.g. QAIC or BIC, may be omitted if m1 is a model list returned by get.models. See Details
rank.args
Optional list of arguments for the rank function. If one is an expression, an x within it is substituted with a current model.
alpha
Significance level for calculatinq confidence intervals.
object
An object returned by model.avg.
newdata
An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used.
se.fit
logical, indicates if standard errors should be returned. This has any effect only if the predict methods for each of the component models support it.
interval
Currently not used.
revised.var
Logical, indicating whether to use revised formula for standard errors. See par.avg.
type
Predictions on response scale are only possible if all component models use the same family. See predict.glm
...
for model.avg - more fitted model objects, for predict - arguments to be passed to respective predict method (e.g. level for lme model).

Value

  • An object of class averaging with following elements:
  • summarya data.frame with deviance, AICc, Delta and weights for the component models.
  • coefficients, variancematrices of component models' coefficients and their variances
  • variable.codesnames of the variables with numerical codes used in the summary
  • avg.modelthe averaged model summary, (data.frame containing averaged coefficients, unconditional standard error, adjusted SE, and confidence intervals)
  • importancethe relative importance of variables
  • beta(logical) were standardized coefficients used?
  • term.namescharacter vector giving names of all terms in the model
  • residualsthe residuals (response minus fitted values).
  • x, formulathe model matrix and formula analogical to those that would be used in a single model.
  • methodhow the missing terms were handled ("NA" or "0").
  • callthe matched call.

encoding

utf-8

Details

model.avg has been tested to work with the following model classes:
  • lm,glm
  • gam(mgcv)
  • lme,gls(nlme)
  • lmer(lme4)
  • rlm,glm.nb(MASS)
  • multinom(nnet)
  • sarlm,spautolm(spdep)
  • coxph(survival)
Other model types are also likely to be supported, in particular those inheriting from one of the above classes. See Details section of the Miscellaneous page to see how to provide support for other types of models.

With method = "0" (default) all predictors are averaged as if they were present in all models in the set, and the value of parameter estimate is taken to be 0 if it is not present in a particular model. If method = "NA", the predictors are averaged only over the models in which they appear.

rank is found by a call to match.fun and typically is specified as a function or a symbol (e.g. a back-quoted name) or a character string specifying a function to be searched for from the environment of the call to lapply. rank must be a function able to accept model as a first argument and must always return a scalar.

predict.averaging supports method = "NA" only for linear, fixed effect models. In other cases (e.g. nonlinear or mixed models), prediction is obtained using brute force, i.e. by calling predict on each component model and weighted averaging the results, which is equivalent to assuming that missing coefficients equal zero (method = "0").

Besides predict and coef, other generic methods such as formula, residuals and vcov are supported. logLik method returns a list of logLik objects for the component models.

References

Burnham, K. P. and Anderson, D. R (2002) Model selection and multimodel inference: a practical information-theoretic approach. 2nd ed.

See Also

See par.avg for details of averaged model calculation.

dredge, get.models. QAIC has examples of using custom rank function.

modavg in package AICcmodavg, and coef.glmulti in package glmulti also perform model averaging.

Examples

Run this code
require(graphics)

# Example from Burnham and Anderson (2002), page 100:
data(Cement)
lm1 <- lm(y ~ ., data = Cement)
dd <- dredge(lm1)
dd

#models with delta.aicc < 4
model.avg(get.models(dd, subset = delta < 4)) # get averaged coefficients

#or as a 95\% confidence set:
top.models <- get.models(dd, cumsum(weight) <= .95)

model.avg(top.models) # get averaged coefficients

#topmost model:
top.models[[1]]

# using BIC (Schwarz's Bayesian criterion) to rank the models
BIC <- function(x) AIC(x, k=log(length(residuals(x))))
mav <- model.avg(top.models, rank=BIC)


# Predicted values
nseq <- function(x, len=length(x)) seq(min(x, na.rm=TRUE),
	max(x, na.rm=TRUE),	length=len)

# New predictors: X1 along the range of original data, other variables held
#	constant at their means
newdata <- as.data.frame(lapply(lapply(Cement[1:5], mean), rep, 25))
newdata$X1 <- nseq(Cement$X1, nrow(newdata))

# Predictions from each of the models in a set:
pred <- sapply(top.models, predict, newdata=newdata)
# Add predictions from the models averaged using two methods:
pred <- cbind(pred,
	averaged.0=predict(model.avg(top.models, method="0"), newdata),
	averaged.NA=predict(model.avg(top.models, method="NA"), newdata)
	)

matplot(x=newdata$X1, y=pred, type="l", lwd=c(rep(1,ncol(pred)-2), 2, 2),
	xlab="X1", ylab="y")

legend("topleft",
	legend=c(lapply(top.models, formula),
		paste("Averaged model (method=", c("0", "NA"), ")", sep="")),
	col=1:6, lty=1:5, lwd=c(rep(1,ncol(pred)-2), 2, 2), cex = .75
	)


# Example with gam models (based on "example(gam)")
require(mgcv)
dat <- gamSim(1, n = 500, dist="poisson", scale=0.1)

gam1 <- gam(y ~ s(x0) + s(x1) + s(x2) +  s(x3) + (x1+x2+x3)^2,
	family = poisson, data = dat, method = "REML")

cat(dQuote(getAllTerms(gam1)), "\n")

# include only models with smooth OR linear term (but not both)
# for each variable:
dd <- dredge(gam1, subset=xor(`s(x1)`, x1) & xor(`s(x2)`, x2) & xor(`s(x3)`, x3))
# ...this may take a while.

subset(dd, cumsum(weight) < .95)

top.models <- get.models(dd, cumsum(weight) <= .95)

newdata <- as.data.frame(lapply(lapply(dat, mean), rep, 50))
newdata$x1 <- nseq(dat$x1, nrow(newdata))
pred <- cbind(
	sapply(top.models, predict, newdata=newdata),
	averaged=predict(model.avg(top.models), newdata)
)

matplot(x=newdata$x1, y=pred, type="l", lwd=c(rep(1,ncol(pred)-2), 2, 2),
	xlab="x1", ylab="y")

Run the code above in your browser using DataLab