dredge: Automated model selection

Description

Generate a set of models with combinations (subsets) of the terms in the global model, with optional rules for inclusion.

Usage

dredge(global.model, beta = FALSE, evaluate = TRUE,
    rank = "AICc", fixed = NULL, m.max = NA, m.min = 0, subset,
    marg.ex = NULL, trace = FALSE, varying, extra, ...)

## S3 method for class 'model.selection':
print(x, abbrev.names = TRUE, warnings = getOption("warn") != -1L,
    ...)

Arguments

Value

dredge returns an object of class model.selection, which is a data.frame with one row per model, containing the model's coefficients (or presence/NA for factors), df (the number of parameters), the log-likelihood, the information criterion value, delta-IC and the Akaike weight. Models are ordered by the value of the information criterion specified by rank (lowest on top).
The attribute "calls" is a list containing the model calls used (arranged in the same order as the models). Other attributes: "global" - the global.model object, "rank" - the rank function used, "call" - the matched call, and "warnings" - list of errors and warnings given by the modelling function during the fitting, with model number appended to each. The associated model call can be found with attr(*, "calls")[["i"]], where i is the model number.

Details

Fitted model objects that can be used as a global.model include ones returned by lm, glm (package stats); gam, gamm (mgcv); gamm4 (gamm4); lme, gls (nlme); lmer (lme4); rlm, glm.nb, polr (MASS); multinom (nnet); sarlm, spautolm (spdep); glmmML (glmmML); coxph, survreg (survival); coxme, lmekin (coxme); rq (quantreg); and model classes from package unmarked. gamm and gamm4 should be evaluated via the wrapper MuMIn::gamm.

Models are fitted one by one through repeated evaluation of modified calls to the global.model (in a similar fashion as with update). This method is robust in that it can be applied to a variety of model object types, but it is not very efficient and may be time-intensive.
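
The refitting idea can be sketched with base R alone. The block below is an illustration under stated assumptions, not dredge's actual code: it uses the built-in mtcars data and a hand-written list of candidate formulas, both chosen only for this example.

```r
# Illustration only: refit sub-models by re-evaluating a modified copy of the
# global model's call, in the spirit of update().
global <- lm(mpg ~ wt + hp, data = mtcars)

# Candidate formulas: every subset of the two terms (written out by hand here)
forms <- list(mpg ~ 1, mpg ~ wt, mpg ~ hp, mpg ~ wt + hp)

# Each sub-model is a fresh evaluation of the modified call
fits <- lapply(forms, function(f) update(global, formula. = f))

# Keep only the criterion values, not the fitted objects
aic <- vapply(fits, AIC, numeric(1))
names(aic) <- vapply(forms, function(f) deparse(f), character(1))
sort(aic)
```

Because every candidate is a complete refit, the run time grows with both the number of candidates and the cost of fitting one model.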

Note that the number of combinations grows exponentially with the number of predictor variables (2^N). Because there can be a large number of models to evaluate, the fitted model objects are not stored in the result, to avoid memory overflow. To get (a subset of) the models, use get.models on the object returned by dredge.
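
A quick calculation shows how fast 2^N grows. The term counts below are arbitrary values picked for illustration, and the result is an upper bound: marginality constraints, fixed terms, m.max and subset rules all reduce the number of models actually fitted.

```r
# Upper bound on the number of candidate models for N optional terms
n_terms <- c(4, 10, 20)
n_models <- 2^n_terms
data.frame(terms = n_terms, models = n_models)
```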

When handling interactions, dredge respects marginality constraints, so the set of "all possible combinations" excludes models that contain an interaction without its respective main effects. This behaviour can be altered with the marg.ex argument, which can be used to allow for simple nested designs. For example, with a global model of the form a / (x + z), use marg.ex = "a" and fixed = "a".
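
The effect of the marginality rule can be illustrated with base R. This is a sketch of the rule only, not of how dredge implements it; the term names a, b and a:b are hypothetical.

```r
# Illustration only: enumerate inclusion patterns for three terms and drop
# those that contain the interaction without both main effects.
terms <- c("a", "b", "a:b")

# All 2^3 inclusion patterns (TRUE = term present)
grid <- expand.grid(rep(list(c(FALSE, TRUE)), length(terms)))
names(grid) <- terms

# "a:b" is allowed only when both "a" and "b" are present
ok <- !grid[["a:b"]] | (grid[["a"]] & grid[["b"]])
allowed <- grid[ok, ]
nrow(allowed)
```

Of the eight patterns, the three that include a:b without both a and b are dropped.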

rank is found by a call to match.fun and may be specified as a function or a symbol (e.g. a back-quoted name) or a character string specifying a function to be searched for from the environment of the call to dredge. The rank function must accept a model as its first argument and must always return a scalar. Typical choices for rank are "AIC", "QAIC" or "BIC" (stats or nlme).
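
A function that satisfies these requirements might look like the sketch below. The name my_aicc is hypothetical, and the small-sample correction is computed by hand purely to show the required shape: model in, scalar out.

```r
# Hypothetical rank function: takes a fitted model as its first argument
# and returns a single number (here, AIC with a small-sample correction).
my_aicc <- function(object, ...) {
  k <- attr(logLik(object), "df")  # number of estimated parameters
  n <- nobs(object)                # number of observations
  AIC(object) + 2 * k * (k + 1) / (n - k - 1)
}

fm <- lm(mpg ~ wt + hp, data = mtcars)
my_aicc(fm)

# It could then be passed as a function or by name, for example:
# dredge(fm, rank = my_aicc)
# dredge(fm, rank = "my_aicc")
```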

The argument subset acts in a similar fashion to that in the function subset for data.frames: the model terms can be referred to by name as variables in the expression, with the difference that they are always logical (i.e. TRUE if a term exists in the model). The expression can contain any of the global.model terms (use getAllTerms(global.model) to list them). It can take the form of an unevaluated call, an expression object, or a one-sided formula. See Examples. Compound model terms (such as as-is expressions within I() or the smooths in gam) should be treated as non-syntactic names and enclosed in back-ticks (see Quotes). Mind the spacing: names must match the term names in the model's formula exactly. To simply keep certain terms in all models, use of fixed is preferred.
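
The logic of such an expression can be tried out in base R by evaluating it against a table of TRUE/FALSE term indicators. This only mimics the semantics described above and is not dredge's own code; the term names are taken from the Cement example or made up for illustration.

```r
# Illustration only: each term is a logical flag (TRUE = present in the model)
candidates <- expand.grid(X1 = c(FALSE, TRUE),
                          X2 = c(FALSE, TRUE),
                          X3 = c(FALSE, TRUE))

# Rule: exclude models containing both X1 and X2
rule <- quote(!(X1 & X2))
keep <- eval(rule, candidates)
candidates[keep, ]

# Compound terms are non-syntactic names and need back-ticks
rule2 <- quote(`I(X1^2)` | X3)
eval(rule2, list(`I(X1^2)` = TRUE, X3 = FALSE))
```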

Use of na.action = na.omit (R's default) in global.model should be avoided: if there are missing values, it results in sub-models being fitted to different data sets. In versions >= 0.13.17 a warning is given in such a case.
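
The problem is easy to reproduce with base R. The sketch below uses the built-in airquality data, which has missing values in Ozone and Solar.R; the choice of data set and variables is an assumption made for illustration.

```r
# With na.omit, each model silently drops its own set of incomplete rows
full <- lm(Ozone ~ Solar.R + Wind, data = airquality, na.action = na.omit)
sub  <- lm(Ozone ~ Wind, data = airquality, na.action = na.omit)
c(full = nobs(full), sub = nobs(sub))  # the two sample sizes differ

# One remedy: remove incomplete rows once, then fit every model to the
# same data with na.fail so that any remaining NA raises an error
aq <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind")])
global <- lm(Ozone ~ Solar.R + Wind, data = aq, na.action = na.fail)
nobs(global) == nrow(aq)
```

Information criteria computed on different numbers of observations are not comparable, which is why the sub-models must share one data set.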

Examples

# Example from Burnham and Anderson (2002), page 100:
data(Cement)
fm1 <- lm(y ~ ., data = Cement)
dd <- dredge(fm1)
subset(dd, delta < 4)

# Visualize the model selection table:
if (require(graphics))
    plot(dd)


# Model average models with delta AICc < 4
model.avg(dd, subset = delta < 4)

# or as a 95% confidence set:
model.avg(dd, subset = cumsum(weight) <= .95) # get averaged coefficients

#'Best' model
summary(get.models(dd, 1)[[1]])

# Examples of using 'subset':
# exclude models containing both X1 and X2
dredge(fm1, subset = !(X1 & X2))
# keep only models containing X3
dredge(fm1, subset = ~ X3) # subset as a formula
dredge(fm1, subset = expression(X3)) # subset as expression object
# the same, but more efficient:
dredge(fm1, fixed = "X3")

#Reduce the number of generated models, by including only those with
# up to 2 terms (and intercept)
dredge(fm1, m.max = 2)


# Add R^2 and F-statistics, use the 'extra' argument
dredge(fm1, m.max = 1, extra = c("R^2", F = function(x)
    summary(x)$fstatistic[[1]]))

# with summary statistics:
dredge(fm1, m.max = 1, extra = list(
    "R^2", "*" = function(x) {
        s <- summary(x)
        c(Rsq = s$r.squared, adjRsq = s$adj.r.squared,
            F = s$fstatistic[[1]])
    })
)


# with other information criteria:

# there is no BIC in R < 2.13.0, so need to add it:
if(!exists("BIC", mode = "function"))
    BIC <- function(object, ...)
        AIC(object, k = log(length(resid(object))))

dredge(fm1, m.max = 1, extra = alist(AIC, BIC, ICOMP, Cp))
