dredge: Automated model selection

Description

Generate a set of models with combinations (subsets) of the terms in the global model, with optional rules for model inclusion.

Usage

dredge(global.model, beta = FALSE, evaluate = TRUE, rank = "AICc", 
    fixed = NULL, m.max = NA, m.min = 0, subset, marg.ex = NULL, 
    trace = FALSE, varying, extra, ct.args = NULL, ...)
## S3 method for class 'model.selection':
print(x, abbrev.names = TRUE, warnings = getOption("warn") != -1L, ...)

Arguments

global.model

a fitted global model object. See Details for a list of supported types.

beta

logical, should standardized coefficients be returned?

evaluate

whether to evaluate and rank the models. If FALSE, a list of model calls is returned.

rank

optional custom rank function (information criterion) to be used instead of AICc, e.g. AIC, QAIC or BIC. See Details.

fixed

optional, either a one-sided formula or a character vector giving names of terms to be included in all models.

m.max, m.min

optionally, the maximum and minimum number of terms in a single model (excluding the intercept). m.max defaults to the number of terms in global.model.

subset

logical expression describing models to keep in the resulting set. See Details.

marg.ex

a character vector specifying names of variables for which NOT to check for marginality restrictions when generating model formulas. If this argument is set to TRUE, all combinations of terms are used (i.e. no checking). If NA

trace

if TRUE, all calls to the fitting function (i.e. updated global.model calls) are printed before actual fitting takes place.

varying

optionally, a named list describing the additional arguments to vary between the generated models. Item names correspond to the arguments, and each item provides a list of choices (i.e. list(arg1 = list(choice1, choice2, ...), ...)). Compl

extra

optional additional statistics to include in the result, provided as functions, function names or a list of such (best if named or quoted). As with the rank argument, each function must accept a fitted model object as an argument and ret

x

a model.selection object, returned by dredge.

abbrev.names

should printed variable names be abbreviated? (useful with many variables).

warnings

if TRUE, errors and warnings issued during the model fitting are printed below the table (currently, only with pdredge). To permanently remove the warnings, set the object's attribute "warnings" to NULL.

ct.args

optional list of arguments to be passed to coefTable (e.g. dispersion parameter for glm affecting standard errors used in subsequent

...

optional arguments for the rank function. Any of them can be an expression (of mode call), in which case any x within it will be substituted with the current model.

Value

dredge returns an object of class model.selection, being a data.frame with models' coefficients (or presence/NA for factors), df - number of parameters, log-likelihood, the information criterion value, delta-IC and Akaike weight. Models are ordered by the value of the information criterion specified by rank (lowest on top).
The attribute "calls" is a list containing the model calls used (arranged in the same order as the models). Other attributes: "global" - the global.model object, "rank" - the rank function used, "call" - the matched call, and "warnings" - list of errors and warnings given by the modelling function during the fitting, with model number appended to each. The associated model call can be found with attr(*, "calls")[["i"]], where i is the model number.
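The delta-IC and Akaike weight columns follow the usual definitions: delta is the difference from the lowest criterion value, and the weight is exp(-delta / 2) rescaled to sum to one. A minimal base R sketch of that arithmetic, using made-up criterion values and illustrative variable names (not the package's internal code):

```r
# Made-up information criterion values for three hypothetical models
ic <- c(m1 = 102.3, m2 = 100.0, m3 = 107.9)

# Order models by the criterion, lowest (best) first
ic <- sort(ic)

# Difference from the best model, and the resulting Akaike weights
delta <- ic - min(ic)
weight <- exp(-delta / 2) / sum(exp(-delta / 2))

round(data.frame(IC = ic, delta = delta, weight = weight), 3)
```

The weights always sum to one, so they can be read as the relative support for each model within the set being compared.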

Details

Models are fitted one by one through repeated evaluation of modified calls to the global.model (in a similar fashion as with update). This approach, while robust in that it can be applied to a variety of different model object types, is not very efficient and may be time-intensive.

Note that the number of combinations grows exponentially with the number of predictor variables (2^N, fewer when interactions are present, see below). As there can potentially be a large number of models to evaluate, the fitted model objects are not stored in the result, to avoid memory overflow. To get (a subset of) the models, use get.models on the object returned by dredge.
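As a rough illustration of this growth, the base R sketch below counts the candidate models for N main-effect terms, assuming no interactions and no subsetting rules. The helper name n_models is hypothetical and not part of the package.

```r
# Hypothetical helper: number of subsets of N main-effect terms
# (no interactions; no 'subset', 'fixed', 'm.max' or 'm.min' restrictions)
n_models <- function(n_terms) 2^n_terms

n_terms <- c(4, 10, 20)
data.frame(terms = n_terms, models = n_models(n_terms))
```

With 20 terms this already exceeds a million fits, which is why restricting the set with fixed, m.max or subset is often worthwhile.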

For a list of model types that can be used as a global.model see list of supported models. Modelling functions that do not store the call in their result should be evaluated via the wrapper created by updateable.

Information criterion

rank is found by a call to match.fun and may be specified as a function, a symbol (e.g. a back-quoted name) or a character string specifying a function to be searched for from the environment of the call to dredge. The function rank must accept a model object as its first argument and always return a scalar. Typical choices for rank would be "AIC", "BIC", or "QAIC".
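A minimal sketch of a function that meets these requirements is shown below. The name aic_k3, the penalty k = 3 and the use of the built-in mtcars data are illustrative assumptions only.

```r
# Hypothetical rank function: AIC with a heavier penalty per parameter (k = 3).
# It takes the fitted model as its first argument and returns one number.
aic_k3 <- function(object, ...) AIC(object, k = 3)

# Any fitted model will do for checking the function; mtcars ships with R
fm <- lm(mpg ~ wt + hp, data = mtcars)
value <- aic_k3(fm)
value

# With the package loaded, the function could then be supplied as, e.g.:
# dredge(fm, rank = aic_k3)    # or by name: rank = "aic_k3"
```

Checking the function on a single fitted model first is a cheap way to confirm it returns a scalar before running a full model search.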

Interactions

By default, dredge respects marginality constraints, so the generated combinations do not include models containing interactions without their respective main effects and all lower-order terms. This behaviour can be altered with the marg.ex argument, which can be used to allow for simple nested designs. For example, with a global model of the form a / (x + z), one would use marg.ex = "a" and fixed = "a". If global.model uses such a formula and marg.ex is missing or NA, it will be adjusted automatically.
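The effect of the marginality rule on the size of the model set can be sketched in base R for a global model with terms a, b and a:b. This is an illustration of the rule only, not the package's own implementation.

```r
terms <- c("a", "b", "a:b")

# All 2^3 inclusion patterns, one row per candidate model
grid <- expand.grid(rep(list(c(FALSE, TRUE)), length(terms)))
names(grid) <- terms

# Marginality: the interaction may appear only with both of its main effects
ok <- !grid[["a:b"]] | (grid[["a"]] & grid[["b"]])
valid <- grid[ok, ]

nrow(grid)   # 8 patterns in total
nrow(valid)  # 5 patterns satisfy the constraint
```

The three excluded patterns are those that contain a:b without a, without b, or without both.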

Subsetting

There are three ways to constrain the resulting set of models: setting limits on the number of terms in a model with m.max and m.min, binding term(s) to all models with fixed, and applying more complex rules with the subset argument. To be included in the selection table, the model formulation must satisfy all these conditions.

subset can take the form of either an expression or a matrix. The latter should be a lower triangular matrix with logical values, where columns and rows correspond to global.model terms. Value subset["a", "b"] == FALSE will exclude any model containing both a and b. Values other than FALSE (or 0) are taken as TRUE.

In the form of an expression, the argument subset acts in a similar fashion to that in the function subset for data.frames: model terms can be referred to by name as variables in the expression, with the difference being that they are always logical (i.e. TRUE if a term exists in the model).

The expression can contain any of the global.model terms (getAllTerms(global.model) lists them), as well as names of the varying argument items. Names of global.model terms take precedence when identical to names of varying, so to avoid ambiguity varying variables in the subset expression should be enclosed in V() (e.g. subset = V(family) == "Gamma" assuming that varying is something like list(family = c(..., "Gamma"))).

If element names in varying are missing, the elements themselves are used. Call and symbol elements are represented as character values (via deparse), and everything except numeric, logical, character and NULL values is replaced by item numbers (e.g. varying = list(family = list(..., Gamma)) should be referred to as subset = V(family) == 2). This can quickly become confusing, therefore it is recommended to use named lists in most cases. demo(dredge.varying) provides examples.

The subset expression can also contain the variable `*nvar*` (needs to be backtick-quoted), which is equal to the number of terms in the model (not the number of estimated parameters K).

To make inclusion of a variable conditional on the presence of some other variable, the function dc (dependency chain) can be used in the subset expression. dc takes any number of variables as arguments, and allows a variable to be included only if all preceding variables are also present (e.g. subset = dc(a, b, c) allows for models of form a, a+b and a+b+c but not b, c, b+c or a+c).

The subset expression can have the form of an unevaluated call, an expression object, or a one-sided formula. See Examples. Compound model terms (such as as-is expressions within I() or smooths in gam) should be treated as non-syntactic names and enclosed in back-ticks (e.g. subset = `s(x, k = 2)` || `I(log(x))`, see Quotes). Mind the spacing: names must match exactly the term names in the model's formula. To simply keep certain terms in all models, use of the argument fixed is more efficient.

subset expression syntax summary

a & b                indicates that variables a and b must be present (see Logical Operators)
V(x)                 indicates a varying variable x
dc(a, b, c, ...)     dependency chain: b is allowed only if a is present, and c only if both a and b are present, etc.
`*nvar*`             number of variables
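The dependency-chain rule described above can be mimicked in base R to see which term combinations it admits. The helper dc_ok below is a hypothetical stand-in written for this illustration, not the package's dc function.

```r
# Hypothetical stand-in for the dependency-chain rule: a term may be present
# only if every term before it in the chain is present too, i.e. the
# presence flags never switch from FALSE back to TRUE.
dc_ok <- function(...) {
    present <- c(...)
    all(diff(as.integer(present)) <= 0)
}

# Every presence/absence pattern for three terms a, b, c
grid <- expand.grid(a = c(FALSE, TRUE), b = c(FALSE, TRUE), c = c(FALSE, TRUE))

# Apply the rule row by row and keep the admissible patterns
keep <- mapply(dc_ok, grid$a, grid$b, grid$c)
grid[keep, ]
```

The retained rows correspond to the null model, a, a+b and a+b+c, matching the combinations listed for subset = dc(a, b, c).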

Missing values

Use of na.action = na.omit (R's default) in global.model should be avoided, as it results in sub-models being fitted to different data sets if there are missing values. A warning is given if this is detected.
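The problem can be reproduced in base R with a small made-up data set. The data frame dat and its column names below are illustrative assumptions.

```r
# Made-up data with one missing value in x1
dat <- data.frame(y  = c(1.2, 2.3, 3.1, 4.8, 5.0),
                  x1 = c(1, 2, NA, 4, 5),
                  x2 = c(2, 1, 4, 3, 6))

# With na.omit, incomplete rows are dropped silently, so the model with x1
# and the sub-model without it are fitted to different numbers of rows
n_full <- nobs(lm(y ~ x1 + x2, data = dat, na.action = na.omit))
n_sub  <- nobs(lm(y ~ x2, data = dat, na.action = na.omit))
c(full = n_full, sub = n_sub)

# With na.fail, fitting stops with an error instead of dropping rows
res <- tryCatch(lm(y ~ x1 + x2, data = dat, na.action = na.fail),
                error = function(e) "error")
res
```

Fitting the global model with na.action = na.fail, after removing incomplete rows explicitly, keeps all sub-models on the same data so that their criterion values are comparable.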

Methods

There are subset and plot methods; the latter produces a graphical representation of model weights and variable relative importance. Coefficients can be extracted with coef or coefTable.

Examples

# Example from Burnham and Anderson (2002), page 100:
data(Cement)
fm1 <- lm(y ~ ., data = Cement)
dd <- dredge(fm1)
subset(dd, delta < 4)

# Visualize the model selection table:
if(require(graphics))
    plot(dd)


# Model average models with delta AICc < 4
model.avg(dd, subset = delta < 4)

# or as a 95% confidence set:
model.avg(dd, subset = cumsum(weight) <= .95) # get averaged coefficients

# 'Best' model (get.models returns a list, so extract the model before summarising)
summary(get.models(dd, 1)[[1]])

# Examples of using 'subset':
# keep only models containing X3
dredge(fm1, subset = ~ X3) # subset as a formula
dredge(fm1, subset = expression(X3)) # subset as expression object
# the same, but more efficient:
dredge(fm1, fixed = "X3")
# exclude models containing both X1 and X2 at the same time
dredge(fm1, subset = !(X1 && X2))
# Fit only models containing either X3 or X4 (but not both);
# include X3 only if X2 is present, and X2 only if X1 is present.
dredge(fm1, subset = dc(X1, X2, X3) && xor(X3, X4))
# the same as above, but without using "dc"
dredge(fm1, subset = (X1 | !X2) && (X2 | !X3) && xor(X3, X4))

# Include only models with up to 2 terms (and intercept)
dredge(fm1, m.max = 2)


# Add R^2 and F-statistics, use the 'extra' argument
dredge(fm1, m.max = 1, extra = c("R^2", F = function(x)
    summary(x)$fstatistic[[1]]))

# with summary statistics:
dredge(fm1, m.max = 1, extra = list(
    "R^2", "*" = function(x) {
        s <- summary(x)
        c(Rsq = s$r.squared, adjRsq = s$adj.r.squared,
            F = s$fstatistic[[1]])
    })
)


# Add other information criteria (but rank with AICc):
# there is no BIC in R < 2.13.0, so it needs to be defined:
if(!exists("BIC", mode = "function"))
    BIC <- function(object, ...)
        AIC(object, k = log(length(resid(object))))
dredge(fm1, m.max = 1, extra = alist(AIC, BIC, ICOMP, Cp))
