glmulti: Automated model selection and multimodel inference with (G)LMs

Description

glmulti finds the n best models (the confidence set of models) among all possible models (the candidate set, as specified by the user). Models are fitted with the specified fitting function (default is glm) and are ranked with the specified Information Criterion (default is aicc). The best models are found either through exhaustive screening of the candidates, or using a genetic algorithm, which allows very large candidate sets to be addressed. The output can be used for model selection, variable selection, and multimodel inference.

Usage

#  glmulti S4 generic
glmulti(y, xr, data, exclude = c(), name = "glmulti.analysis",
        intercept = TRUE, marginality = FALSE, bunch = 30, chunk = 1, chunks = 1,
        level = 2, minsize = 0, maxsize = -1, minK = 0, maxK = -1,
        method = "h", crit = "aicc", confsetsize = 100, popsize = 100,
        mutrate = 10^-3, sexrate = 0.1, imm = 0.3, plotty = TRUE, report = TRUE,
        deltaM = 0.05, deltaB = 0.05, conseq = 5, fitfunction = "glm",
        resumefile = "id", ...)

Arguments

y

A formula, character string, or fitted model (of class lm or glm) specifying the response variable and the terms (main effects and/or interactions) to be used in the candidate models (e.g. height~age*sex+mass). Alternatively, a character string naming t

xr

An optional character vector specifying the variables (categorical or quantitative) to be used as predictors, e.g. c("age", "height", "mass")

exclude

Optional character vector naming terms (main effects or interactions) to be excluded from the candidate models, e.g. c("mass:height")

intercept

Whether to include an intercept in the candidate models or not.

level

If 1, only main effects (terms of order 1) are used to build the candidate set. If 2, pairwise interactions are also used (higher order interactions are currently ignored)
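To get a feel for how level affects the size of the candidate set, the following sketch counts candidate models under simplifying assumptions: no excluded terms, no marginality rule, no size or complexity constraints, and an intercept that is not itself a candidate term. The helper count_candidates is hypothetical and is not part of glmulti.

```r
# Hypothetical helper (not part of glmulti): number of candidate models
# for p predictors, assuming every term can be switched on or off freely.
count_candidates <- function(p, level = 1) {
  n_terms <- p                                       # main effects
  if (level == 2) n_terms <- n_terms + choose(p, 2)  # pairwise interactions
  2^n_terms                                          # each term is in or out
}

count_candidates(3, level = 1)  # 2^3 = 8
count_candidates(3, level = 2)  # 2^(3 + 3) = 64
```

The count grows very quickly with level = 2, which is why the genetic algorithm is offered for large candidate sets.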

data

A data.frame containing the data. If not specified, glmulti will try to find the data in the environment of the formula, from the fitted model passed as y argument, or from the global environment.

name

The name of this glmulti analysis. Optional.

marginality

Whether to apply the marginality rule or not. If TRUE, only marginal models will be considered.
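The marginality rule is usually understood to mean that an interaction may enter a model only if the main effects it involves are also present. A minimal sketch of such a check, assuming terms are written as in R formulas (a:b for an interaction). The helper is_marginal is hypothetical, and glmulti's internal rule may differ in its details.

```r
# Hypothetical helper (not a glmulti function): TRUE if every interaction
# in `terms` is accompanied by the main effects it is built from.
is_marginal <- function(terms) {
  interactions <- grep(":", terms, fixed = TRUE, value = TRUE)
  # Main effects required by the interactions, e.g. "a:b" -> "a", "b"
  needed <- unique(unlist(strsplit(interactions, ":", fixed = TRUE)))
  all(needed %in% terms)
}

is_marginal(c("age", "sex", "age:sex"))  # TRUE
is_marginal(c("age", "age:sex"))         # FALSE: "sex" is missing
```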

minsize

This sets a constraint on candidate models. Minimal number of TERMS (main effects or interactions) to be included in candidate models (negative = no constraint)

maxsize

This sets a constraint on candidate models. Maximal number of TERMS to be included in candidate models (negative = no constraint)

minK

This sets a constraint on candidate models. Minimal complexity of candidate models (negative = no constraint)

maxK

This sets a constraint on candidate models. Maximal complexity of candidate models (negative = no constraint)

method

The method to be used to explore the candidate set of models. If "h" an exhaustive screening is undertaken. If "g" the genetic algorithm is employed (recommended for large candidate sets). If "l", a very fast exhaustive branch-and-bound algorithm is us

crit

The Information Criterion to be used. This should be a function that accepts a fitted model as first argument. Default is the small-sample corrected Akaike IC (aicc). Other provided functions are the Bayes IC (bic), the original AIC (
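As a rough illustration of what a criterion function looks like, here is a sketch of the small-sample corrected AIC written as a plain R function that takes a fitted model as its first argument. The name my_aicc is hypothetical, and this is not the package's own aicc implementation. It assumes the fitted model supports logLik and nobs.

```r
# Hypothetical criterion function (not glmulti's aicc).
my_aicc <- function(object, ...) {
  ll <- logLik(object)
  k  <- attr(ll, "df")   # number of estimated parameters
  n  <- nobs(object)     # number of observations
  # AICc = -2 logL + 2k * n / (n - k - 1)
  -2 * as.numeric(ll) + 2 * k * n / (n - k - 1)
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
my_aicc(fit)
AIC(fit)   # the uncorrected AIC, for comparison
```

The correction term vanishes as n grows relative to k, so the two values converge for large samples.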

fitfunction

The fitting function to be used. Any function similar to glm can be used. See Examples

confsetsize

The number of models to be looked for, i.e. the size of the returned confidence set.

plotty

Whether to plot the progress of the IC profile when running.

report

Whether to report about the progress at run time.

bunch

The number of model formulas to be returned (to be fitted) at each call to the enumerator. Exhaustive screening only.

chunk

When using an exhaustive screening approach, it can be split into several parts to take advantage of multiple CPUs. chunk is an integer specifying which part the current call should perform.

chunks

When splitting an exhaustive screening, the total number of parts the task should be divided into. For example, with a quad-core processor, 4 may be useful. Use consensus to bring back the pieces into a single object.

popsize

The population size for the genetic algorithm

mutrate

The per-locus (i.e. per-term) mutation rate for the genetic algorithm, between 0 and 1

sexrate

The rate of sexual reproduction for the genetic algorithm, between 0 and 1

imm

The rate of immigration for the genetic algorithm, between 0 and 1

deltaM

The target change in mean IC (defines the stop rules for the genetic algorithm)

deltaB

The target change in best IC (defines the stop rules for the genetic algorithm)

conseq

The target number of successive times with no improvement (i.e. the target changes have been attained), which defines the stop rule for the genetic algorithm. The greater it is, the more stringent the stop rule.
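The interplay of deltaM, deltaB and conseq can be sketched as follows. This is only an illustration of the rule as described above, not glmulti's actual code: the function should_stop is hypothetical, and it assumes that lower IC values are better and that changes are measured between successive progress checks.

```r
# Hypothetical sketch of the stop rule (not glmulti's implementation).
# best_ic, mean_ic: numeric vectors with one value per progress check.
should_stop <- function(best_ic, mean_ic,
                        deltaB = 0.05, deltaM = 0.05, conseq = 5) {
  if (length(best_ic) <= conseq) return(FALSE)  # not enough history yet
  # Improvement (decrease in IC) between successive checks
  d_best <- -diff(best_ic)
  d_mean <- -diff(mean_ic)
  # A check "attains the targets" when both improvements are small enough
  attained <- d_best <= deltaB & d_mean <= deltaM
  # Stop once the last `conseq` checks all attained the targets
  all(tail(attained, conseq))
}

best <- c(100, 90, 85, rep(85, 5))
avg  <- c(120, 110, 100, rep(100, 5))
should_stop(best, avg)              # TRUE: five checks without improvement
should_stop(best[1:5], avg[1:5])    # FALSE: too little history
```

Under this reading, raising conseq or lowering the deltas makes the algorithm run longer before it declares convergence.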

resumefile

When resuming an analysis (method="r"), the name of the files from which to resume. Default uses the same as name

...

Further arguments to be passed to the fitting function. E.g. maxit=50 or family=binomial

Value

An object of class glmulti is returned. It is an S4 object with several slots containing relevant data for model selection and beyond. Several standard S3 functions are provided to help access the content of this object. Several glmulti objects can be merged into one using the function consensus. This is useful to get the best of several replicates (of the genetic algorithm) or to bring together the different parts of a split exhaustive screening. When running a genetic algorithm, two small Java files (serialized objects) are also written to disk at regular intervals. They can be used to resume the calculation (method="r") if it was interrupted for any reason. This can also be used to continue a GA with modified parameters (e.g. mutation rate).
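Multimodel inference with information criteria typically relies on model weights derived from IC differences (Burnham & Anderson 2002). The sketch below shows that calculation in base R. The function ic_weights is hypothetical and is meant to illustrate the idea, not to reproduce what the package does internally.

```r
# Hypothetical helper: IC-based model weights from a vector of IC values.
ic_weights <- function(ic) {
  delta <- ic - min(ic)   # IC difference from the best model
  w <- exp(-delta / 2)    # relative likelihood of each model
  w / sum(w)              # normalise so the weights sum to 1
}

ic <- c(100.0, 101.2, 104.7)  # made-up IC values for three models
round(ic_weights(ic), 3)
```

The best model always receives the largest weight, and models whose IC lies far above the minimum contribute almost nothing.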

Details

glmulti is defined as an S4 generic function. It acts as a frontend that calls compiled background functions (contained in the archive glmulti.jar). Running the function therefore requires a Java Runtime Environment and the package rJava. A thorough description of this function and package can be found in the article by Calcagno and de Mazancourt (see References). print.glmulti and summary.glmulti are S3 methods which provide a synthesis of glmulti analyses.

References

Buckland (1997) Model Selection: an Integral Part of Inference. Biometrics 10:41

Burnham & Anderson (2002) Model Selection and Multimodel Inference: an Information Theoretic Approach

Calcagno & de Mazancourt (2010) J. Stat. Soft. v34 i12. See http://www.jstatsoft.org/v34/i12

Examples


# A. This shows how to use a custom fitting function, taking the example of mixed models and lmer
# we load the lme4 package
library(lme4)
# some random data
vy = 1:100
va = runif(100)
vb = runif(100)
vc = factor(round(runif(100))) 
# assume we want to use a random effect to control for the factor vc
# There are three small steps to take:
# 1. we first define a wrapper to lmer
lmer.glmulti <- function (formula, data, random = "", ...) {
	lmer(paste(deparse(formula), random), data = data, ...)
}
# the fixed-effects are passed as formula, and the random effects are passed as "random"
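# 2. We now define the corresponding getfit method (allowing access to fitted parameters)
# Note: 'mer' is the class returned by lmer in lme4 versions before 1.0;
# newer versions return 'merMod' objects, for which these methods would need adapting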
setMethod('getfit', 'mer', function(object, ...) {
  summary(object)@coefs[,1:2]
})
# 3. Last, we must provide the corresponding aicc method, since the default will not work with mer objects
setMethod('aicc', 'mer', function(object, ...) {
	liliac <- logLik(object)
	k <- attr(liliac, "df")   # number of estimated parameters
	n <- object@dims['n']     # number of observations
	# AICc = -2 logL + 2k*n/(n-k-1); returns Inf when n-k-1 <= 0
	return(-2*as.numeric(liliac[1]) + 2*k*n/max(n-k-1, 0))
})
# we are now ready to go:
glmulti(vy~va*vb,level=2,fitfunc=lmer.glmulti,random="+(1|vc)")-> bab
plot(bab)
summary(bab)
weightable(bab)
coef(bab)
# fixed-effects are shuffled and the random part is constant


# B. This shows how to do the same for zero-inflated Poisson models
# we load the required package
library(pscl)
# a random vector of count data
round(runif(100, 0,20)*round(runif(100)))-> vy2
# 1. The wrapper function
zeroinfl.glmulti=function(formula, data, inflate = "|1",...)  {
    zeroinfl(as.formula(paste(deparse(formula), inflate)),data=data,...)
} 
# 2 and 3. Unlike before, the default getfit and aicc methods will work for zeroinfl objects, so no need to redefine them!
# we can proceed directly
glmulti(vy2~va*vb,fitfunc=zeroinfl.glmulti,inflate="|1")->bab




# C. This shows how to include some terms in ALL the models
# As above, we just prepare a wrapper of the fitting function
glm.redefined = function(formula, data, always="", ...) {
glm(as.formula(paste(deparse(formula), always)), data=data, ...)
}
# we then use this fitting function in glmulti
glmulti(vy~va,level=1,fitfunc=glm.redefined,always="+vb")-> bab
# va will be shuffled but vb is always included in the models

# this procedure allows arbitrary fitting functions to be supported, or sophisticated constraints to be placed on the model structure
