This function computes the model selection relative frequencies based on the nonparametric bootstrap (Burnham and Anderson 2002). Models are ranked based on the AIC, AICc, QAIC, or QAICc. The function currently supports objects of aov, betareg, clm, glm, hurdle, lm, multinom, polr, rlm, survreg, vglm, and zeroinfl classes.
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
sort = TRUE, nsim = 100, ...)
# S3 method for AICaov.lm
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
sort = TRUE, nsim = 100, ...)
# S3 method for AICbetareg
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
sort = TRUE, nsim = 100, ...)
# S3 method for AICsclm.clm
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
sort = TRUE, nsim = 100, ...)
# S3 method for AICglm.lm
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
sort = TRUE, nsim = 100, c.hat = 1, ...)
# S3 method for AIChurdle
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
sort = TRUE, nsim = 100, ...)
# S3 method for AIClm
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
sort = TRUE, nsim = 100, ...)
# S3 method for AICmultinom.nnet
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
sort = TRUE, nsim = 100, c.hat = 1, ...)
# S3 method for AICpolr
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
sort = TRUE, nsim = 100, ...)
# S3 method for AICrlm.lm
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
sort = TRUE, nsim = 100, ...)
# S3 method for AICsurvreg
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
sort = TRUE, nsim = 100, ...)
# S3 method for AICvglm
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
sort = TRUE, nsim = 100, c.hat = 1, ...)
# S3 method for AICzeroinfl
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
sort = TRUE, nsim = 100, ...)
boot.wt creates an object of class boot.wt with the following components:
the names of each model of the candidate model set.
the number of estimated parameters for each model.
the information criterion requested for each model (AIC, AICc, QAIC, or QAICc).
the appropriate delta AIC component, depending on the information criterion selected.
the relative likelihood of the model given the data (exp(-0.5*delta[i])). This is not to be confused with the likelihood of the parameters given the data. The relative likelihood can then be normalized across all models to get the model probabilities.
the Akaike weights, also termed "model probabilities" sensu Burnham and Anderson (2002) and Anderson (2008). These measures indicate the level of support (i.e., weight of evidence) in favor of any given model being the most parsimonious among the candidate model set.
the relative frequencies of model selection from the bootstrap.
if c.hat was specified as an argument, it is included in the table.
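As a quick numerical illustration of how the relative likelihoods and Akaike weights relate to the delta values described above (standard formulas from Burnham and Anderson 2002; the delta values below are made up):

delta <- c(0, 1.2, 3.5)            ##hypothetical delta (Q)AIC(c) values
rel.lik <- exp(-0.5 * delta)       ##relative likelihood of each model
ak.wt <- rel.lik / sum(rel.lik)    ##Akaike weights (model probabilities)
round(ak.wt, 3)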
cand.set: a list storing each of the models in the candidate model set.

modnames: a character vector of model names to facilitate the identification of each model in the model selection table. If NULL, the function uses the names in the cand.set list of candidate models. If no names appear in the list, generic names (e.g., Mod1, Mod2) are supplied in the table in the same order as in the list of candidate models.

second.ord: logical. If TRUE, the function returns the second-order Akaike information criterion (i.e., AICc).

nobs: this argument allows one to specify a numeric value other than the total sample size to compute the AICc (i.e., nobs defaults to the total number of observations). This is relevant only for certain types of models, such as mixed models, where sample size is not straightforward. In such cases, one might use the total number of observations or the number of independent clusters (e.g., sites) as the value of nobs.

sort: logical. If TRUE, the model selection table is ranked according to the (Q)AIC(c) values.

c.hat: value of the overdispersion parameter (i.e., variance inflation factor) such as that obtained from c_hat. Note that values of c.hat different from 1 are only appropriate for binomial GLMs with trials > 1 (i.e., success/trial or cbind(success, failure) syntax) or with Poisson GLMs. If c.hat > 1, boot.wt will return the quasi-likelihood analogue of the information criterion requested.

nsim: the number of bootstrap iterations. Burnham and Anderson (2002) recommend at least 1000 and up to 10 000 iterations for certain problems.

...: additional arguments passed to the function.
Marc J. Mazerolle
boot.wt is implemented for aov, betareg, glm, hurdle, lm, multinom, polr, rlm, survreg, vglm, and zeroinfl classes.
During each bootstrap iteration, the data are resampled with replacement, all the models specified in cand.set are updated with the new data set, and the top-ranked model is saved. When all iterations are completed, the relative frequency of selection is computed for each model appearing in the candidate model set.

Relative frequencies of the models are often similar to Akaike weights, and the latter are often preferred due to their link with a Bayesian perspective (Burnham and Anderson 2002). boot.wt is most useful for teaching purposes, to illustrate sampling-theory-based relative frequencies of model selection. The current implementation is only appropriate with completely randomized designs. For more complex data structures (e.g., blocks or random effects), the bootstrap should be modified accordingly.
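The following sketch illustrates the resampling logic described above; it is not the implementation used by the package, and the helper name boot_relfreq is hypothetical. It assumes a list of lm models fitted with a data argument and a completely randomized design.

##illustrative sketch: hand-rolled bootstrapped selection frequencies
##for a list of lm models ranked by AICc
boot_relfreq <- function(cand.set, data, nsim = 100) {
  ##small-sample AICc computed from the log-likelihood
  aicc <- function(fit) {
    k <- attr(logLik(fit), "df")
    n <- nobs(fit)
    -2 * as.numeric(logLik(fit)) + 2 * k + 2 * k * (k + 1) / (n - k - 1)
  }
  wins <- numeric(length(cand.set))
  for (i in seq_len(nsim)) {
    ##resample rows with replacement
    boot.data <- data[sample(nrow(data), replace = TRUE), ]
    ##refit every candidate model on the bootstrap sample
    refits <- lapply(cand.set, update, data = boot.data)
    ##tally the top-ranked (lowest AICc) model
    best <- which.min(vapply(refits, aicc, numeric(1)))
    wins[best] <- wins[best] + 1
  }
  wins / nsim   ##relative selection frequencies
}

Applied to the frog data in the Examples section, boot_relfreq(Cand.models, dry.frog) should yield frequencies comparable to those reported by boot.wt.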
Anderson, D. R. (2008) Model-based Inference in the Life Sciences: a primer on evidence. Springer: New York.
Burnham, K. P., Anderson, D. R. (2002) Model Selection and Multimodel Inference: a practical information-theoretic approach. Second edition. Springer: New York.
Burnham, K. P., Anderson, D. R. (2004) Multimodel inference: understanding AIC and BIC in model selection. Sociological Methods and Research 33, 261--304.
Mazerolle, M. J. (2006) Improving data analysis in herpetology: using Akaike's Information Criterion (AIC) to assess the strength of biological hypotheses. Amphibia-Reptilia 27, 169--180.
AICc, confset, c_hat, evidence, importance, modavg, modavgShrink, modavgPred
##Mazerolle (2006) frog water loss example
data(dry.frog)
##setup a subset of models of Table 1
Cand.models <- list( )
Cand.models[[1]] <- lm(log_Mass_lost ~ Shade + Substrate +
cent_Initial_mass + Initial_mass2,
data = dry.frog)
Cand.models[[2]] <- lm(log_Mass_lost ~ Shade + Substrate +
cent_Initial_mass + Initial_mass2 +
Shade:Substrate, data = dry.frog)
Cand.models[[3]] <- lm(log_Mass_lost ~ cent_Initial_mass +
Initial_mass2, data = dry.frog)
Cand.models[[4]] <- lm(log_Mass_lost ~ Shade + cent_Initial_mass +
Initial_mass2, data = dry.frog)
Cand.models[[5]] <- lm(log_Mass_lost ~ Substrate + cent_Initial_mass +
Initial_mass2, data = dry.frog)
##create a vector of names to trace back models in set
Modnames <- paste("mod", 1:length(Cand.models), sep = " ")
##generate AICc table with bootstrapped relative
##frequencies of model selection
boot.wt(cand.set = Cand.models, modnames = Modnames, sort = TRUE,
nsim = 10) #number of iterations should be much higher
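##for comparison, Akaike weights from aictab (also in AICcmodavg) are
##often similar to the bootstrapped relative frequencies
aictab(cand.set = Cand.models, modnames = Modnames, sort = TRUE)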
##Burnham and Anderson (2002) flour beetle data
if (FALSE) {
data(beetle)
##models as suggested by Burnham and Anderson p. 198
Cand.set <- list( )
Cand.set[[1]] <- glm(Mortality_rate ~ Dose, family =
binomial(link = "logit"), weights = Number_tested,
data = beetle)
Cand.set[[2]] <- glm(Mortality_rate ~ Dose, family =
binomial(link = "probit"), weights = Number_tested,
data = beetle)
Cand.set[[3]] <- glm(Mortality_rate ~ Dose, family =
binomial(link ="cloglog"), weights = Number_tested,
data = beetle)
##create a vector of names to trace back models in set
Modnames <- paste("Mod", 1:length(Cand.set), sep = " ")
##model selection table with bootstrapped
##relative frequencies
boot.wt(cand.set = Cand.set, modnames = Modnames)
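##the beetle models are binomial with trials > 1, so an overdispersion
##correction may be warranted; illustrative sketch: estimate c-hat from
##one of the models with c_hat( ) and, when it exceeds 1, pass it to
##boot.wt to obtain the quasi-likelihood analogue (QAICc)
c.hat.est <- c_hat(Cand.set[[1]])
boot.wt(cand.set = Cand.set, modnames = Modnames,
c.hat = max(1, c.hat.est))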
}