boot.wt: Compute Model Selection Relative Frequencies

Description

This function computes the model selection relative frequencies based on the nonparametric bootstrap (Burnham and Anderson 2002). Models are ranked based on the AIC, AICc, QAIC, or QAICc. The function currently supports objects of aov, betareg, clm, glm, hurdle, lm, multinom, polr, rlm, survreg, vglm, and zeroinfl classes.

Usage

boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
        sort = TRUE, nsim = 100, ...)
# S3 method for AICaov.lm
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
       sort = TRUE, nsim = 100, ...)
# S3 method for AICbetareg
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
       sort = TRUE, nsim = 100, ...)
# S3 method for AICsclm.clm
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
       sort = TRUE, nsim = 100, ...)
# S3 method for AICglm.lm
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
       sort = TRUE, nsim = 100, c.hat = 1, ...)
# S3 method for AIChurdle
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
       sort = TRUE, nsim = 100, ...)
       
# S3 method for AIClm
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
       sort = TRUE, nsim = 100, ...)
# S3 method for AICmultinom.nnet
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
       sort = TRUE, nsim = 100, c.hat = 1, ...)
# S3 method for AICpolr
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
       sort = TRUE, nsim = 100, ...)
# S3 method for AICrlm.lm
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
       sort = TRUE, nsim = 100, ...)
# S3 method for AICsurvreg
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
       sort = TRUE, nsim = 100, ...)
# S3 method for AICvglm
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
       sort = TRUE, nsim = 100, c.hat = 1, ...)
# S3 method for AICzeroinfl
boot.wt(cand.set, modnames = NULL, second.ord = TRUE, nobs = NULL,
       sort = TRUE, nsim = 100, ...)

Value

boot.wt creates an object of class boot.wt with the following components:

Modname: the names of each model of the candidate model set.
K: the number of estimated parameters for each model.
(Q)AIC(c): the information criterion requested for each model (AIC, AICc, QAIC, QAICc).
Delta_(Q)AIC(c): the appropriate delta AIC component depending on the information criteria selected.
ModelLik: the relative likelihood of the model given the data (exp(-0.5*delta[i])). This is not to be confused with the likelihood of the parameters given the data. The relative likelihood can then be normalized across all models to get the model probabilities.
(Q)AIC(c)Wt: the Akaike weights, also termed "model probabilities" sensu Burnham and Anderson (2002) and Anderson (2008). These measures indicate the level of support (i.e., weight of evidence) in favor of any given model being the most parsimonious among the candidate model set.
PiWt: the relative frequencies of model selection from the bootstrap.
c.hat: if c.hat was specified as an argument, it is included in the table.
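
The Delta, ModelLik, and weight columns follow from one another. A minimal sketch with made-up AICc values (the numbers and the object names aicc, delta, mod.lik, and wt are illustrative assumptions, not output from boot.wt):

```r
## Hypothetical AICc values for three candidate models (made up for illustration)
aicc <- c(mod1 = 102.3, mod2 = 100.0, mod3 = 107.1)

## Delta AICc: difference from the smallest value in the set
delta <- aicc - min(aicc)

## Relative likelihood of each model given the data
mod.lik <- exp(-0.5 * delta)

## Akaike weights: relative likelihoods normalized to sum to 1
wt <- mod.lik / sum(mod.lik)

round(cbind(Delta_AICc = delta, ModelLik = mod.lik, AICcWt = wt), 3)
```

The top-ranked model always has a delta of 0 and a relative likelihood of 1. PiWt is obtained differently, by counting how often each model ranks first across bootstrap samples.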

Arguments

cand.set: a list storing each of the models in the candidate model set.
modnames: a character vector of model names to facilitate the identification of each model in the model selection table. If NULL, the function uses the names in the cand.set list of candidate models. If no names appear in the list, generic names (e.g., Mod1, Mod2) are supplied in the table in the same order as in the list of candidate models.
second.ord: logical. If TRUE, the function returns the second-order Akaike information criterion (i.e., AICc).
nobs: a numeric value to use as the sample size when computing the AICc, in place of the default (the total number of observations). This is relevant only for certain types of models, such as mixed models, where the sample size is not straightforward. In such cases, one might use the total number of observations or the number of independent clusters (e.g., sites) as the value of nobs.
sort: logical. If TRUE, the model selection table is ranked according to the (Q)AIC(c) values.
c.hat: value of the overdispersion parameter (i.e., variance inflation factor), such as that obtained from c_hat. Note that values of c.hat different from 1 are only appropriate for binomial GLMs with trials > 1 (i.e., success/trial or cbind(success, failure) syntax) or for Poisson GLMs. If c.hat > 1, boot.wt will return the quasi-likelihood analogue of the information criterion requested.
nsim: the number of bootstrap iterations. Burnham and Anderson (2002) recommend at least 1000 and up to 10 000 iterations for certain problems.
...: additional arguments passed to the function.
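
To make the roles of second.ord, nobs, and c.hat concrete, the criteria can be computed by hand from a fitted model. The sketch below uses the built-in mtcars data and base R only. The object names and the c.hat value of 1.5 are illustrative assumptions, and counting c.hat as one extra estimated parameter follows the convention in Burnham and Anderson (2002), which is assumed here rather than taken from the boot.wt source:

```r
## Linear model on a built-in data set (illustrative only)
fit <- lm(mpg ~ wt, data = mtcars)

ll <- logLik(fit)
K  <- attr(ll, "df")   # estimated parameters, including the residual variance
n  <- nrow(mtcars)     # plays the role of nobs

## First-order AIC and its small-sample version (second.ord = TRUE)
aic  <- -2 * as.numeric(ll) + 2 * K
aicc <- aic + (2 * K * (K + 1)) / (n - K - 1)

## Quasi-likelihood analogue for a Poisson GLM with an assumed c.hat
pfit <- glm(carb ~ wt, family = poisson, data = mtcars)
c.hat.assumed <- 1.5                    # hypothetical value, not estimated here
Kq   <- attr(logLik(pfit), "df") + 1    # assumption: c.hat counts as a parameter
qaic <- -2 * as.numeric(logLik(pfit)) / c.hat.assumed + 2 * Kq

c(AIC = aic, AICc = aicc, QAIC = qaic)
```

The AICc penalty grows as K approaches n, which is why the choice of nobs matters when the effective sample size is ambiguous.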

Author

Marc J. Mazerolle

Details

boot.wt is implemented for aov, betareg, clm, glm, hurdle, lm, multinom, polr, rlm, survreg, vglm, and zeroinfl classes. During each bootstrap iteration, the data are resampled with replacement, all the models specified in cand.set are updated with the new data set, and the top-ranked model is saved. When all iterations are completed, the relative frequency of selection is computed for each model appearing in the candidate model set.
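
The procedure described above can be sketched in base R. This is a simplified illustration, not the package's implementation: it uses the built-in mtcars data, ranks models by AIC rather than AICc for brevity, and the names forms, winners, and pi.wt are assumptions made for the example.

```r
set.seed(1)

## Candidate formulas on a built-in data set (illustrative only)
forms <- list(mod1 = mpg ~ wt,
              mod2 = mpg ~ wt + hp,
              mod3 = mpg ~ wt + hp + qsec)

nsim <- 200
n <- nrow(mtcars)
winners <- character(nsim)

for (i in seq_len(nsim)) {
  ## resample rows with replacement
  boot.dat <- mtcars[sample.int(n, size = n, replace = TRUE), ]
  ## refit every candidate model on the bootstrap sample
  ic <- vapply(forms, function(f) AIC(lm(f, data = boot.dat)), numeric(1))
  ## record the top-ranked (lowest AIC) model
  winners[i] <- names(ic)[which.min(ic)]
}

## relative frequency of selection for each candidate model
pi.wt <- table(factor(winners, levels = names(forms))) / nsim
pi.wt
```

Using factor() with explicit levels keeps models that were never selected in the table with a frequency of 0.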

Relative frequencies of the models are often similar to Akaike weights, and the latter are often preferred due to their link with a Bayesian perspective (Burnham and Anderson 2002). boot.wt is most useful as a teaching tool for illustrating relative frequencies of model selection based on sampling theory. The current implementation is only appropriate with completely randomized designs. For more complex data structures (e.g., blocks or random effects), the bootstrap should be modified accordingly.
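
One common modification for blocked data is to resample whole blocks rather than individual rows. The sketch below shows only that resampling step, under the assumption that cyl in the built-in mtcars data acts as a blocking factor. This is purely illustrative and is not something boot.wt does.

```r
set.seed(1)

## Treat 'cyl' as a hypothetical blocking factor
block.ids <- unique(mtcars$cyl)

## Draw blocks with replacement, then keep every row of each drawn block
ids <- sample(block.ids, size = length(block.ids), replace = TRUE)
idx <- unlist(lapply(ids, function(b) which(mtcars$cyl == b)))
boot.dat <- mtcars[idx, ]

## Number of rows contributed by each block in this bootstrap sample
table(boot.dat$cyl)
```

Each bootstrap data set built this way would then be used to refit and rank the candidate models, as in the row-level bootstrap.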

References

Anderson, D. R. (2008) Model-based Inference in the Life Sciences: a primer on evidence. Springer: New York.

Burnham, K. P., Anderson, D. R. (2002) Model Selection and Multimodel Inference: a practical information-theoretic approach. Second edition. Springer: New York.

Burnham, K. P., Anderson, D. R. (2004) Multimodel inference: understanding AIC and BIC in model selection. Sociological Methods and Research 33, 261--304.

Mazerolle, M. J. (2006) Improving data analysis in herpetology: using Akaike's Information Criterion (AIC) to assess the strength of biological hypotheses. Amphibia-Reptilia 27, 169--180.

Examples

##Mazerolle (2006) frog water loss example
library(AICcmodavg)
data(dry.frog)

##set up a subset of the models in Table 1
Cand.models <- list( )
Cand.models[[1]] <- lm(log_Mass_lost ~ Shade + Substrate +
                       cent_Initial_mass + Initial_mass2,
                       data = dry.frog)
Cand.models[[2]] <- lm(log_Mass_lost ~ Shade + Substrate +
                       cent_Initial_mass + Initial_mass2 +
                       Shade:Substrate, data = dry.frog)
Cand.models[[3]] <- lm(log_Mass_lost ~ cent_Initial_mass +
                       Initial_mass2, data = dry.frog)
Cand.models[[4]] <- lm(log_Mass_lost ~ Shade + cent_Initial_mass +
                       Initial_mass2, data = dry.frog)
Cand.models[[5]] <- lm(log_Mass_lost ~ Substrate + cent_Initial_mass +
                       Initial_mass2, data = dry.frog)

##create a vector of names to trace back models in set
Modnames <- paste("mod", seq_along(Cand.models), sep = " ")

##generate AICc table with bootstrapped relative
##frequencies of model selection
boot.wt(cand.set = Cand.models, modnames = Modnames, sort = TRUE,
        nsim = 10) #number of iterations should be much higher


##Burnham and Anderson (2002) flour beetle data
if (FALSE) {
data(beetle)
##models as suggested by Burnham and Anderson p. 198          
Cand.set <- list( )
Cand.set[[1]] <- glm(Mortality_rate ~ Dose, family =
                     binomial(link = "logit"), weights = Number_tested,
                     data = beetle)
Cand.set[[2]] <- glm(Mortality_rate ~ Dose, family =
                     binomial(link = "probit"), weights = Number_tested,
                     data = beetle)
Cand.set[[3]] <- glm(Mortality_rate ~ Dose, family =
                     binomial(link ="cloglog"), weights = Number_tested,
                     data = beetle)

##create a vector of names to trace back models in set
Modnames <- paste("Mod", seq_along(Cand.set), sep = " ")

##model selection table with bootstrapped
##relative frequencies
boot.wt(cand.set = Cand.set, modnames = Modnames)
}
