get.measures(y, X = NULL, family, trial.size = NULL, site.eff, num.lv, fit.mcmc)
NULL, in which case it is assumed no model matrix was used.
boral with save.model = TRUE, and then applying as.mcmc on calc.condlogLik.
site.eff = FALSE) or in terms of species composition (site.eff = TRUE).
1) WAIC has been argued to be a more natural extension of AIC to the Bayesian and hierarchical modelling context (Gelman et al., 2013), and is based on the conditional log-likelihood calculated at each of the MCMC samples.
2 & 3) EAIC and EBIC were suggested by Carlin and Louis (2011). Both criteria are of the form -2*mean(conditional log-likelihood) + penalty*(no. of parameters in the model), where the mean is taken over all the MCMC samples. EAIC applies a penalty of 2, while EBIC applies a penalty of $\log(n)$.
4 & 5) AIC and BIC take the form -2*(marginal log-likelihood) + penalty*(no. of parameters in the model), where the log-likelihood is evaluated at the posterior median. If the parameter-wise posterior distributions are unimodal and approximately symmetric, these will produce similar results to an AIC and BIC where the log-likelihood is evaluated at the posterior mode. AIC applies a penalty of 2, while BIC applies a penalty of $\log(n)$.
6) The model likelihood is the probability of the data given a model, and both BIC and the compound Laplace-Metropolis estimator are based on asymptotic approximations to it. However, while the first term in both criteria is of the same form, namely -2*(marginal log-likelihood) with the log-likelihood evaluated at the posterior median, the compound Laplace-Metropolis estimator explicitly calculates the determinant of the relevant Hessian matrix (evaluated at the posterior median) to derive its penalty.
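To make the forms in items 1 to 3 concrete, here is a minimal base R sketch that computes WAIC, EAIC and EBIC from a matrix of per-sample conditional log-likelihoods. It does not use boral: the toy Poisson model and the names `y_toy`, `lambda_draws`, `loglik_mat` and `num_params` are all hypothetical stand-ins. The WAIC line uses the common -2*(lppd - p_waic) form with a variance-based penalty; the variant computed inside boral may differ. Taking $n$ as the number of observations in the EBIC penalty is also an assumption.

```r
## Sketch only: toy data and "posterior draws", not output from boral.
set.seed(123)
n <- 50                          # number of observations (assumed meaning of n)
S <- 1000                        # number of retained MCMC samples
y_toy <- rpois(n, lambda = 3)    # hypothetical response vector

## Stand-in for MCMC output: exact posterior draws of a single Poisson rate
## under a Gamma(shape = 1, rate = 1) prior.
lambda_draws <- rgamma(S, shape = sum(y_toy) + 1, rate = n + 1)

## S x n matrix: log-likelihood of each observation at each posterior draw.
loglik_mat <- sapply(y_toy, function(yi) dpois(yi, lambda_draws, log = TRUE))
num_params <- 1                  # this toy model has one parameter

## WAIC: log pointwise predictive density minus a variance-based penalty.
lppd   <- sum(log(colMeans(exp(loglik_mat))))
p_waic <- sum(apply(loglik_mat, 2, var))
waic   <- -2 * (lppd - p_waic)

## EAIC and EBIC: -2 * mean(log-likelihood over samples) + penalty * no. of parameters.
total_loglik <- rowSums(loglik_mat)   # one total log-likelihood per MCMC sample
eaic <- -2 * mean(total_loglik) + 2 * num_params
ebic <- -2 * mean(total_loglik) + log(n) * num_params

print(round(c(WAIC = waic, EAIC = eaic, EBIC = ebic), 2))
```

Because EAIC and EBIC share the same first term, they differ only by (log(n) - 2) times the number of parameters, so EBIC penalizes extra parameters more heavily whenever n exceeds about 7.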
In our very limited experience, if information criteria are to be used for model selection between boral models, we found that BIC at the posterior median and the compound Laplace-Metropolis estimator tend to perform best. WAIC, AIC, and DIC (see get.dic) tend to overselect the number of latent variables. For WAIC and DIC, part of this overfitting could be due to the fact that both criteria are calculated from the conditional rather than the marginal log-likelihood (see Millar, 2009).
Intuitively, comparing boral models with and without latent variables (using information criteria such as those returned) amounts to testing whether the columns of the response matrix $y$ are correlated. With multivariate abundance data, for example, where $y$ is a matrix of $n$ sites and $p$ species, comparing models with and without latent variables tests whether there is any evidence of correlation between species.
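The intuition above can be illustrated with a small base R simulation. This is a sketch under stated assumptions rather than a boral fit: the loadings, the intercept and the helper `mean_offdiag` are hypothetical. Counts generated with a shared latent variable show clear between-column correlation, while counts generated without one do not.

```r
## Sketch only: simulated counts, no model fitting involved.
set.seed(1)
n <- 500                                  # sites (rows of y)
p <- 4                                    # species (columns of y)
z <- rnorm(n)                             # one latent variable per site
loadings <- c(0.8, 0.7, 0.9, 0.6)         # hypothetical species loadings

## With a latent variable: the log-mean of each species depends on the shared z.
eta_lv  <- outer(z, loadings)             # n x p linear predictor
y_lv    <- matrix(rpois(n * p, exp(0.5 + eta_lv)), nrow = n, ncol = p)

## Without a latent variable: columns are independent Poisson counts.
y_indep <- matrix(rpois(n * p, exp(0.5)), nrow = n, ncol = p)

## Average pairwise correlation between columns (species).
mean_offdiag <- function(m) {
  r <- cor(m)
  mean(r[upper.tri(r)])
}

cor_lv    <- mean_offdiag(y_lv)
cor_indep <- mean_offdiag(y_indep)
print(round(c(with_lv = cor_lv, without_lv = cor_indep), 3))
```

In this setup, a model with latent variables has correlation to explain only in the first data set, which is the sense in which comparing the two models tests for between-species correlation.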
get.dic for calculating the Deviance Information Criterion (DIC) based on the conditional log-likelihood; get.more.measures for even more information criteria.
library(boral)
library(mvabund) ## Load a dataset from the mvabund package
data(spider)
y <- spider$abun
n <- nrow(y); p <- ncol(y);
spider.fit.pois <- boral(y, family = "poisson", num.lv = 2,
    site.eff = TRUE, save.model = FALSE, calc.ics = TRUE)
spider.fit.pois$ics ## Returns the information criteria
spider.fit.nb <- boral(y, family = "negative.binomial", num.lv = 2,
    site.eff = TRUE, save.model = FALSE, calc.ics = TRUE)
spider.fit.nb$ics ## Returns the information criteria