anovaPB: Parametric bootstrap testing to compare two models by analysis of variance (deviance)

Description

Compute analysis of variance (or deviance) tables for two fitted model objects, with the P-value estimated using a parametric bootstrap -- by repeatedly simulating new responses from the fitted model under the null hypothesis.

Usage

anovaPB(
  objectNull,
  object,
  n.sim = 999,
  colRef = switch(class(object)[1], lm = 5, lmerMod = 6, 4),
  rowRef = 2,
  ncpus = NULL,
  ...
)

Value

A matrix which has the appearance and behaviour of an object returned by anova(objectNull,object), except that the P-value has been computed by parametric bootstrap, and the matrix now inherits class anovaPB.

Arguments

objectNull: is the fitted model under the null hypothesis. This can be any object that responds to simulate, update and anova.
object: is the fitted model under the alternative hypothesis. This can be any object that responds to update, anova and model.frame. It must be larger than objectNull and its model frame should contain all the terms required to fit objectNull.
n.sim: the number of simulated datasets to generate for parametric bootstrap testing. Defaults to 999.
colRef: the column number where the test statistic of interest can be found in the standard anova output when calling anova(objectNull,object). Default choices have been set for common models (colRef=5 for lm objects, colRef=6 for lmer objects and colRef=4 otherwise, which is correct for glm and gam objects).
rowRef: the row number where the test statistic of interest can be found in the standard anova output when calling anova(objectNull,object). Defaults to 2, because it is on the second row for most common models.
ncpus: the number of CPUs to use. Default (NULL) uses two less than the total available.
...: further arguments sent through to anova.

Author

David Warton <david.warton@unsw.edu.au>

Details

The anova function typically relies on classical asymptotic results which sometimes don't apply well, e.g. when using penalised likelihood to fit a generalised additive model, or when testing whether a random effect should be included in a mixed model. In such cases we can get a more accurate test by using simulation to estimate the P-value -- repeatedly simulating new sets of simulated responses, refitting the null and alternative models, and recomputing the test statistic. This process allows us to estimate the distribution of the test statistic when the null hypothesis is true, hence draw a conclusion about whether the observed test statistic is large relative to what would be expected under the null hypothesis. The process is often referred to as a parametric bootstrap (Davison & Hinkley 1997), which the PB in the function name (anovaPB) is referring to.

This function will take any pair of fitted models, a null (objectNull) and an alternative (object), and use simulation to estimate the P-value of the test of whether there is evidence against the model in objectNull in favour of the model in object. This function works by repeatedly performing an anova to compare objectNull to object, where the responses in the model.frame have been replaced with values simulated by applying simulate to objectNull. Hence for this function to work, the objects being compared need to respond to the anova, simulate and model.frame functions.

This function needs to be able to find the test statistic in the anova output, but it is stored in different places for different types of objects. It is assumed that anova(objectNull,object) returns a matrix and that the test statistic is stored in the (rowRef,colRef)th element, where both rowRef and colRef have been chosen to be correct for common model types (glm, gam, lmer).

References

Davison A.C. & Hinkley D. V. (1997) Bootstrap methods and their application, Cambridge University Press. Warton DI (2022) Eco-Stats - Data Analysis in Ecology, from t-tests to multivariate abundances. Springer, ISBN 978-3-030-88442-0

Examples

Run this code

# fit a Poisson regression to random data:
y = rpois(50,lambda=1)
x = 1:50
rpois_glm = glm(y~x, family=poisson())
rpois_int = glm(y~1, family=poisson())
anovaPB(rpois_int, rpois_glm, n.sim=99, ncpus=1)
# this approach would run faster on some larger problems (but maybe not this one!):
if (FALSE) anovaPB(rpois_int, rpois_glm, n.sim=99, ncpus=4)

Run the code above in your browser using DataLab