Compute analysis of variance (or deviance) tables for two fitted model objects, with the P-value estimated using a parametric bootstrap -- by repeatedly simulating new responses from the fitted model under the null hypothesis.
anovaPB(
objectNull,
object,
n.sim = 999,
colRef = switch(class(object)[1], lm = 5, lmerMod = 6, 4),
rowRef = 2,
ncpus = NULL,
...
)
A matrix which has the appearance and behaviour of an object returned by
anova(objectNull,object)
, except that the P-value has been computed
by parametric bootstrap, and the matrix now inherits class anovaPB
.
is the fitted model under the null hypothesis. This can be
any object that responds to simulate
, update
and anova
.
is the fitted model under the alternative hypothesis. This can be
any object that responds to update
, anova
and model.frame
.
It must be larger than objectNull
and its model frame should contain
all the terms required to fit objectNull
.
the number of simulated datasets to generate for parametric bootstrap testing. Defaults to 999.
the column number where the test statistic of interest can be found
in the standard anova
output when calling anova(objectNull,object)
.
Default choices have been set for common models (colRef=5
for lm
objects,
colRef=6
for lmer
objects and colRef=4
otherwise, which is correct
for glm
and gam
objects).
the row number where the test statistic of interest can be found
in the standard anova
output when calling anova(objectNull,object)
.
Defaults to 2
, because it is on the second row for most common models.
the number of CPUs to use. Default (NULL) uses two less than the total available.
further arguments sent through to anova
.
David Warton <david.warton@unsw.edu.au>
The anova
function typically relies on classical asymptotic results
which sometimes don't apply well, e.g. when using penalised likelihood to fit a
generalised additive model, or when testing whether a random effect should be
included in a mixed model. In such cases we can get a more accurate test by
using simulation to estimate the P-value -- repeatedly simulating
new sets of simulated responses, refitting the null and alternative models, and
recomputing the test statistic. This process allows us to estimate the
distribution of the test statistic when the null hypothesis is true, hence
draw a conclusion about whether the observed test statistic is large relative
to what would be expected under the null hypothesis. The process is often
referred to as a parametric bootstrap (Davison & Hinkley 1997), which
the PB in the function name (anovaPB
) is referring to.
This function will take any pair of fitted models, a null (objectNull
)
and an alternative (object
), and use simulation to estimate the
P-value of the test of whether there is evidence against the model in
objectNull
in favour of the model in object
. This function works
by repeatedly performing an anova
to compare objectNull
to
object
, where the responses in the model.frame
have been replaced
with values simulated by applying simulate
to objectNull
. Hence
for this function to work, the objects being compared need to respond
to the anova
, simulate
and model.frame
functions.
This function needs to be able to find the test statistic in the anova
output, but it is stored in different places for different types of objects.
It is assumed that anova(objectNull,object)
returns a matrix and that the
test statistic is stored in the (rowRef,colRef)
th element, where both
rowRef
and colRef
have been chosen to be correct for common
model types (glm
, gam
, lmer
).
Davison A.C. & Hinkley D. V. (1997) Bootstrap methods and their application, Cambridge University Press. Warton DI (2022) Eco-Stats - Data Analysis in Ecology, from t-tests to multivariate abundances. Springer, ISBN 978-3-030-88442-0
# fit a Poisson regression to random data:
y = rpois(50,lambda=1)
x = 1:50
rpois_glm = glm(y~x, family=poisson())
rpois_int = glm(y~1, family=poisson())
anovaPB(rpois_int, rpois_glm, n.sim=99, ncpus=1)
# this approach would run faster on some larger problems (but maybe not this one!):
if (FALSE) anovaPB(rpois_int, rpois_glm, n.sim=99, ncpus=4)
Run the code above in your browser using DataLab