DHARMa (version 0.4.1)

testDispersion: DHARMa dispersion tests

Description

This function performs simulation-based tests for over/underdispersion. If type = "DHARMa" (default and recommended), simulation-based dispersion tests are performed. Their behavior differs depending on whether simulations are done with refit = F or refit = T, and whether the data is simulated conditionally or unconditionally on the fitted random effects (controlled e.g. via re.form in lme4; see below). If type = "PearsonChisq", a chi2 test on Pearson residuals is performed.

Usage

testDispersion(simulationOutput, alternative = c("two.sided", "greater",
  "less"), plot = T, type = c("DHARMa", "PearsonChisq"), ...)

Arguments

simulationOutput

an object of class DHARMa, either created via simulateResiduals for supported models or by createDHARMa for simulations created outside DHARMa, or a supported model. Providing a supported model directly is discouraged, because simulation settings cannot be changed in this case.

alternative

a character string specifying whether the test should test if the observed dispersion is "greater", "less" or "two.sided" compared to the simulated null distribution. "greater" corresponds to testing only for overdispersion, "less" only for underdispersion. It is recommended to keep the default setting (testing for both over- and underdispersion).

plot

whether to provide a plot for the results

type

which test to run. Default is "DHARMa"; the other option is "PearsonChisq" (see Details)

...

arguments to pass on to testGeneric

Details

Over/underdispersion means that the observed data is more/less dispersed than expected under the fitted model. There are a number of common ways to test for dispersion problems, ranging from the classical dispersion/df ratio used for GLMs to other tests implemented in various R packages. This function implements several such dispersion tests.

type == "DHARMa"

If type = "DHARMa" (default and recommended), simulation-based dispersion tests are performed. Their behavior differs depending on whether simulations are done with refit = F, or refit = T, and whether data is simulated conditional (e.g. re.form ~0 in lme4)

If refit = F, the function uses testGeneric to compare the variance of the observed raw residuals against the variances of the simulated residuals. The test returns the ratio of observed to mean simulated variance, together with a p-value based on where the observed variance falls within the distribution of simulated variances. A significant ratio > 1 indicates overdispersion, a significant ratio < 1 underdispersion.
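
As an illustration of the idea, the same kind of comparison can be sketched directly with testGeneric, using the variance of the raw residuals as the summary statistic. This is a minimal sketch of the principle, not necessarily DHARMa's exact internal statistic; it assumes a simulationOutput object created by simulateResiduals (fittedPredictedResponse is a field of the DHARMa object).

spread <- function(x) var(x - simulationOutput$fittedPredictedResponse)  # variance of raw residuals
testGeneric(simulationOutput, summary = spread, alternative = "two.sided")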

If refit = T, the function compares the approximate deviance of the fitted model (computed via squared Pearson residuals) with the same quantity from the models refitted to simulated data. This is much slower than the previous alternative. Given the computational cost, I would suggest that most users will be satisfied with the standard dispersion test.
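
The observed-data side of this comparison is simply the sum of squared Pearson residuals of the fitted model; the same quantity is then computed for each model refitted to simulated data. As a minimal sketch (assuming a glm or merMod object called fittedModel):

sum(residuals(fittedModel, type = "pearson")^2)  # approximate deviance of the fitted model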

Moreover, for either refit = T or refit = F, the results of the DHARMa dispersion test will differ depending on whether simulations are done conditionally (= conditional on the fitted random effects) or unconditionally (= REs are re-simulated). You can switch between conditional and unconditional simulations in simulateResiduals if the regression package that you use supports this. The default in DHARMa is to use unconditional simulations (for other reasons), but conditional simulations are often more sensitive to dispersion problems in the presence of substantial RE variance, and I recommend checking dispersion with conditional simulations if the regression package supports them.
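
For lme4 models, switching between the two modes could look like the following sketch; the re.form argument is passed on to lme4's simulate function, and other packages may use a different argument or not support conditional simulations at all.

resUncond <- simulateResiduals(fittedModel)                # unconditional (default): REs are re-simulated
testDispersion(resUncond)

resCond <- simulateResiduals(fittedModel, re.form = NULL)  # conditional on the fitted REs
testDispersion(resCond)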

type == "PearsonChisq"

This is the test described at https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#overdispersion, identical to performance::check_overdispersion. It works only if the fitted model provides df.residual and Pearson residuals.

The test statistic is biased towards lower values under quite general conditions, and the test will therefore tend to signal significant underdispersion. It is recommended to use this test only for overdispersion, i.e. with alternative = "greater".
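
For reference, a minimal sketch of the statistic behind this test, following the GLMM FAQ linked above (the function name and output format here are illustrative, not part of DHARMa):

overdispCheck <- function(model) {
  rdf   <- df.residual(model)                  # residual degrees of freedom
  rp    <- residuals(model, type = "pearson")  # Pearson residuals
  chisq <- sum(rp^2)
  c(chisq = chisq,
    ratio = chisq / rdf,                       # dispersion ratio, approx. 1 if no dispersion problem
    p     = pchisq(chisq, df = rdf, lower.tail = FALSE))  # one-sided test for overdispersion
}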

See Also

testResiduals, testUniformity, testOutliers, testDispersion, testZeroInflation, testGeneric, testTemporalAutocorrelation, testSpatialAutocorrelation, testQuantiles, testCategorical

Examples

library(lme4)
set.seed(123)

testData = createData(sampleSize = 100, overdispersion = 0.5, randomEffectVariance = 1)
fittedModel <- glmer(observedResponse ~ Environment1 + (1|group), 
                     family = "poisson", data = testData)
simulationOutput <- simulateResiduals(fittedModel = fittedModel)

# default DHARMa dispersion test - simulation-based 
testDispersion(simulationOutput) 
testDispersion(simulationOutput, alternative = "less", plot = FALSE) # only underdispersion
testDispersion(simulationOutput, alternative = "greater", plot = FALSE) # only oversispersion

# for mixed models, the test is usually more powerful if residuals are calculated 
# conditional on fitted REs
simulationOutput <- simulateResiduals(fittedModel = fittedModel, re.form = NULL)
testDispersion(simulationOutput)

# DHARMa also implements the popular Pearson-chisq test that is described in
# Ben Bolker's GLMM FAQ. The issue with this test is that it requires the residual
# df of the model, which are not well defined for GLMMs. It is biased towards
# underdispersion, with the bias getting larger with the number of RE groups.
# If in doubt, test only for overdispersion
testDispersion(simulationOutput, type = "PearsonChisq", alternative = "greater")

# if refit = T, a different test based on simulated Pearson residuals will be calculated (see help)
simulationOutput2 <- simulateResiduals(fittedModel = fittedModel, refit = TRUE, seed = 12, n = 20)
testDispersion(simulationOutput2)

# often useful to test dispersion per group (in particular for binomial data, see vignette)
simulationOutputAggregated = recalculateResiduals(simulationOutput2, group = testData$group)
testDispersion(simulationOutputAggregated)




