testOutliers: Test for outliers

Description

This function tests if the number of observations that are strictly greater / smaller than all simulations are larger than expected

Usage

testOutliers(simulationOutput, alternative = c("two.sided", "greater",
  "less"), plot = T)

Arguments

simulationOutput

an object of class DHARMa with simulated quantile residuals, either created via simulateResiduals or by createDHARMa for simulations created outside DHARMa

alternative

a character string specifying whether the test should test if observations are "greater", "less" or "two.sided" compared to the simulated null hypothesis

plot

if T, the function will create an additional plot

Details

DHARMa residuals are created by simulating from the fitted model, and comparing the simulated values to the observed data. It can occur that all simulated values are higher or smaller than the observed data, in which case they get the residual value of 0 and 1, respectively. I refer to these values as simulation outliers, or simply outliers.

Because no data was simulated in the range of the observed value, we actually don't know "how much" these values deviate from the model expecation, so the term "outlier" should be used with a grain of salt - it's not a judgement about the probability of a deviation from an expectation, but denotes that we are outside the simulated range. The number of outliers would usually decrease if the number of DHARMa simulations is increased.

The probability of an outlier depends on the number of simulations (in fact, it is 1/(nSim +1) for each side), so whether the existence of outliers is a reason for concern depends also on the number of simulations. The expected number of outliers is therefore binomially distributed, and we can calculate a p-value from that

Examples

Run this code

# NOT RUN {
testData = createData(sampleSize = 200, overdispersion = 0.5, randomEffectVariance = 0)
fittedModel <- glm(observedResponse ~ Environment1 , family = "poisson", data = testData)
simulationOutput <- simulateResiduals(fittedModel = fittedModel)

# the plot function runs 4 tests
# i) KS test i) Dispersion test iii) Outlier test iv) quantile test
plot(simulationOutput, quantreg = TRUE)

# testResiduals tests distribution, dispersion and outliers
testResiduals(simulationOutput)

####### Individual tests #######

# KS test for correct distribution of residuals
testUniformity(simulationOutput)

# Dispersion test
testDispersion(simulationOutput) # tests under and overdispersion
testDispersion(simulationOutput, alternative = "less") # only underdispersion
testDispersion(simulationOutput, alternative = "less") # only underdispersion

# if model is refitted, a different test will be called
simulationOutput2 <- simulateResiduals(fittedModel = fittedModel, refit = TRUE, seed = 12)
testDispersion(simulationOutput2)

# often useful to test dispersion per group (e.g. binomial data, see vignette)
simulationOutput3 = recalculateResiduals(simulationOutput, group = testData$group)
testDispersion(simulationOutput3)

# Outlier test (number of observations outside simulation envelope)
testOutliers(simulationOutput) 

# testing zero inflation
testZeroInflation(simulationOutput)

# testing generic summaries
countOnes <- function(x) sum(x == 1)  # testing for number of 1s
testGeneric(simulationOutput, summary = countOnes) # 1-inflation
testGeneric(simulationOutput, summary = countOnes, alternative = "less") # 1-deficit

means <- function(x) mean(x) # testing if mean prediction fits
testGeneric(simulationOutput, summary = means) 

spread <- function(x) sd(x) # testing if mean sd fits
testGeneric(simulationOutput, summary = spread) 




# }

Run the code above in your browser using DataLab

Description

Usage

Arguments

Details

See Also

Examples