anova.gam: Hypothesis tests related to GAM fits

Description

Performs hypothesis tests relating to one or more fitted gam objects. For a single fitted gam object, Wald tests of the significance of each parametric and smooth term are performed. When more than one fitted model is supplied, the models are compared using an analysis of deviance table. The tests are usually approximate, unless the models are unpenalized.

Usage

anova.gam(object, ..., dispersion = NULL, test = NULL)
print.anova.gam(x, digits = max(3, getOption("digits") - 3),...)

Arguments

object,...

fitted model objects of class gam as produced by gam().

x

an anova.gam object produced by a single model call to anova.gam().

dispersion

a value for the dispersion parameter: not normally used.

test

what sort of test to perform for a multi-model call. One of "Chisq", "F" or "Cp".

digits

number of digits to use when printing output.

Value

In the multi-model case anova.gam produces output identical to anova.glm, which it in fact uses.
In the single model case an object of class anova.gam is produced, which is in fact an object returned from summary.gam.
print.anova.gam simply produces tabulated output.
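
The two kinds of return value can be inspected directly. The following is a minimal sketch on simulated data; the data and the model names m0 and m1 are illustrative only, and the s.table component is assumed to be present because the single model object is a summary.gam object.

```r
library(mgcv)

set.seed(1)
dat <- data.frame(x = runif(100))
dat$y <- sin(2 * pi * dat$x) + rnorm(100, sd = 0.3)

m0 <- gam(y ~ x, data = dat)      # purely parametric fit
m1 <- gam(y ~ s(x), data = dat)   # penalized smooth of x

# Single model call: an object of class "anova.gam"
a1 <- anova(m1)
class(a1)
a1$s.table                        # table of tests for the smooth terms

# Multi-model call: an analysis of deviance table, as from anova.glm
a2 <- anova(m0, m1, test = "F")
class(a2)
```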

WARNING

P-values may be under-estimates, as a result of ignoring smoothing parameter uncertainty.

Details

If more than one fitted model is provided then anova.glm is used. If only one model is provided then the significance of each model term is assessed using Wald tests: see summary.gam for details of the actual computations. In the latter case print.anova.gam is used as the printing method.

P-values are usually reliable if the smoothing parameters are known, or the model is unpenalized. If smoothing parameters have been estimated then the p-values are typically somewhat too low, i.e. terms that appear "not significant" really are not, while terms that appear significant may in fact be non-significant if the p-value is close to whatever significance level you are choosing to operate at. This occurs because the uncertainty associated with the smoothing parameters is neglected in the calculation of the distributions under the null, which tends to lead to underdispersion in these distributions, and in turn to p-value estimates that are too low. (In simulations where the null is correct, I have seen p-values that are as low as half of what they should be.)
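
One way to check how well calibrated the p-values are in a particular setting is a small simulation under a true null. The following is only a sketch: the sample size, number of replicates and model are arbitrary choices, the s.pv component is assumed to be available as in summary.gam, and the rejection rate obtained will depend on the mgcv version and on how the smoothing parameters are selected.

```r
library(mgcv)

set.seed(42)
nsim <- 200   # number of simulated data sets (illustrative choice)
n <- 100      # observations per data set (illustrative choice)

pvals <- numeric(nsim)
for (i in seq_len(nsim)) {
  x <- runif(n)
  y <- rnorm(n)            # the null is true: y does not depend on x
  fit <- gam(y ~ s(x))     # smoothing parameter is estimated
  pvals[i] <- anova(fit)$s.pv[1]
}

# For a well calibrated test this proportion should be close to 0.05;
# a clearly larger value indicates p-values that are too low
mean(pvals < 0.05)
```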

If it is important to have p-values that are as accurate as possible, then, at least in the single model case, it is probably advisable to perform tests using unpenalized smooths (i.e. s(...,fx=TRUE)) with the basis dimension, k, left at what would have been used with penalization. Such tests are not as powerful, of course, but the p-values are more accurate. Whether or not extra accuracy is required will usually depend on whether or not hypothesis testing is a key objective of the analysis.
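
As a minimal sketch of this advice, the same model can be fitted with penalized and with unpenalized smooths and the two sets of tests compared. The simulated data and the names b_pen and b_fx are illustrative only; k is left unspecified in both fits so that the same default basis dimension is used.

```r
library(mgcv)

set.seed(2)
n <- 200
x1 <- runif(n)
x2 <- runif(n)                      # x2 has no effect on y
y <- exp(2 * x1) + rnorm(n, 0, 2)

# Penalized smooths: smoothing parameters are estimated
b_pen <- gam(y ~ s(x1) + s(x2))

# Unpenalized smooths with the same basis dimension
b_fx <- gam(y ~ s(x1, fx = TRUE) + s(x2, fx = TRUE))

anova(b_pen)   # approximate p-values
anova(b_fx)    # less powerful tests, but more accurate p-values
```

With fx=TRUE each smooth uses its full basis, so the total effective degrees of freedom of b_fx equals its number of coefficients.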

Examples

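library(mgcv)
set.seed(0)

# Simulate n observations with noise standard deviation sig
n <- 200
sig <- 2
x0 <- rep(1:4, 50)
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1)
x3 <- runif(n, 0, 1)  # x3 has no effect on y

# True signal is additive in x0, x1 and x2
y <- 2 * x0
y <- y + exp(2 * x1)
y <- y + 0.2 * x2^11 * (10 * (1 - x2))^6 + 10 * (10 * x2)^3 * (1 - x2)^10
e <- rnorm(n, 0, sig)
y <- y + e
x0 <- as.factor(x0)

# Single model: Wald tests for each parametric and smooth term
b <- gam(y ~ x0 + s(x1) + s(x2) + s(x3))
anova(b)

# Two models: analysis of deviance comparison with s(x3) dropped
b1 <- gam(y ~ x0 + s(x1) + s(x2))
anova(b, b1, test = "F")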
