Takes a fitted gam object produced by gam() and produces various useful
summaries from it. (See sink to divert output to a file.)

Usage:

## S3 method for class 'gam':
summary(object, dispersion = NULL, freq = FALSE, p.type = 0, ...)

## S3 method for class 'summary.gam':
print(x, digits = max(3, getOption("digits") - 3),
      signif.stars = getOption("show.signif.stars"), ...)
Arguments:

object: a fitted gam object as produced by gam().

x: a summary.gam object produced by summary.gam().

dispersion: a known dispersion parameter, or NULL to use the estimate or the
family default (e.g. 1 for Poisson).

freq: if TRUE then the frequentist covariance matrix of the parameters is used
for the parametric term p-values, instead of the default Bayesian estimated
covariance matrix.

p.type: determines how p-values are computed for smooth terms (see Details);
p.type=5 selects the frequentist approximation of Wood (2006, section 4.8.5).
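A minimal sketch (not part of the original help page) of how the dispersion and
freq arguments change the printed summary; the Poisson simulation via gamSim is
purely an illustrative assumption:

library(mgcv)
set.seed(1)
dat <- gamSim(1, n = 200, dist = "poisson", scale = 0.1)
b <- gam(y ~ s(x0) + s(x2), family = poisson, data = dat)
summary(b)                    ## dispersion defaults to 1 for Poisson
summary(b, dispersion = 2)    ## supply a known dispersion instead
summary(b, freq = TRUE)       ## frequentist covariance for parametric terms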
Value:

summary.gam produces a list of summary information for a fitted gam object.
Its components include (among others):

p.coeff: an array of estimates of the strictly parametric model coefficients.

p.t: an array of the p.coeff's divided by their standard errors.

r.sq: the adjusted r-squared for the model. Note that r.sq does not include
any offset in the one parameter model.

dev.expl: the proportion of the null deviance explained by the model. The null
deviance is computed taking account of any offset, so dev.expl can be
substantially lower than r.sq when an offset is present.

cov.unscaled: the estimated covariance matrix of the parameters (or estimators
if freq=TRUE), divided by the scale parameter.

cov.scaled: the estimated covariance matrix of the parameters (estimators if
freq=TRUE).

WARNING: P-values for terms penalized via `paraPen' are unlikely to be correct.
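For example (a sketch, not from the help page), individual components of the
returned list can be extracted directly:

library(mgcv)
set.seed(2)
dat <- gamSim(1, n = 200, scale = 2)
b <- gam(y ~ s(x0) + s(x1), data = dat)
sb <- summary(b)
sb$r.sq      ## adjusted r-squared
sb$dev.expl  ## proportion of null deviance explained
sb$p.table   ## parametric coefficient table
sb$s.table   ## smooth term table: edf, reference df, test statistic, p-value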
Details:

print.summary.gam tries to print various bits of summary information useful
for term selection in a pretty way.
P-values for smooth terms are usually based on a test statistic motivated by an extension of Nychka's (1988) analysis of the frequentist properties of Bayesian confidence intervals for smooths (Marra and Wood, 2012). These have better frequentist performance (in terms of power and distribution under the null) than the alternative strictly frequentist approximation. When the Bayesian intervals have good across-the-function coverage properties, the p-values have close to the correct null distribution and reasonable power (although there are no optimality results for the power). Full details are in Wood (2013b), although what is computed is actually a slight variant in which the components of the test statistic are weighted by the iterative fitting weights.
Note that for smooth terms with no unpenalized components (such as Gaussian random effects) the Nychka (1988) requirement that smoothing bias be substantially less than variance breaks down (see e.g. the appendix of Marra and Wood, 2012), and this results in an incorrect null distribution for p-values computed using the above approach. In this case an alternative approach designed for random effects variance components is required, and is used automatically. See Wood (2013a) for details: the test is based on a likelihood ratio statistic (with the reference distribution appropriate for a null hypothesis on the boundary of the parameter space).
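As a sketch of what this means in practice (illustrative only; the random
intercept construction below is an assumption, not from the help page), the
s(fac, bs="re") row of the smooth term table is produced by this variance
component test:

library(mgcv)
set.seed(5)
dat <- gamSim(1, n = 200, scale = 2)
dat$fac <- factor(sample(1:20, 200, replace = TRUE))
dat$y <- dat$y + rnorm(20)[dat$fac] * 0.5          ## add a random intercept
b <- gam(y ~ s(x2) + s(fac, bs = "re"), data = dat, method = "REML")
summary(b)$s.table   ## the s(fac) row uses the Wood (2013a) random effect test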
All p-values are computed without considering uncertainty in the smoothing parameter estimates.
In simulations the p-values have best behaviour under ML smoothness selection, with REML coming second. In general the p-values behave well, but neglecting smoothing parameter uncertainty means that they may be somewhat too low when smoothing parameters are highly uncertain. High uncertainty happens in particular when smoothing parameters are poorly identified, which can occur with nested smooths or highly correlated covariates (high concurvity).
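A sketch of such a comparison (illustrative, not from the help page): refit the
same model under ML and REML smoothness selection and compare the reported
smooth term p-values:

library(mgcv)
set.seed(3)
dat <- gamSim(1, n = 200, scale = 2)
fm <- y ~ s(x0) + s(x1) + s(x2) + s(x3)
b.ml   <- gam(fm, data = dat, method = "ML")
b.reml <- gam(fm, data = dat, method = "REML")
summary(b.ml)$s.table[, "p-value"]    ## best behaviour in simulations
summary(b.reml)$s.table[, "p-value"]  ## typically very similar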
If p.type=5
then the frequentist approximation for p-values of smooth terms described in section
4.8.5 of Wood (2006) is used. The approximation is not as good as the default, and is no longer recommended.
By default the p-values for parametric model terms are also based on Wald tests
using the Bayesian covariance matrix for the coefficients. This is appropriate
when there are "re" terms present, and is otherwise rather similar to the
results using the frequentist covariance matrix (freq=TRUE), since the
parametric terms themselves are usually unpenalized. Default p-values for
parametric terms that are penalized using the paraPen argument will not be
good. However, if such terms represent conventional random effects with full
rank penalties, then setting freq=TRUE is appropriate.
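A hedged sketch of that last case (the data and penalty setup below are
illustrative assumptions, not from the help page): an i.i.d. random intercept
coded as a parametric matrix term penalized via paraPen, comparing the default
and frequentist parametric p-values:

library(mgcv)
set.seed(4)
n <- 200
fac <- factor(sample(1:10, n, replace = TRUE))
Z <- model.matrix(~ fac - 1)                     ## random intercept design matrix
x <- runif(n)
y <- 2 * x + rnorm(10)[fac] * 0.5 + rnorm(n)
b <- gam(y ~ x + Z, paraPen = list(Z = list(diag(10))), method = "REML")
summary(b)$p.table               ## default: Bayesian covariance Wald tests
summary(b, freq = TRUE)$p.table  ## frequentist covariance: appropriate here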
References:

Marra, G. and Wood, S.N. (2012) Coverage Properties of Confidence Intervals for
Generalized Additive Model Components. Scandinavian Journal of Statistics
39:53-74.

Nychka, D. (1988) Bayesian Confidence Intervals for Smoothing Splines. Journal
of the American Statistical Association 83:1134-1143.

Wood, S.N. (2006) Generalized Additive Models: An Introduction with R. Chapman
and Hall/CRC Press.

Wood, S.N. (2013a) A simple test for random effects in regression models.
Biometrika 100:1005-1010.

Wood, S.N. (2013b) On p-values for smooth components of an extended generalized
additive model. Biometrika 100:221-228.
See Also:

gam, predict.gam, gam.check, anova.gam, gam.vcomp, sp.vcov
Examples:

library(mgcv)
set.seed(0)
dat <- gamSim(1,n=200,scale=2) ## simulate data
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)
plot(b,pages=1)
summary(b)
## now check the p-values by using a pure regression spline.....
b.d <- round(summary(b)$edf)+1 ## get edf per smooth
b.d <- pmax(b.d,3) # can't have basis dimension less than 3!
bc<-gam(y~s(x0,k=b.d[1],fx=TRUE)+s(x1,k=b.d[2],fx=TRUE)+
s(x2,k=b.d[3],fx=TRUE)+s(x3,k=b.d[4],fx=TRUE),data=dat)
plot(bc,pages=1)
summary(bc)
## Example where some p-values are less reliable...
dat <- gamSim(6,n=200,scale=2)
b <- gam(y~s(x0,m=1)+s(x1)+s(x2)+s(x3)+s(fac,bs="re"),data=dat)
## Here s(x0,m=1) can be penalized to zero, so p-value approximation
## cruder than usual...
summary(b)
## p-value check - increase k to make this useful!
k<-20;n <- 200;p <- rep(NA,k)
for (i in 1:k)
{ b<-gam(y~te(x,z),data=data.frame(y=rnorm(n),x=runif(n),z=runif(n)),
method="ML")
p[i]<-summary(b)$s.p[1]
}
plot(((1:k)-0.5)/k,sort(p))
abline(0,1,col=2)
ks.test(p,"punif") ## how close to uniform are the p-values?