summary.gam: Summary for a GAM fit

Description

Takes a fitted gam object produced by gam() and produces various useful summaries from it.

Usage

summary.gam(object, dispersion=NULL, freq=TRUE, ...)
print.summary.gam(x,digits = max(3, getOption("digits") - 3), 
                  signif.stars = getOption("show.signif.stars"),...)

Arguments

object

a fitted gam object as produced by gam().

x

a summary.gam object produced by summary.gam().

dispersion

A known dispersion parameter. If NULL, the estimate or the family's default (e.g. 1 for Poisson) is used.

freq

By default p-values for individual terms are calculated using the frequentist estimated covariance matrix of the parameter estimators. If this is set to FALSE then the Bayesian covariance matrix of the parameters is used instead. See details.

digits

controls number of digits printed in output.

signif.stars

Should significance stars be printed alongside output.

...

other arguments.
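
As an illustration of the dispersion argument, the following sketch summarises one fit twice, once with the dispersion estimated and once with it supplied. The simulated data, the names dat and fit, and the value 4 are assumptions made for the example only. It relies on the returned list carrying the dispersion that was used, and on the parametric p-values being referred to the standard normal when the dispersion is known, as described under Value.

```r
library(mgcv)

set.seed(1)
n <- 200
dat <- data.frame(x = runif(n))
dat$y <- sin(2 * pi * dat$x) + rnorm(n, sd = 2)  # true error variance is 4

fit <- gam(y ~ s(x), data = dat)

s_est   <- summary(fit)                  # dispersion estimated from the fit
s_known <- summary(fit, dispersion = 4)  # dispersion treated as known

s_est$dispersion    # estimated scale parameter
s_known$dispersion  # the supplied value

# With a known dispersion the parametric p-values are referred to the
# standard normal rather than to a t distribution.
p_normal <- 2 * pnorm(-abs(s_known$p.t))
all.equal(unname(s_known$p.pv), unname(p_normal))
```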

Value

summary.gam produces a list of summary information for a fitted gam object.

p.coeff

an array of estimates of the strictly parametric model coefficients.

p.t

an array of the p.coeff's divided by their standard errors.

p.pv

an array of p-values for the null hypothesis that the corresponding parameter is zero. Calculated with reference to the t distribution with the estimated residual degrees of freedom for the model fit if the dispersion parameter has been estimated, and the standard normal if not.

m

the number of smooth terms in the model.

chi.sq

an array of test statistics for assessing the significance of model smooth terms. If ${\bf p}_i$ is the parameter vector for the ith smooth term, and this term has estimated covariance matrix ${\bf V}_i$, then the statistic is ${\bf p}_i^\prime {\bf V}_i^{k-} {\bf p}_i$, where ${\bf V}_i^{k-}$ is the rank k pseudo-inverse of ${\bf V}_i$, and k is the estimated rank of ${\bf V}_i$.

s.pv

an array of approximate p-values for the null hypotheses that each smooth term is zero. Be warned, these are only approximate. In the case of known dispersion parameter, they are obtained by comparing the chi.sq statistic given above to the chi-squared distribution with k degrees of freedom, where k is the estimated rank of ${\bf V}_i$. If the dispersion parameter is unknown (in which case it will have been estimated) the statistic is compared to an F distribution with k upper d.f. and lower d.f. given by the residual degrees of freedom for the model. Typically the p-values will be somewhat too low, because they are conditional on the smoothing parameters, which are usually uncertain.

se

an array of standard error estimates for all parameter estimates.

r.sq

the adjusted r-squared for the model. Defined as the proportion of variance explained, where original variance and residual variance are both estimated using unbiased estimators. This quantity can be negative if your model is worse than a one parameter constant model, and can be higher for the smaller of two nested models! Note that proportion null deviance explained is probably more appropriate for non-normal errors.

dev.expl

the proportion of the null deviance explained by the model.

edf

an array of estimated degrees of freedom for the model terms.

residual.df

estimated residual degrees of freedom.

n

number of data.

gcv

minimized GCV score for the model, if GCV used.

ubre

minimized UBRE score for the model, if UBRE used.

scale

estimated (or given) scale parameter.

family

the family used.

formula

the original GAM formula.

dispersion

the scale parameter.

pTerms.df

the degrees of freedom associated with each parametric term (excluding the constant).

pTerms.chi.sq

a Wald statistic for testing the null hypothesis that each parametric term is zero.

pTerms.pv

p-values associated with the tests that each term is zero. For penalized fits these are approximate, being conditional on the smoothing parameters. The reference distribution is an appropriate chi-squared when the scale parameter is known, and is based on an F when it is not.

cov.unscaled

the estimated covariance matrix of the parameters (or estimators if freq=TRUE), divided by the scale parameter.

cov.scaled

the estimated covariance matrix of the parameters (estimators if freq=TRUE).

p.table

significance table for parameters.

s.table

significance table for smooths.

p.Terms

significance table for parametric model terms.
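
The components listed above can be read straight off the returned list. The sketch below uses simulated data (the names dat, fit and s are assumptions for illustration) and recomputes three of the quantities from their definitions, so the descriptions can be checked against the reported values. The r.sq line assumes unit prior weights.

```r
library(mgcv)

set.seed(2)
n <- 200
dat <- data.frame(x0 = runif(n), x1 = runif(n))
dat$y <- 2 * sin(pi * dat$x0) + exp(2 * dat$x1) + rnorm(n)

fit <- gam(y ~ s(x0) + s(x1), data = dat)
s <- summary(fit)

s$p.coeff   # parametric coefficient estimates (here just the intercept)
s$edf       # estimated degrees of freedom for the terms
s$s.pv      # approximate p-values for the smooths
s$s.table   # significance table for the smooths

# p.pv: t reference distribution, because the dispersion was estimated
p_manual <- 2 * pt(-abs(s$p.t), df = s$residual.df)

# dev.expl: proportion of the null deviance explained
dev_manual <- (fit$null.deviance - fit$deviance) / fit$null.deviance

# r.sq: 1 - (unbiased residual variance) / (unbiased variance of y)
res <- fit$y - fitted(fit)
rsq_manual <- 1 - (sum((res - mean(res))^2) / s$residual.df) / var(fit$y)

all.equal(unname(s$p.pv), unname(p_manual))
all.equal(s$dev.expl, dev_manual)
all.equal(s$r.sq, rsq_manual)
```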

WARNING

The supplied p-values will often be underestimates if smoothing parameters have been estimated as part of model fitting.

Details

Model degrees of freedom are taken as the trace of the influence (or hat) matrix ${\bf A}$ for the model fit. Residual degrees of freedom are taken as the number of data minus the model degrees of freedom. Let ${\bf P}_i$ be the matrix giving the parameters of the ith smooth when applied to the data (or pseudodata in the generalized case) and let ${\bf X}$ be the design matrix of the model. Then $\mathrm{tr}({\bf X}{\bf P}_i)$ is the edf for the ith term. This definition ensures that the edfs of the individual terms add up properly.
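
The degrees of freedom bookkeeping can be checked numerically. The sketch below assumes a Gaussian model fitted to simulated data, and assumes that the fitted object stores the diagonal of the influence matrix in fit$hat and one edf value per coefficient in fit$edf, as current mgcv releases do; the names dat, fit and s are illustrative.

```r
library(mgcv)

set.seed(3)
n <- 200
dat <- data.frame(x0 = runif(n), x1 = runif(n))
dat$y <- 2 * sin(pi * dat$x0) + exp(2 * dat$x1) + rnorm(n)

fit <- gam(y ~ s(x0) + s(x1), data = dat)
s <- summary(fit)

model_df <- sum(fit$hat)  # trace of the influence matrix A
model_df
sum(fit$edf)              # the same total, accumulated per coefficient

# Residual degrees of freedom: number of data minus model degrees of freedom
n - model_df
s$residual.df

# Per-term edfs plus the parametric coefficients give the model total
sum(s$edf) + length(s$p.coeff)
```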

print.summary.gam prints the summary information most useful for term selection in a readable layout.

If freq=FALSE then the Bayesian parameter covariance matrix, object$Vp, is used to calculate test statistics for terms, and the degrees of freedom for the reference distributions are taken as the estimated degrees of freedom for the term concerned. This is not easy to justify theoretically, and the resulting 'Bayesian p-values' are difficult to interpret and often have much worse frequentist performance than the default p-values.
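
To see which covariance matrix a summary is based on, the two settings of freq can be compared directly. This sketch assumes that the fitted object also stores the frequentist matrix as fit$Ve (the text above names only object$Vp), and the data and object names are illustrative. The default for freq, and which p-values it affects, may differ between mgcv versions, so both calls set it explicitly.

```r
library(mgcv)

set.seed(4)
n <- 200
dat <- data.frame(x = runif(n))
dat$y <- sin(2 * pi * dat$x) + rnorm(n)

fit <- gam(y ~ s(x), data = dat)

s_freq  <- summary(fit, freq = TRUE)   # frequentist covariance matrix
s_bayes <- summary(fit, freq = FALSE)  # Bayesian covariance matrix

# cov.scaled reports whichever matrix the summary was based on
all.equal(s_bayes$cov.scaled, fit$Vp, check.attributes = FALSE)
all.equal(s_freq$cov.scaled, fit$Ve, check.attributes = FALSE)

# Standard errors from the two matrices, side by side
cbind(freq = s_freq$se, bayes = s_bayes$se)
```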

References

Gu and Wahba (1991) Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method. SIAM J. Sci. Statist. Comput. 12:383-398

Wood, S.N. (2000) Modelling and Smoothing Parameter Estimation with Multiple Quadratic Penalties. J.R.Statist.Soc.B 62(2):413-428

Wood, S.N. (2003) Thin plate regression splines. J.R.Statist.Soc.B 65(1):95-114

Wood, S.N. (2004a) Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Amer. Statist. Ass. 99:673-686

Wood, S.N. (2004b) On confidence intervals for GAMs based on penalized regression splines. Technical Report 04-12 Department of Statistics, University of Glasgow.

http://www.stats.gla.ac.uk/~simon/

Examples

library(mgcv)
set.seed(0)
n<-200
sig2<-4
x0 <- runif(n, 0, 1)
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1)
x3 <- runif(n, 0, 1)
pi <- asin(1) * 2
y <- 2 * sin(pi * x0)
y <- y + exp(2 * x1) - 3.75887
y <- y + 0.2 * x2^11 * (10 * (1 - x2))^6 + 10 * (10 * x2)^3 * (1 - x2)^10 - 1.396
e <- rnorm(n, 0, sqrt(abs(sig2)))
y <- y + e
b<-gam(y~s(x0)+s(x1)+s(x2)+s(x3))
plot(b,pages=1)
summary(b)
# now check the p-values by using a pure regression spline.....
b.d<-round(summary(b)$edf)+1 # edf for each smooth term, plus one
b.d<-pmax(b.d,3) # can't have basis dimension less than this!
bc<-gam(y~s(x0,k=b.d[1],fx=TRUE)+s(x1,k=b.d[2],fx=TRUE)+
        s(x2,k=b.d[3],fx=TRUE)+s(x3,k=b.d[4],fx=TRUE))
plot(bc,pages=1)
summary(bc)
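# p-value check - increase k to make this useful!
n<-200;k<-20
p<-numeric(k) # storage for the k simulated p-values
for (i in 1:k)
{ b<-gam(y~s(x,z),data=data.frame(y=rnorm(n),x=runif(n),z=runif(n)))
  p[i]<-summary(b)$s.pv[1] # p-value for the smooth s(x,z) under the null
}
# if the p-values are accurate, the sorted values follow the uniform quantiles
plot(((1:k)-0.5)/k,sort(p))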
