summary.gam: Summary for a GAM fit

Description

Takes a fitted gam object produced by gam() and produces various useful summaries from it. (See sink to divert output to a file.)

Usage

## S3 method for class 'gam':
summary(object, dispersion=NULL, freq=FALSE, p.type = 0, ...)
## S3 method for class 'summary.gam':
print(x,digits = max(3, getOption("digits") - 3), 
                  signif.stars = getOption("show.signif.stars"),...)

Arguments

object

a fitted gam object as produced by gam().

x

a summary.gam object produced by summary.gam().

dispersion

A known dispersion parameter. NULL to use estimate or default (e.g. 1 for Poisson).

freq

By default p-values for parametric terms are calculated using the Bayesian estimated covariance matrix of the parameter estimators. If this is set to TRUE then the frequentist covariance matrix of the parameters is used instead.

p.type

Determines how p-values are computed for smooth terms. 0 uses a test statistic with distribution determined by the un-rounded edf of the term. 1 uses upwardly biased rounding of the edf, and -1 uses a version of the test statistic with a null distribution that requires simulation. See Details for the remaining options.

digits

controls number of digits printed in output.

signif.stars

Should significance stars be printed alongside output.

...

other arguments.
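
As a rough illustration of the dispersion argument, the sketch below summarizes the same fit twice, once with the dispersion estimated and once with it supplied. The value 4 is an assumption made for the example (it takes the scale argument of gamSim to be the noise standard deviation, here 2); the printed column names simply show which test statistic each summary reports for the smooth terms.

```r
library(mgcv)
set.seed(4)
## simulated data; noise variance assumed to be scale^2 = 4
dat <- gamSim(1, n = 200, scale = 2, verbose = FALSE)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)

s_est   <- summary(b)                  # dispersion estimated from the fit
s_known <- summary(b, dispersion = 4)  # dispersion treated as known (assumed value)

## compare the dispersion values the two summaries report
c(estimated = s_est$dispersion, supplied = s_known$dispersion)

## column names of the smooth term tables under each setting
colnames(s_est$s.table)
colnames(s_known$s.table)
```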

Value

summary.gam produces a list of summary information for a fitted gam object.
p.coeff

An array of estimates of the strictly parametric model coefficients.

p.t

An array of the p.coeff's divided by their standard errors.

p.pv

An array of p-values for the null hypothesis that the corresponding parameter is zero. Calculated with reference to the t distribution with the estimated residual degrees of freedom for the model fit if the dispersion parameter has been estimated, and the standard normal if not.

m

The number of smooth terms in the model.

chi.sq

An array of test statistics for assessing the significance of model smooth terms. See Details.

s.pv

An array of approximate p-values for the null hypotheses that each smooth term is zero. Be warned, these are only approximate.

se

An array of standard error estimates for all parameter estimates.

r.sq

The adjusted r-squared for the model. Defined as the proportion of variance explained, where original variance and residual variance are both estimated using unbiased estimators. This quantity can be negative if your model is worse than a one parameter constant model, and can be higher for the smaller of two nested models! The proportion of null deviance explained is probably more appropriate for non-normal errors. Note that r.sq does not include any offset in the one parameter model.

dev.expl

The proportion of the null deviance explained by the model. The null deviance is computed taking account of any offset, so dev.expl can be substantially lower than r.sq when an offset is present.

edf

An array of estimated degrees of freedom for the model terms.

residual.df

The estimated residual degrees of freedom.

n

The number of data.

method

The smoothing selection criterion used.

sp.criterion

The minimized value of the smoothness selection criterion. Note that for ML and REML methods, what is reported is the negative log marginal likelihood or negative log restricted likelihood.

scale

The estimated (or given) scale parameter.

family

The family used.

formula

The original GAM formula.

dispersion

The scale parameter.

pTerms.df

The degrees of freedom associated with each parametric term (excluding the constant).

pTerms.chi.sq

A Wald statistic for testing the null hypothesis that each parametric term is zero.

pTerms.pv

The p-values associated with the tests that each term is zero. For penalized fits these are approximate. The reference distribution is an appropriate chi-squared when the scale parameter is known, and is based on an F when it is not.

cov.unscaled

The estimated covariance matrix of the parameters (or estimators if freq=TRUE), divided by the scale parameter.

cov.scaled

The estimated covariance matrix of the parameters (estimators if freq=TRUE).

p.table

Significance table for parameters.

s.table

Significance table for smooths.

p.Terms

Significance table for parametric model terms.
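
The components listed above can be pulled out of the returned list directly rather than read from the printed output. The following is a minimal sketch on simulated data; the model, the seed and the choice of method="REML" are arbitrary choices made for illustration.

```r
library(mgcv)
set.seed(2)
dat <- gamSim(1, n = 200, scale = 2, verbose = FALSE)  # simulated data
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")
s <- summary(b)

s$p.table                                 # significance table for parameters
s$s.table                                 # significance table for smooths
c(r.sq = s$r.sq, dev.expl = s$dev.expl)   # goodness of fit measures
s$method                                  # smoothing selection criterion used
s$sp.criterion                            # its minimized value
```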

WARNING

The p-values are approximate (especially for terms that can be penalized to zero): do read the details section.

P-values for terms penalized via `paraPen' will not be correct unless `freq=TRUE' (and maybe not even then).

Details

Model degrees of freedom are taken as the trace of the influence (or hat) matrix ${\bf A}$ for the model fit. Residual degrees of freedom are taken as number of data minus model degrees of freedom. Let ${\bf P}_i$ be the matrix giving the parameters of the ith smooth when applied to the data (or pseudodata in the generalized case) and let ${\bf X}$ be the design matrix of the model. Then $tr({\bf XP}_i )$ is the edf for the ith term. Clearly this definition causes the edf's to add up properly! An alternative version of EDF is more appropriate for p-value computation, and is based on the trace of $2{\bf A} - {\bf AA}$.
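
The bookkeeping described above can be checked numerically. The sketch below assumes a Gaussian fit with only an intercept as the parametric part, so that each parametric coefficient is unpenalized and contributes exactly one degree of freedom; it uses the edf and hat components of the fitted gam object.

```r
library(mgcv)
set.seed(3)
dat <- gamSim(1, n = 200, scale = 2, verbose = FALSE)  # simulated data
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)
s <- summary(b)

total_edf <- sum(b$edf)   # per-coefficient edf, summed over all coefficients
trace_A   <- sum(b$hat)   # b$hat holds the diagonal of the influence matrix
## per-term edf from the summary, plus one for each parametric coefficient
term_edf  <- sum(s$edf) + length(s$p.coeff)

c(total_edf = total_edf, trace_A = trace_A, term_edf = term_edf)
c(residual.df = s$residual.df, n_minus_edf = s$n - total_edf)
```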

print.summary.gam tries to print various bits of summary information useful for term selection in a pretty way.

Unless p.type=5, p-values for smooth terms are usually based on a test statistic motivated by an extension of Nychka's (1988) analysis of the frequentist properties of Bayesian confidence intervals for smooths. These have better frequentist performance (in terms of power and distribution under the null) than the alternative strictly frequentist approximation. Let $\bf f$ denote the vector of values of a smooth term evaluated at the original covariate values and let ${\bf V}_f$ denote the corresponding Bayesian covariance matrix. Let ${\bf V}_f^{r-}$ denote the rank $r$ pseudoinverse of ${\bf V}_f$, where $r$ is the EDF for the term. The statistic used is then $$T = {\bf f}^T {\bf V}_f^{r-}{\bf f}$$ (this can be calculated efficiently without forming the pseudoinverse explicitly). $T$ is compared to an approximation to an appropriate mixture of chi-squared distributions with degrees of freedom given by the EDF for the term, or $T$ is used as a component in an F ratio statistic if the scale parameter has been estimated.

The non-integer rank truncated inverse is constructed to give an approximation varying smoothly between the bounding integer rank approximations, while yielding test statistics with the correct mean and variance under the null. Alternatively (p.type==1) $r$ is obtained by biased rounding of the EDF: values less than .05 above the preceding integer are rounded down, while other values are rounded up. Another option (p.type==-1) uses a statistic of formal rank given by the number of coefficients for the smooth, but with its terms weighted by the eigenvalues of the covariance matrix, so that penalized terms are down-weighted, but the null distribution requires simulation. Other options for p.type are 2 (naive rounding), 3 (round up), 4 (numerical rank determination): these are poor options for theoretically known reasons, and will generate a warning.
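
To make the form of the statistic concrete, the sketch below computes $T = {\bf f}^T {\bf V}_f^{r-}{\bf f}$ for one smooth using an integer rank $r$ obtained by naive rounding of the EDF. This is deliberately the crude variant (one of the options described above as poor), not the non-integer rank construction that summary.gam uses by default, so the resulting p-value is illustrative only and need not match the summary output. The names Tstat, Vf and pval are hypothetical, and the F reference assumes the scale parameter was estimated.

```r
library(mgcv)
set.seed(1)
dat <- gamSim(1, n = 200, scale = 2, verbose = FALSE)  # simulated data
b <- gam(y ~ s(x0) + s(x1), data = dat)
s <- summary(b)

sm  <- b$smooth[[2]]                         # smooth object for s(x1)
idx <- sm$first.para:sm$last.para            # coefficient indices of that smooth
X   <- predict(b, type = "lpmatrix")[, idx, drop = FALSE]
f   <- drop(X %*% coef(b)[idx])              # smooth evaluated at the data
Vf  <- X %*% b$Vp[idx, idx] %*% t(X)         # Bayesian covariance matrix of f

r  <- max(1, round(s$edf[2]))                # integer rank by naive rounding (assumption)
ev <- eigen(Vf, symmetric = TRUE)
U  <- ev$vectors[, 1:r, drop = FALSE]        # leading r eigenvectors
Tstat <- sum(crossprod(U, f)^2 / ev$values[1:r])  # f' V_f^{r-} f

## scale was estimated, so use T/r as an F ratio on (r, residual df)
pval <- pf(Tstat / r, r, s$residual.df, lower.tail = FALSE)
c(T = Tstat, rank = r, p.value = pval)
```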

The resulting p-value also has a Bayesian interpretation: the probability of observing an $\bf f$ less probable than $\bf 0$, under the approximation for the posterior for $\bf f$ implied by the truncation used in the test statistic.

Note that for terms with no unpenalized terms the Nychka (1988) requirement for smoothing bias to be substantially less than variance breaks down (see e.g. appendix of Marra and Wood, 2012), and this results in incorrect null distribution for p-values computed using the above approach. In this case it is necessary to fall back on slightly cruder frequentist approximations (which may overstate significance a little). The frequentist covariance matrix is used in place of the Bayesian version, and the statistic rank is set to 1 for EDF < 1. In the case of random effects, a further modification is required, since the eigen spectrum of the penalty is then flat and a good unpenalized approximation with rank given by the EDF of the term is not generally available, further breaking the theory used for other smoothers. In this case the rank of the test statistic is set to the full rank of the term, and the p-value relates to testing whether the individual random effects were in fact all zero (despite the estimated posterior modes being those observed).

In simulations the p-values have best behaviour under ML smoothness selection, with REML coming second.

If p.type=5 then the frequentist approximation for p-values of smooth terms described in section 4.8.5 of Wood (2006) is used. The approximation is not great. If ${\bf p}_i$ is the parameter vector for the ith smooth term, and this term has estimated covariance matrix ${\bf V}_i$, then the statistic is ${\bf p}_i^\prime {\bf V}_i^{k-} {\bf p}_i$, where ${\bf V}_i^{k-}$ is the rank k pseudo-inverse of ${\bf V}_i$, and k is the estimated rank of ${\bf V}_i$. p-values are obtained as follows. In the case of known dispersion parameter, they are obtained by comparing the chi.sq statistic to the chi-squared distribution with k degrees of freedom. If the dispersion parameter is unknown (in which case it will have been estimated) the statistic is compared to an F distribution with k upper d.f. and lower d.f. given by the residual degrees of freedom for the model. Typically the p-values will be somewhat too low.

By default the p-values for parametric model terms are also based on Wald tests using the Bayesian covariance matrix for the coefficients. This is appropriate when there are "re" terms present, and is otherwise rather similar to the results using the frequentist covariance matrix (freq=TRUE), since the parametric terms themselves are usually unpenalized. Default p-values for parametric terms that are penalized using the paraPen argument will not be good. However if such terms represent conventional random effects with full rank penalties, then setting freq=TRUE is appropriate.
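
A quick way to see how close the two choices usually are is to summarize one fit with each setting and compare the parametric tables. The sketch below is an illustration only: the extra covariate g is a hypothetical noise variable added so that the model has a parametric term besides the intercept.

```r
library(mgcv)
set.seed(5)
dat <- gamSim(1, n = 200, scale = 2, verbose = FALSE)  # simulated data
dat$g <- rnorm(200)            # hypothetical covariate with no true effect
b <- gam(y ~ g + s(x0) + s(x1) + s(x2), data = dat)

pt_bayes <- summary(b)$p.table               # Bayesian covariance matrix (default)
pt_freq  <- summary(b, freq = TRUE)$p.table  # frequentist covariance matrix

## estimates are the same; standard errors and p-values may differ slightly
round(cbind(se_bayes = pt_bayes[, 2], se_freq = pt_freq[, 2],
            p_bayes  = pt_bayes[, 4], p_freq  = pt_freq[, 4]), 4)
```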

References

Marra, G. and S.N. Wood (2012) Coverage Properties of Confidence Intervals for Generalized Additive Model Components. Scandinavian Journal of Statistics, 39(1), 53-74.

Nychka, D. (1988) Bayesian Confidence Intervals for Smoothing Splines. Journal of the American Statistical Association, 83, 1134-1143.

Wood, S.N. (2006) Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC Press.

Examples

library(mgcv)
set.seed(0)
dat <- gamSim(1,n=200,scale=2) ## simulate data

b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)
plot(b,pages=1)
summary(b)

## now check the p-values by using a pure regression spline.....
b.d <- round(summary(b)$edf)+1 ## get edf per smooth
b.d <- pmax(b.d,3) # can't have basis dimension less than 3!
bc<-gam(y~s(x0,k=b.d[1],fx=TRUE)+s(x1,k=b.d[2],fx=TRUE)+
        s(x2,k=b.d[3],fx=TRUE)+s(x3,k=b.d[4],fx=TRUE),data=dat)
plot(bc,pages=1)
summary(bc)

## p-value check - increase k to make this useful!
k<-20;n <- 200;p <- rep(NA,k)
for (i in 1:k)
{ b<-gam(y~te(x,z),data=data.frame(y=rnorm(n),x=runif(n),z=runif(n)),
         method="ML")
  p[i]<-summary(b)$s.pv[1] ## p-value of the single smooth term
}
plot(((1:k)-0.5)/k,sort(p))
abline(0,1,col=2)
ks.test(p,"punif") ## how close to uniform are the p-values?