gam

Fits a generalized additive model (GAM) to data; gam can also fit any GLM subject to multiple quadratic penalties (including estimation of the degree of penalization). Isotropic or scale invariant smooths of any number of variables are available as model terms, as are linear functionals of such smooths; confidence/credible intervals are readily available for any quantity predicted using a fitted model; gam is extendable: users can add smooths. Smooth terms are represented using penalized regression splines (or similar smoothers) with smoothing parameters selected by GCV/UBRE/AIC/REML, or by regression splines with fixed degrees of freedom (mixtures of the two are permitted). Multi-dimensional smooths are available using penalized thin plate regression splines (isotropic) or tensor product splines (when an isotropic smooth is inappropriate). For an overview of the available smooths see smooth.terms. For more on specifying models see gam.models and linear.functional.terms. For more on model selection see gam.selection.
gam() is not a clone of what S-PLUS provides: the major differences are (i) by default, estimation of the degree of smoothness of model terms is part of model fitting; (ii) a Bayesian approach to variance estimation is employed that makes for easier confidence interval calculation (with good coverage probabilities); (iii) the model can depend on any (bounded) linear functional of smooth terms; (iv) the parametric part of the model can be penalized; and (v) the facilities for incorporating smooths of more than one variable are different: specifically there are no lo smooths, but instead (a) s terms can have more than one argument, implying an isotropic smooth, and (b) te smooths are provided as an effective means for modelling smooth interactions of any number of variables via scale invariant tensor product smooths (see the sketch below). If you want a clone of what S-PLUS provides, use gam from package gam.
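To illustrate difference (v), a hedged sketch contrasting the two multi-dimensional smooth specifications (assuming a data frame dat containing a response y and covariates x and z):

## isotropic smooth: appropriate when x and z are naturally on the
## same scale (penalized thin plate regression spline)
b.iso <- gam(y ~ s(x, z), data=dat)
## scale invariant tensor product smooth: appropriate when x and z
## are on different scales (e.g. time and temperature)
b.ten <- gam(y ~ te(x, z), data=dat)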
Usage

gam(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,
    na.action,offset=NULL,method="GCV.Cp",
    optimizer=c("outer","newton"),control=gam.control(),scale=0,
    select=FALSE,knots=NULL,sp=NULL,min.sp=NULL,H=NULL,gamma=1,
    fit=TRUE,paraPen=NULL,G=NULL,in.out,...)
Arguments

formula: A GAM formula (see formula.gam and also gam.models). This is exactly like the formula for a GLM except that smooth terms, s and te, can be added to the right hand side to specify that the linear predictor depends on smooth functions of predictors.

data: A data frame or list containing the model response variable and covariates required by the formula. By default the variables are taken from environment(formula): typically the environment from which gam is called.

offset: Can be used to supply a model offset for use in fitting. Note that such an offset is ignored when predicting, unlike an offset included in formula: this conforms to the behaviour of lm and glm.

method: The smoothing parameter estimation method. "GCV.Cp" to use GCV for unknown scale parameter and Mallows' Cp/UBRE/AIC for known scale. "GACV.Cp" is equivalent, but using GACV in place of GCV. "REML" for REML estimation, including of unknown scale.

optimizer: An array specifying the numerical optimization method used to optimize the smoothing parameter estimation criterion (given by method): "perf" for performance iteration, "outer" for the more stable direct approach.

control: A list of fit control parameters returned by gam.control.

select: If TRUE then gam can add an extra penalty to each term so that it can be penalized to zero. This means that the smoothing parameter estimation that is part of fitting can completely remove terms from the model.

knots: An optional list of user-supplied knot values to be used for basis construction; these must match up with the k value supplied (note that the number of knots is not always just k).

min.sp: Lower bounds on the smoothing parameters. If this option is used then the smoothing parameters full.sp, in the returned object, will need to be added to what is supplied here to get the smoothing parameters actually multiplying the penalties.

fit: If TRUE then gam sets up the model and fits it; if FALSE then the model is set up and an object G containing what would be required to fit it is returned. See argument G.

paraPen: An optional list specifying any penalties to be applied to parametric model terms; gam.models explains more.

G: Usually NULL, but may contain the object returned by a previous call to gam with fit=FALSE, in which case all other arguments are ignored except for gamma, in.out, control and sp.

in.out: An optional list for initializing the outer iteration. Its element sp should be an array of initialization values for all smoothing parameters (there must be a value for every smoothing parameter, whether fixed or to be estimated).

...: Further arguments for passing on, e.g. to gam.fit (such as mustart).

Value

If fit=FALSE the function returns a list G of items needed to fit a GAM, but doesn't actually fit it. Otherwise the function returns an object of class "gam" as described in gamObject.
Warnings

Automatic smoothing parameter selection is not likely to work well when fitting models to very few response data.
For data with many zeroes clustered together in the covariate space it is quite easy to set up GAMs which suffer from identifiability problems, particularly when using Poisson or binomial families. The problem is that with e.g. log or logit links, mean value zero corresponds to an infinite range on the linear predictor scale.
Details

A generalized additive model (GAM) is a generalized linear model (GLM) in which the linear predictor is given by a user specified sum of smooth functions of the covariates plus a conventional parametric component; a simple example is $\log(E(y_i)) = f_1(x_{1i}) + f_2(x_{2i})$, where $f_1$ and $f_2$ are smooth functions. If absolutely any smooth functions were allowed in model fitting then maximum likelihood estimation of such models would invariably result in complex overfitting estimates of $f_1$ and $f_2$. For this reason the models are usually fit by penalized likelihood maximization, in which the model (negative log) likelihood is modified by the addition of a penalty for each smooth function, penalizing its `wiggliness'. To control the tradeoff between penalizing wiggliness and penalizing badness of fit, each penalty is multiplied by an associated smoothing parameter: how to estimate these parameters, and how to practically represent the smooth functions, are the main statistical questions introduced by moving from GLMs to GAMs.
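As a sketch (with notation that is assumed here rather than defined above: $\beta$ is the model coefficient vector, $\ell(\beta)$ the log likelihood, $S_j$ the penalty matrix for the $j$th smooth and $\lambda_j$ its smoothing parameter), the penalized fitting objective has the form

$$\ell_p(\beta) = \ell(\beta) - \frac{1}{2}\sum_j \lambda_j \beta^T S_j \beta,$$

so $\lambda_j \to \infty$ forces the $j$th smooth towards maximal smoothness, while $\lambda_j = 0$ leaves it unpenalized.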
The mgcv implementation of gam represents the smooth functions using penalized regression splines, and by default uses basis functions for these splines that are designed to be optimal, given the number of basis functions used. The smooth terms can be functions of any number of covariates, and the user has some control over how the smoothness of the functions is measured.
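For example, the basis type and dimension can be varied per term via the bs and k arguments of s (a hedged sketch; dat and x0 are assumed to be as in the examples below):

## default: low rank thin plate regression spline
b1 <- gam(y ~ s(x0), data=dat)
## cubic regression spline basis with a larger basis dimension
b2 <- gam(y ~ s(x0, bs="cr", k=20), data=dat)
## P-spline basis
b3 <- gam(y ~ s(x0, bs="ps"), data=dat)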
gam in mgcv solves the smoothing parameter estimation problem by using the Generalized Cross Validation (GCV) criterion $nD/(n - DoF)^2$ or an Un-Biased Risk Estimator (UBRE) criterion $D/n + 2 s\,DoF/n - s$, where $D$ is the deviance, $n$ the number of data, $s$ the scale parameter and $DoF$ the effective degrees of freedom of the model. Alternatives are GACV, or a Laplace approximation to REML; there is some evidence that the latter may actually be the most effective choice. Smoothing parameters are chosen to minimize the GCV, UBRE/AIC, GACV or REML score for the model, and the main computational challenge solved by the mgcv package is to do this efficiently and reliably. Various alternative numerical methods are provided, which can be set by argument optimizer.
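A hedged sketch of selecting the criterion and optimizer (dat as in the examples below):

## GCV (the default) with outer Newton optimization
b.gcv  <- gam(y ~ s(x0) + s(x1), data=dat)
## Laplace approximate REML instead
b.reml <- gam(y ~ s(x0) + s(x1), data=dat, method="REML")
## performance iteration instead of outer optimization
b.perf <- gam(y ~ s(x0) + s(x1), data=dat, optimizer="perf")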
Broadly, gam works by first constructing basis functions and one or more quadratic penalty coefficient matrices for each smooth term in the model formula, obtaining a model matrix for the strictly parametric part of the model formula, and combining these to obtain a complete model matrix (design matrix) and a set of penalty matrices for the smooth terms. Some linear identifiability constraints are also obtained at this point. The model is fit using gam.fit, a modification of glm.fit. The GAM penalized likelihood maximization problem is solved by Penalized Iteratively Reweighted Least Squares (P-IRLS) (see e.g. Wood 2000).
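The following is a minimal illustrative sketch of the P-IRLS idea for a Poisson GAM with log link and a single fixed smoothing parameter; the simulated data, the value of lambda and the basis setup via smoothCon are all assumptions of this sketch, and mgcv's real fitting code is gam.fit/gam.fit3:

library(mgcv)
set.seed(1)
n <- 200; x <- runif(n)
y <- rpois(n, exp(sin(2*pi*x)))            ## simulated Poisson response
## build a centred cubic regression spline basis and its penalty
sm <- smoothCon(s(x, k=10, bs="cr"), data=data.frame(x=x),
                knots=NULL, absorb.cons=TRUE)[[1]]
X <- cbind(1, sm$X)                        ## intercept + smooth basis
S <- matrix(0, ncol(X), ncol(X))           ## penalty padded for intercept
S[-1, -1] <- sm$S[[1]]
lambda <- 0.1                              ## fixed smoothing parameter
beta <- rep(0, ncol(X))
for (iter in 1:200) {
  eta <- drop(X %*% beta); mu <- exp(eta)
  z <- eta + (y - mu)/mu                   ## working response (log link)
  w <- mu                                  ## IRLS weights for Poisson
  ## penalized weighted least squares step
  beta.new <- drop(solve(t(X) %*% (w * X) + lambda * S,
                         t(X) %*% (w * z)))
  if (sum((beta.new - beta)^2) < 1e-10) { beta <- beta.new; break }
  beta <- beta.new
}
plot(x, y); lines(sort(x), exp(drop(X %*% beta))[order(x)], col=2)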
Smoothing parameter selection is integrated in one of two ways. (i) `Performance iteration' uses the fact that at each P-IRLS iteration a penalized weighted least squares problem is solved, and the smoothing parameters of that problem can be estimated by GCV or UBRE. Eventually, in most cases, both model parameter estimates and smoothing parameter estimates converge. (ii) Alternatively, the P-IRLS scheme is iterated to convergence for each trial set of smoothing parameters, and GCV, UBRE or REML scores are only evaluated on convergence: optimization is then `outer' to the P-IRLS loop. In this case the P-IRLS iteration has to be differentiated to facilitate optimization, and gam.fit3 is used in place of gam.fit. The default is the second method, outer iteration.
Several alternative basis-penalty types are built in for representing model smooths, but alternatives can easily be added (see smooth.terms for an overview and smooth.construct for how to add smooth classes). In practice the default basis is usually the best choice, but the choice of the basis dimension (k in the s and te terms) is something that should be considered carefully (the exact value is not critical, but it is important not to make it restrictively small, nor very large and computationally costly). The basis should be chosen to be larger than is believed to be necessary to approximate the smooth function concerned. The effective degrees of freedom for the smooth will then be controlled by the smoothing penalty on the term, and (usually) selected automatically, with an upper limit set by k-1 or occasionally k. Of course k should not be made too large, or computation will be slow (and in extreme cases there will be more coefficients to estimate than there are data).
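One informal check that k is not restrictively small (a sketch, with dat as in the examples below): refit with a larger k and confirm that the effective degrees of freedom and the fit barely change; see also gam.check.

b1 <- gam(y ~ s(x2, k=10), data=dat)
b2 <- gam(y ~ s(x2, k=20), data=dat)
sum(b1$edf); sum(b2$edf)  ## total EDF: stable values suggest k was adequate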
Note that gam assumes a very inclusive definition of what counts as a GAM: basically any penalized GLM can be fitted. To this end gam allows the non-smooth model components to be penalized via argument paraPen, and allows the linear predictor to depend on general linear functionals of smooths, via the summation convention mechanism described in linear.functional.terms.
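A hedged sketch of the paraPen mechanism, applying a ridge penalty to the coefficients of a parametric matrix term (the simulated data and the identity penalty are assumptions of this sketch; see gam.models for the full interface):

set.seed(2)
n <- 200; p <- 5
X <- matrix(runif(n*p), n, p)                 ## parametric model matrix
y <- drop(X %*% c(1, .5, 0, 0, -1)) + rnorm(n)*.3
## identity penalty on the coefficients of X; its smoothing
## parameter is estimated along with the rest of the fit
b <- gam(y ~ X - 1, paraPen=list(X=list(diag(p))), method="REML")
b$sp  ## estimated smoothing parameter for the ridge penalty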
Details of the default underlying fitting methods are given in Wood (2004, 2008). Some alternative methods are discussed in Wood (2000, 2006).

References
Key references on this implementation:

Wood, S.N. (2004) Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Amer. Statist. Ass. 99:673-686. [Default method for additive case (but no longer for generalized)]
Wood, S.N. (2008) Fast stable direct fitting and smoothness selection for generalized additive models. J.R.Statist.Soc.B 70(3):495-518. [Generalized additive model case]
Wood, S.N. (2003) Thin plate regression splines. J.R.Statist.Soc.B 65(1):95-114
Wood, S.N. (2006a) Low rank scale invariant tensor product smooths for generalized additive mixed models. Biometrics 62(4):1025-1036
Wood S.N. (2006b) Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC Press.
Wood, S.N. (2006c) On confidence intervals for generalized additive models based on penalized regression splines. Australian and New Zealand Journal of Statistics. 48(4): 445-464.
Key references on GAMs and related models:
Hastie (1993) in Chambers and Hastie (1993) Statistical Models in S. Chapman and Hall.
Hastie and Tibshirani (1990) Generalized Additive Models. Chapman and Hall.
Wahba (1990) Spline Models of Observational Data. SIAM
Wood, S.N. (2000) Modelling and Smoothing Parameter Estimation with Multiple Quadratic Penalties. J.R.Statist.Soc.B 62(2):413-428 [The original mgcv paper, but no longer the default methods.]
Background References:
Green and Silverman (1994) Nonparametric Regression and Generalized Linear Models. Chapman and Hall.

Gu and Wahba (1991) Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method. SIAM J. Sci. Statist. Comput. 12:383-398
Gu (2002) Smoothing Spline ANOVA Models, Springer.
O'Sullivan, Yandell and Raynor (1986) Automatic smoothing of regression functions in generalized linear models. J. Am. Statist. Ass. 81:96-103
Wood (2001) mgcv: GAMs and Generalized Ridge Regression for R. R News 1(2):20-25

Wood and Augustin (2002) GAMs with integrated model selection using penalized regression splines and applications to environmental modelling. Ecological Modelling 157:157-177
See Also

mgcv-package, gamObject, gam.models, smooth.terms, linear.functional.terms, s, te, predict.gam, plot.gam, summary.gam, gam.side, gam.selection, mgcv, gam.control, gam.check, negbin, magic, vis.gam
Examples

library(mgcv)
set.seed(0) ## simulate some data...
dat <- gamSim(1,n=400,dist="normal",scale=2)
b<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)
summary(b)
plot(b,pages=1,residuals=TRUE) ## show partial residuals
plot(b,pages=1,seWithMean=TRUE) ## `with intercept' CIs
## same fit in two parts .....
G<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),fit=FALSE,data=dat)
b<-gam(G=G)
print(b)
## change the smoothness selection method to REML
b0<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat,method="REML")
plot(b0,pages=1)
## ... and do automatic terms selection as well
b1<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat,
method="REML",select=TRUE)
plot(b1,pages=1)
## set the smoothing parameter for the first term, estimate rest ...
bp<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),sp=c(0.01,-1,-1,-1),data=dat)
plot(bp,pages=1)
## alternatively...
bp <- gam(y~s(x0,sp=.01)+s(x1)+s(x2)+s(x3),data=dat)
# set lower bounds on smoothing parameters ....
bp<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),
min.sp=c(0.001,0.01,0,10),data=dat)
print(b);print(bp)
# same with REML
bp<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),
min.sp=c(0.1,0.1,0,10),data=dat,method="REML")
print(b0);print(bp)
## now a GAM with 3df regression spline term & 2 penalized terms
b0<-gam(y~s(x0,k=4,fx=TRUE,bs="tp")+s(x1,k=12)+s(x2,k=15),data=dat)
plot(b0,pages=1)
## now fit a 2-d term to x0,x1
b1<-gam(y~s(x0,x1)+s(x2)+s(x3),data=dat)
par(mfrow=c(2,2))
plot(b1)
par(mfrow=c(1,1))
## now simulate poisson data...
dat <- gamSim(1,n=400,dist="poisson",scale=.25)
b2<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson,data=dat)
plot(b2,pages=1)
## repeat fit using performance iteration
b3<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson,
data=dat,optimizer="perf")
plot(b3,pages=1)
## repeat using GACV as in Wood 2008...
b4<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson,
data=dat,method="GACV.Cp",scale=-1)
plot(b4,pages=1)
## repeat using REML as in Wood 2008...
b5<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson,
data=dat,method="REML")
plot(b5,pages=1)
## a binary example
dat <- gamSim(1,n=400,dist="binary",scale=.33)
lr.fit <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=binomial,data=dat)
## plot model components with truth overlaid in red
op <- par(mfrow=c(2,2))
fn <- c("f0","f1","f2","f3");xn <- c("x0","x1","x2","x3")
for (k in 1:4) {
  plot(lr.fit,residuals=TRUE,select=k)
  ff <- dat[[fn[k]]];xx <- dat[[xn[k]]]
  ind <- sort.int(xx,index.return=TRUE)$ix
  lines(xx[ind],(ff-mean(ff))[ind]*.33,col=2)
}
par(op)
anova(lr.fit)
lr.fit1 <- gam(y~s(x0)+s(x1)+s(x2),family=binomial,data=dat)
lr.fit2 <- gam(y~s(x1)+s(x2),family=binomial,data=dat)
AIC(lr.fit,lr.fit1,lr.fit2)
## now a 2D smoothing example...
eg <- gamSim(2,n=500,scale=.1)
attach(eg)
op <- par(mfrow=c(2,2),mar=c(4,4,1,1))
contour(truth$x,truth$z,truth$f) ## contour truth
b4 <- gam(y~s(x,z),data=data) ## fit model
fit1 <- matrix(predict.gam(b4,pr,se=FALSE),40,40)
contour(truth$x,truth$z,fit1) ## contour fit
persp(truth$x,truth$z,truth$f) ## persp truth
vis.gam(b4) ## persp fit
detach(eg)
par(op)
## very large dataset example with user defined knots
par(mfrow=c(1,1))
eg <- gamSim(2,n=10000,scale=.5)
attach(eg)
ind<-sample(1:10000,1000,replace=FALSE)
b5<-gam(y~s(x,z,k=50),data=data,
knots=list(x=data$x[ind],z=data$z[ind]))
vis.gam(b5)
## and a pure "knot based" spline of the same data
b6<-gam(y~s(x,z,k=100),data=data,knots=list(x= rep((1:10-0.5)/10,10),
z=rep((1:10-0.5)/10,rep(10,10))))
vis.gam(b6,color="heat")
## varying the default large dataset behaviour via `xt'
b7 <- gam(y~s(x,z,k=50,xt=list(max.knots=1000,seed=2)),data=data)
vis.gam(b7)
detach(eg)