For more on how to specify models see gam.models. For more on model selection see gam.selection. For large datasets see the Warnings section below.

gam() is not a clone of what S-PLUS provides: the major differences are (i) that by default estimation of the degree of smoothness of model terms is part of model fitting, (ii) a Bayesian approach to variance estimation is employed that makes for easier confidence interval calculation (with good coverage probabilities), and (iii) the facilities for incorporating smooths of more than one variable are different: specifically there are no lo smooths, but instead (a) s terms can have more than one argument, implying an isotropic smooth, and (b) te smooths are provided as an effective means for modelling smooth interactions of any number of variables via scale invariant tensor product smooths. If you want a clone of what S-PLUS provides use gam from package gam.
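For example, the two ways of smoothing over several covariates differ as follows (a minimal sketch, with made-up covariates x and z and response y):

library(mgcv)
set.seed(1)
dat <- data.frame(x=runif(200), z=runif(200))
dat$y <- sin(2*pi*dat$x)*dat$z + rnorm(200)*0.2
# isotropic smooth of two covariates: sensible when x and z are
# naturally on the same scale
b1 <- gam(y ~ s(x,z), data=dat)
# scale invariant tensor product smooth: sensible when the covariates
# are measured in unrelated units
b2 <- gam(y ~ te(x,z), data=dat)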
Usage

gam(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,
    na.action,offset=NULL,control=gam.control(),scale=0,knots=NULL,
    sp=NULL,min.sp=NULL,H=NULL,gamma=1,fit=TRUE,G=NULL,...)

Arguments

formula: A GAM formula (see also gam.models). This is exactly like the formula for a GLM except that smooth terms can be added to the right hand side of the formula (and a formula of the form y ~ . is not allowed).

data: A data frame containing the model response variable and covariates required by the formula. By default the variables are taken from environment(formula): typically the environment from which gam is called.

offset: Can be used to supply a model offset for use in fitting. Note that such an offset is always completely ignored when predicting, unlike an offset included in the model formula: this conforms to the behaviour of lm and glm.

control: A list of fit control parameters returned by gam.control.

knots: An optional list of user specified knot values to be used for basis construction. For the "cr" and "cc" bases the user simply supplies the knots to be used, and there must be the same number as the basis dimension for the smooth concerned.
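For example, a minimal sketch of supplying knots for a "cr" basis smooth (the covariate name and knot locations are illustrative):

library(mgcv)
set.seed(2)
x <- runif(100); y <- sin(2*pi*x) + rnorm(100)*0.3
# the "cr" basis with k=6 requires exactly 6 knot locations
b <- gam(y ~ s(x, bs="cr", k=6), knots=list(x=seq(0, 1, length=6)))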
dimension, <"magic" (see "magic" only, lower bounds can be 
supplied for the smoothing parameters. Note that if this option is used then
the smoothing parameters sp, in the returned object, will need to be added to
what is supplied here to"magic" a user supplied fixed quadratic 
penalty on the parameters of the 
GAM can be supplied, with this as its coefficient matrix. A common use of this term is 
to add a ridge penalty to the parameters of the GAM in circumst"magic".NULL, but may contain the object returned by a previous call to gam with 
fit=FALSE, in which case all other arguments are ignored except for gamma, family, control and gam.fitfit == FALSE the function returns a list G of items needed to
fit a GAM, but doesn't actually fit it. Otherwise the function returns an object of class "gam" as described in gamObject.
"mgcv" is selected, the code does not check for rank deficiency of the
model matrix that may result from lack of identifiability between the
parametric and smooth components of the model. You must have more unique combinations of covariates than the model has total parameters. (Total parameters is sum of basis dimensions plus sum of non-spline terms less the number of spline terms).
Automatic smoothing parameter selection is not likely to work well when fitting models to very few response data.
With large datasets (more than a few thousand data) the "tp" basis gets very slow to use: either use the knots argument as discussed above and shown in the examples, or use the "cr" basis for 1-d smooths and te smooths for smooths of several variables.
For data with many zeroes clustered together in the covariate space it is quite easy to set up GAMs which suffer from identifiability problems, particularly when using Poisson or binomial families. The problem is that with, for example, log or logit links, a mean value of zero corresponds to an infinite range on the linear predictor scale. Some regularization is possible in such cases: see gam.control for details.
Details

A generalized additive model (GAM) is a generalized linear model (GLM) in which part of the linear predictor is specified as a sum of smooth functions of covariates: for example $g(E(y_i)) = X_i\theta + f_1(x_{1i}) + f_2(x_{2i})$, where $g$ is a link function, $X_i\theta$ is an ordinary parametric component and the $f_j$ are smooth functions. If absolutely any smooth functions were allowed in model fitting then maximum likelihood estimation of such models would invariably result in complex overfitting estimates of $f_1$ and $f_2$. For this reason the models are usually fit by penalized likelihood maximization, in which the model (negative log) likelihood is modified by the addition of a penalty for each smooth function, penalizing its `wiggliness'. To control the trade-off between penalizing wiggliness and penalizing badness of fit each penalty is multiplied by an associated smoothing parameter: how to estimate these parameters, and how to practically represent the smooth functions, are the main statistical questions introduced by moving from GLMs to GAMs.
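In outline, writing $\beta$ for the full coefficient vector, $\lambda_j$ for the smoothing parameter of the $j$th smooth and $S_j$ for its penalty coefficient matrix, the criterion being maximized takes the standard quadratic-penalty form (a sketch of the usual formulation, not text from this page):

$$ l_p(\beta) = l(\beta) - \frac{1}{2}\sum_j \lambda_j \beta^{\mathsf T} S_j \beta $$

where $l$ is the model log likelihood: a large $\lambda_j$ forces the $j$th smooth towards its `completely smooth' limit, while $\lambda_j \to 0$ leaves it effectively unpenalized.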
The mgcv implementation of gam represents the smooth functions using penalized regression splines, and by default uses basis functions for these splines that are designed to be optimal, given the number of basis functions used. The smooth terms can be functions of any number of covariates and the user has some control over how the smoothness of the functions is measured.
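For example, the basis type and dimension of each smooth term are controlled through arguments to s() (a brief sketch; the simulated data are made up):

library(mgcv)
set.seed(3)
x <- runif(200); y <- x + sin(3*pi*x) + rnorm(200)*0.3
b.tp <- gam(y ~ s(x))                 # default thin plate regression spline
b.cr <- gam(y ~ s(x, bs="cr", k=20))  # cubic regression spline, basis dimension 20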
gam in mgcv solves the smoothing parameter estimation problem by using the Generalized Cross Validation (GCV) criterion, or an Un-Biased Risk Estimator (UBRE) criterion, which is in practice an approximation to AIC. Smoothing parameters are chosen to minimize the GCV or UBRE score for the model, and the main computational challenge solved by the mgcv package is to do this efficiently and reliably. Two alternative numerical methods are provided: see mgcv, magic and gam.control.
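For reference, the two scores have the standard forms (a sketch in generic notation, with $D$ the model deviance, $n$ the number of data, $\tau$ the effective degrees of freedom and $\sigma^2$ the scale parameter; see Wood (2004a) for the precise definitions used):

$$ \mathcal{V}_g = \frac{nD}{(n-\tau)^2} \qquad \mathcal{V}_u = \frac{D}{n} + \frac{2\sigma^2\tau}{n} - \sigma^2 $$

GCV is used when the scale parameter is unknown (e.g. Gaussian errors) and UBRE when it is known (e.g. Poisson or binomial data).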
Broadly, gam works by first constructing basis functions and one or more quadratic penalty coefficient matrices for each smooth term in the model formula, obtaining a model matrix for the strictly parametric part of the model formula, and combining these to obtain a complete model matrix (design matrix) and a set of penalty matrices for the smooth terms. Some linear identifiability constraints are also obtained at this point. The model is fit using gam.fit, a modification of glm.fit. The GAM penalized likelihood maximization problem is solved by penalized Iteratively Reweighted Least Squares (IRLS) (see e.g. Wood 2000). At each iteration a penalized weighted least squares problem is solved, and the smoothing parameters of that problem are estimated by GCV or UBRE. Eventually both model parameter estimates and smoothing parameter estimates converge.
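The following toy R sketch illustrates the penalized weighted least squares update at the heart of this scheme, for a Poisson model with a single fixed smoothing parameter (purely illustrative: a polynomial basis and a ridge-type penalty stand in for the spline basis and wiggliness penalty, and this is not the gam.fit code):

set.seed(4)
n <- 100
x <- runif(n)
X <- cbind(1, poly(x, 5))          # crude basis standing in for a spline basis
S <- diag(c(0, rep(1, 5)))         # toy penalty: leave the intercept unpenalized
y <- rpois(n, exp(1 + sin(2*pi*x)))
lambda <- 0.1                      # fixed smoothing parameter
beta <- rep(0, ncol(X))
for (i in 1:50) {                  # penalized IRLS
  eta <- drop(X %*% beta)
  mu  <- exp(eta)                  # inverse of the log link
  w   <- mu                        # IRLS weights for Poisson/log link
  z   <- eta + (y - mu)/mu         # working response
  beta <- solve(t(X) %*% (w*X) + lambda*S, t(X) %*% (w*z))
}
# in gam the smoothing parameters of each such weighted penalized least
# squares problem are themselves estimated by GCV/UBRE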
The fitting approach just described, in which the smoothing parameters are estimated for each approximating linear model of the IRLS process, was suggested by Chong Gu (see e.g. Gu 2002), and is very computationally efficient. However, because the approach neglects the dependence of the iterative weights on the smoothing parameters, it is usually possible to find smoothing parameters which actually yield slightly lower GCV/UBRE score estimates than those resulting from this `performance iteration'. gam therefore also allows the user to `improve' the smoothing parameter estimates, by using O'Sullivan's (1986) suggested method, in which for each trial set of smoothing parameters the IRLS is iterated to convergence before the GCV/UBRE score is evaluated. This requires a much less efficient minimisation of the GCV/UBRE score, based on nlm, and is therefore quite slow.
Five alternative basis-penalty types are built in for representing model smooths, but alternatives can easily be added (see smooth.construct, which uses p-splines to illustrate how to add new smooths). The built in alternatives for univariate smooth terms are: a conventional penalized cubic regression spline basis, parameterized in terms of the function values at the knots; a cyclic cubic spline with a similar parameterization; and thin plate regression splines. The cubic spline bases are computationally very efficient, but require `knot' locations to be chosen (automatically by default). The thin plate regression splines are optimal low rank smooths which do not have knots, but are more computationally costly to set up. Smooths of several variables can be represented using thin plate regression splines, or tensor products of any available basis including user defined bases (tensor product penalties are obtained automatically from the marginal basis penalties). The t.p.r.s. basis is isotropic, so if this is not appropriate tensor product terms should be used. Tensor product smooths have one penalty and smoothing parameter per marginal basis, which means that the relative scaling of covariates is essentially determined automatically by GCV/UBRE.
The t.p.r.s. basis and cubic regression spline bases are both available with
either conventional `wiggliness penalties' or penalties augmented with a
shrinkage component: the conventional penalties treat some space of functions
as `completely smooth' and do not penalize such functions at all; the
penalties with extra shrinkage will zero a term altogether for high enough
smoothing parameters: gam.selection has an example of the use of
such terms.
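For example (a minimal sketch, assuming the shrinkage variants are requested via bs="ts" and bs="cs" in the s() specification):

library(mgcv)
set.seed(5)
x0 <- runif(200); x1 <- runif(200)
y <- sin(2*pi*x0) + rnorm(200)*0.3    # y does not depend on x1 at all
b <- gam(y ~ s(x0, bs="ts") + s(x1, bs="ts"))
summary(b)  # s(x1) can be shrunk to effectively zero degrees of freedom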
For any basis the user specifies the dimension of the basis for each smooth term. The dimension of the basis is one more than the maximum degrees of freedom that the term can have, but usually the term will be fitted by penalized maximum likelihood estimation and the actual degrees of freedom will be chosen by GCV. However, the user can choose to fix the degrees of freedom of a term, in which case the actual degrees of freedom will be one less than the basis dimension.
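As a brief sketch of the two modes just described (the simulated data are made up):

library(mgcv)
set.seed(6)
x <- runif(200); y <- sin(2*pi*x) + rnorm(200)*0.3
b1 <- gam(y ~ s(x, k=20))          # at most 19 df; actual df chosen by GCV
b2 <- gam(y ~ s(x, k=5, fx=TRUE))  # unpenalized: exactly 4 df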
Thin plate regression splines are constructed by starting with the basis for a full thin plate spline and then truncating this basis in an optimal manner, to obtain a low rank smoother. Details are given in Wood (2003). One key advantage of the approach is that it avoids the knot placement problems of conventional regression spline modelling, but it also has the advantage that smooths of lower rank are nested within smooths of higher rank, so that it is legitimate to use conventional hypothesis testing methods to compare models based on pure regression splines. The t.p.r.s. basis can become expensive to calculate for large datasets. In this case the user can supply a reduced set of knots to use in basis construction (see knots, in the argument list), or use tensor products of cheaper bases.
  
In the case of the cubic regression spline basis, the knots of the spline are placed evenly throughout the covariate values to which the term refers: for example, if fitting 101 data with an 11 knot spline of x then there would be a knot at every 10th (ordered) x value. The parameterization used represents the spline in terms of its values at the knots. The values at neighbouring knots are connected by sections of cubic polynomial constrained to be continuous up to and including second derivative at the knots. The resulting curve is a natural cubic spline through the values at the knots (given two extra conditions specifying that the second derivative of the curve should be zero at the two end knots). This parameterization gives the parameters a nice interpretability.
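To make the placement rule concrete, here is a sketch of where the knots land for the example above (illustrative only):

set.seed(7)
x <- runif(101)
# an 11 knot spline of 101 data: a knot at every 10th ordered x value
knots <- sort(x)[seq(1, 101, by=10)]
knots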
Details of the underlying fitting methods are given in Wood (2000, 2004a).

References

Wood, S.N. (2000) Modelling and smoothing parameter estimation with multiple quadratic penalties. J. R. Statist. Soc. B 62(2):413-428.

Wood, S.N. (2003) Thin plate regression splines. J. R. Statist. Soc. B 65(1):95-114.

Wood, S.N. (2004a) Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Amer. Statist. Ass. 99:637-686.

Wood, S.N. (2004b) On confidence intervals for GAMs based on penalized regression splines. Technical Report 04-12, Department of Statistics, University of Glasgow.

Wood, S.N. (2004c) Low rank scale invariant tensor product smooths for generalized additive mixed models. Technical Report 04-13, Department of Statistics, University of Glasgow.

Key references on GAMs and related models:

Hastie, T. (1993) Generalized additive models. In: Chambers, J.M. and Hastie, T.J. (eds.) Statistical Models in S. Chapman and Hall.

Hastie, T. and Tibshirani, R. (1990) Generalized Additive Models. Chapman and Hall.

Wahba, G. (1990) Spline Models of Observational Data. SIAM.

Background references:

Green, P.J. and Silverman, B.W. (1994) Nonparametric Regression and Generalized Linear Models. Chapman and Hall.

Gu, C. and Wahba, G. (1991) Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method. SIAM J. Sci. Statist. Comput. 12:383-398.

Gu, C. (2002) Smoothing Spline ANOVA Models. Springer.

O'Sullivan, F., Yandell, B. and Raynor, W. (1986) Automatic smoothing of regression functions in generalized linear models. J. Am. Statist. Ass. 81:96-103.

Wood, S.N. (2001) mgcv: GAMs and generalized ridge regression for R. R News 1(2):20-25.

Wood, S.N. and Augustin, N.H. (2002) GAMs with integrated model selection using penalized regression splines and applications to environmental modelling. Ecological Modelling 157:157-177.
See Also

gamObject, gam.models, s, predict.gam, plot.gam, summary.gam, gam.side.conditions, gam.selection, mgcv, gam.control, gam.check, gam.neg.bin, magic, vis.gam

Examples

library(mgcv)
set.seed(0)
n <- 400
sig <- 2
# simulate four covariates and an additive truth (x3 is irrelevant noise)
x0 <- runif(n, 0, 1)
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1)
x3 <- runif(n, 0, 1)
f <- 2 * sin(pi * x0)
f <- f + exp(2 * x1) - 3.75887
f <- f + 0.2*x2^11*(10*(1-x2))^6 + 10*(10*x2)^3*(1-x2)^10 - 1.396
e <- rnorm(n, 0, sig)
y <- f + e
# fit an additive model with four univariate smooths
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3))
summary(b)
plot(b, pages=1, residuals=TRUE)
# same fit in two parts .....
G<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),fit=FALSE)
b<-gam(G=G)
# an extra ridge penalty (useful with convergence problems) ....
bp<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),H=diag(0.5,41)) 
print(b);print(bp);rm(bp)
# set the smoothing parameter for the first term, estimate rest ...
bp<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),sp=c(0.01,-1,-1,-1))
plot(bp,pages=1);rm(bp)
# set lower bounds on smoothing parameters ....
bp<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),min.sp=c(0.001,0.01,0,10)) 
print(b);print(bp);rm(bp)
# now a GAM with 3df regression spline term & 2 penalized terms
b0<-gam(y~s(x0,k=4,fx=TRUE,bs="tp")+s(x1,k=12)+s(x2,k=15))
plot(b0,pages=1)
# now fit a 2-d term to x0,x1
b1<-gam(y~s(x0,x1)+s(x2)+s(x3))
par(mfrow=c(2,2))
plot(b1)
par(mfrow=c(1,1))
# now simulate Poisson data from the same underlying truth
g <- exp(f/5)
y <- rpois(n, g)
b2<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson)
plot(b2,pages=1)
# and a pretty 2-d smoothing example: a sum of two Gaussian bumps
test1 <- function(x, z, sx=0.3, sz=0.4) {
  (pi**sx*sz) * (1.2*exp(-(x-0.2)^2/sx^2 - (z-0.3)^2/sz^2) +
                 0.8*exp(-(x-0.7)^2/sx^2 - (z-0.8)^2/sz^2))
}
n<-500
old.par<-par(mfrow=c(2,2))
x<-runif(n);z<-runif(n);
xs <- seq(0,1,length=30); zs <- seq(0,1,length=30)
pr <- data.frame(x=rep(xs,30), z=rep(zs,rep(30,30)))  # 30 x 30 prediction grid
truth <- matrix(test1(pr$x,pr$z),30,30)
contour(xs,zs,truth)
y<-test1(x,z)+rnorm(n)*0.1
b4<-gam(y~s(x,z))
fit1<-matrix(predict.gam(b4,pr,se=FALSE),30,30)
contour(xs,zs,fit1)
persp(xs,zs,truth)
vis.gam(b4)
par(old.par)
# very large dataset example using knots
n<-10000
x<-runif(n);z<-runif(n);
y<-test1(x,z)+rnorm(n)
ind<-sample(1:n,1000,replace=FALSE)
b5<-gam(y~s(x,z,k=50),knots=list(x=x[ind],z=z[ind]))
vis.gam(b5)
# and a pure "knot based" spline of the same data
b6<-gam(y~s(x,z,k=100),knots=list(x= rep((1:10-0.5)/10,10),
        z=rep((1:10-0.5)/10,rep(10,10))))
vis.gam(b6,color="heat")