# gam.models

##### Specifying generalized additive models

This page is intended to provide some more information on how to specify GAMs. A GAM is a GLM in which the linear predictor depends, in part, on a sum of smooth functions of predictors and (possibly) linear functionals of smooth functions of (possibly dummy) predictors.

Specifically let \(y_i\) denote an independent random variable with mean \(\mu_i\) and an exponential family distribution, or failing that a known mean variance relationship suitable for use of quasi-likelihood methods. Then the the linear predictor of a GAM has a structure something like

$$g(\mu_i) = {\bf X}_i{\beta} + f_1(x_{1i},x_{2i}) + f_2(x_{3i}) + L_i f_3(x_4) + \ldots$$

where \(g\) is a known smooth monotonic `link' function, \({\bf X}_i\beta\) is the parametric part of the linear predictor, the \(x_j\) are predictor variables, the \(f_j\) are smooth functions and \(L_i\) is some linear functional of \(f_3\). There may of course be multiple linear functional terms, or none.

The key idea here is that the dependence of the response on the predictors can be represented as a parametric sub-model plus the sum of some (functionals of) smooth functions of one or more of the predictor variables. Thus the model is quite flexible relative to strictly parametric linear or generalized linear models, but still has much more structure than the completely general model that says that the response is just some smooth function of all the covariates.

Note one important point. In order for the model to be identifiable
the smooth functions usually have to be constrained to have zero mean (usually
taken over the set of covariate values). The constraint is needed if the term involving the
smooth includes a constant function in its span. `gam`

always applies such constraints
unless there is a `by`

variable present, in which case an assessment is made of whether
the constraint is needed or not (see below).

The following sections discuss specifying model structures for `gam`

.
Specification of the distribution and link function is done using the `family`

argument to `gam`

and works in the same way as for `glm`

.
This page therefore concentrates on the model formula for `gam`

.

- Keywords
- models, regression

##### Models with simple smooth terms

Consider the example model. $$g(\mu_i) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + f_1(x_{3i}) + f_2(x_{4i},x_{5i})$$ where the response variables \(y_i\) has expectation \(\mu_i\) and \(g\) is a link function.

The `gam`

formula for this would be
`y ~ x1 + x2 + s(x3) + s(x4,x5)`

.
This would use the default basis for the smooths (a thin plate
regression spline basis for each), with automatic selection of the
effective degrees of freedom for both smooths. The dimension of the
smoothing basis is given a default value as well (the dimension of the
basis sets an upper limit on the maximum possible degrees of
freedom for the basis - the limit is typically one less than basis
dimension). Full details of how to control smooths are given in
`s`

and `te`

, and further discussion of basis
dimension choice can be found in `choose.k`

.
For the moment suppose that we would like to change
the basis of the first smooth to a cubic regression spline basis with
a dimension of 20, while fixing the second term at 25 degrees of
freedom. The appropriate formula would be:
`y ~ x1 + x2 + s(x3,bs="cr",k=20) + s(x4,x5,k=26,fx=TRUE)`

.

The above assumes that \(x_{4}\) and \(x_5\) are naturally on
similar scales (e.g. they might be co-ordinates), so that isotropic smoothing
is appropriate. If this assumption is false then tensor product smoothing might be
better (see `te`

).
`y ~ x1 + x2 + s(x3) + te(x4,x5)`

would generate a tensor product smooth of \(x_{4}\) and \(x_5\).
By default this smooth would have basis dimension 25 and use cubic regression spline marginals.
Varying the defaults is easy. For example
`y ~ x1 + x2 + s(x3) + te(x4,x5,bs=c("cr","ps"),k=c(6,7))`

specifies that the tensor product should use a rank 6 cubic regression spline marginal
and a rank 7 P-spline marginal to create a smooth with basis dimension 42.

##### Nested terms/functional ANOVA

Sometimes it is interesting to specify smooth models with a main effects + interaction structure such as
$$E(y_i) = f_1(x_i) + f_2(z_i) + f_3(x_i,z_i)$$
or
$$E(y_i)=f_1(x_i) + f_2(z_i) + f_3(v_i)
+ f_4(x_i,z_i) + f_5(z_i,v_i) + f_6(z_i,v_i) + f_7(x_i,z_i,v_i) $$
for example. Such models should be set up using `ti`

terms in the model formula. For example:
`y ~ ti(x) + ti(z) + ti(x,z)`

, or
`y ~ ti(x) + ti(z) + ti(v) + ti(x,z) + ti(x,v) + ti(z,v)+ti(x,z,v)`

.
The `ti`

terms produce interactions with the component main effects excluded appropriately. (There is in fact no need to use `ti`

terms for the main effects here, `s`

terms could also be used.)

`gam`

allows nesting (or `overlap') of `te`

and `s`

smooths, and automatically generates side conditions to
make such models identifiable, but the resulting models are much less stable and interpretable than those constructed using `ti`

terms.

##### `by' variables

`by`

variables are the means for constructing `varying-coefficient models' (geographic regression models) and
for letting smooths `interact' with factors or parametric terms. They are also the key to specifying general linear
functionals of smooths.

The `s`

and `te`

terms used to specify smooths accept an argument `by`

,
which is a numeric or factor variable of the same dimension as the covariates of the smooth.
If a `by`

variable is numeric, then its \(i^{th}\) element multiples the \(i^{th}\)
row of the model matrix corresponding to the smooth term concerned.

Factor smooth interactions (see also `factor.smooth.interaction`

).
If a `by`

variable is a `factor`

then it
generates an indicator vector for each level
of the factor, unless it is an `ordered`

factor.
In the non-ordered case, the model matrix for the smooth term is then replicated for each factor level,
and each copy has its rows multiplied by the corresponding rows of its
indicator variable. The smoothness penalties are also duplicated for each
factor level. In short a different smooth is generated
for each factor level (the `id`

argument to `s`

and `te`

can be used to force all
such smooths to have the same smoothing parameter). `ordered`

`by`

variables are handled in the same
way, except that no smooth is generated for the first level of the ordered factor (see `b3`

example below).
This is useful for setting up
identifiable models when the same smooth occurs more than once in a model, with different factor `by`

variables.

As an example, consider the model
$$E(y_i) = \beta_0+ f(x_i)z_i$$
where \(f\) is a smooth function, and \(z_i\) is a numeric variable.
The appropriate formula is:
`y ~ s(x,by=z)`

- the `by`

argument ensures that the smooth function gets multiplied by
covariate `z`

. Note that when using factor by variables, centering constraints are applied to the smooths,
which usually means that the by variable should be included as a parametric term, as well.

The example code below also illustrates the use of factor `by`

variables.

`by`

variables may be supplied as numeric matrices as part of specifying general linear functional terms.

If a `by`

variable is present and numeric (rather than a factor) then the corresponding smooth is only subjected
to an identifiability constraint if (i) the `by`

variable is a constant vector, or, (ii) for a matrix
`by`

variable, `L`

, if `L%*%rep(1,ncol(L))`

is constant or (iii) if a user defined smooth constructor
supplies an identifiability constraint explicitly, and that constraint has an attibute `"always.apply"`

.

##### Linking smooths with `id'

It is sometimes desirable to insist that different smooth terms have the same degree of smoothness.
This can be done by using the `id`

argument to `s`

or `te`

terms. Smooths
which share an `id`

will have the same smoothing parameter. Really this only makes sense if the
smooths use the same basis functions, and the default behaviour is to force this to happen: all smooths
sharing an `id`

have the same basis functions as the first smooth occurring with that `id`

. Note
that if you want exactly the same function for each smooth, then this is best achieved by making use of the
summation convention covered under `linear functional terms'.

As an example suppose that \(E(y_i)\equiv\mu_i\) and
$$g(\mu_i) = f_1(x_{1i}) + f_2(x_{2i},x_{3i}) +
f_3(x_{4i})$$
but that \(f_1\) and \(f_3\) should have the same smoothing parameters (and \(x_2\)
and \(x_3\) are on different scales). Then
the `gam`

formula
`y ~ s(x1,id=1) + te(x_2,x3) + s(x4,id=1)`

would achieve the desired result. `id`

can be numbers or character strings. Giving an `id`

to a
term with a factor `by`

variable causes the smooths at each level of the factor to have the same smoothing
parameter.

Smooth term `id`

s are not supported by `gamm`

.

##### Linear functional terms

General linear functional terms have a long history in the spline literature including in the penalized GLM context (see e.g. Wahba 1990). Such terms encompass varying coefficient models/ geographic regression, functional GLMs (i.e. GLMs with functional predictors), GLASS models, etc, and allow smoothing with respect to aggregated covariate values, for example.

Such terms are implemented in `mgcv`

using a simple `summation convention' for smooth terms: If the covariates of a
smooth are supplied as matrices, then summation of the evaluated smooth over the columns of the matrices is implied. Each
covariate matrix and any `by`

variable matrix must be of the same dimension. Consider, for example the term
`s(X,Z,by=L)`

where `X`

, `Z`

and `L`

are \(n \times p\) matrices. Let \(f\) denote the thin plate regression
spline specified. The resulting contibution to the \(i^{\rm th}\)
element of the linear predictor is
$$\sum_{j=1}^p L_{ij}f(X_{ij},Z_{ij})$$
If no `L`

is supplied then all its elements are taken as 1. In R code terms, let `F`

denote the \(n \times p\)
matrix obtained by evaluating the smooth at the values in `X`

and `Z`

. Then the contribution of the term to the
linear predictor is `rowSums(L*F)`

(note that it's element by element multiplication here!).

The summation convention applies to `te`

terms as well as `s`

terms. More details and examples
are provided in
`linear.functional.terms`

.

##### Random effects

Random effects can be added to `gam`

models using `s(...,bs="re")`

terms (see
`smooth.construct.re.smooth.spec`

),
or the `paraPen`

argument to `gam`

covered below. See `gam.vcomp`

, `random.effects`

and
`smooth.construct.re.smooth.spec`

for further details. An alternative is to use the approach of `gamm`

.

##### Penalizing the parametric terms

In case the ability to add smooth classes, smooth identities, `by`

variables and the summation convention are
still not sufficient to implement exactly the penalized GLM that you require, `gam`

also allows you to penalize the
parametric terms in the model formula. This is mostly useful in
allowing one or more matrix terms to be included in the formula, along with a
sequence of quadratic penalty matrices for each.

Suppose that you have set up a model matrix \(\bf X\), and want to penalize the corresponding coefficients, \(\beta\)
with two penalties \(\beta^T {\bf S}_1 \beta\) and \(\beta^T {\bf S}_2 \beta\).
Then something like the
following would be appropriate:
`gam(y ~ X - 1,paraPen=list(X=list(S1,S2)))`

The `paraPen`

argument should be a list with elements having names corresponding to the terms being penalized.
Each element of `paraPen`

is itself a list, with optional elements `L`

, `rank`

and `sp`

: all other elements
must be penalty matrices. If present, `rank`

is a vector giving the rank of each penalty matrix
(if absent this is determined numerically). `L`

is a matrix that maps underlying log smoothing parameters to the
log smoothing parameters that actually multiply the individual quadratic penalties: taken as the identity if not supplied.
`sp`

is a vector of (underlying) smoothing parameter values: positive values are taken as fixed, negative to signal that
the smoothing parameter should be estimated. Taken as all negative if not supplied.

An obvious application of `paraPen`

is to incorporate random effects, and an example of this is provided below. In this
case the supplied penalty matrices will be (generalized) inverse covariance matrices for the random effects --- i.e. precision
matrices. The final estimate of the covariance matrix corresponding to one of these penalties is given by the (generalized)
inverse of the penalty matrix multiplied by the estimated scale parameter and divided by the estimated
smoothing parameter for the penalty. For example, if you use an identity matrix to penalize some coefficients that are to be viewed as i.i.d.
Gaussian random effects, then their estimated variance will be the estimated scale parameter divided by the estimate of the
smoothing parameter, for this penalty. See the `rail' example below.

P-values for penalized parametric terms should be treated with caution. If you must have them, then use the option `freq=TRUE`

in
`anova.gam`

and `summary.gam`

, which will tend to give reasonable results for random effects implemented this way,
but not for terms with a rank defficient penalty (or penalties with a wide eigen-spectrum).

##### References

Wahba (1990) Spline Models of Observational Data SIAM.

Wood S.N. (2017) Generalized Additive Models: An Introduction with R (2nd edition). Chapman and Hall/CRC Press.

##### Examples

```
# NOT RUN {
require(mgcv)
set.seed(10)
## simulate date from y = f(x2)*x1 + error
dat <- gamSim(3,n=400)
b<-gam(y ~ s(x2,by=x1),data=dat)
plot(b,pages=1)
summary(b)
## Factor `by' variable example (with a spurious covariate x0)
## simulate data...
dat <- gamSim(4)
## fit model...
b <- gam(y ~ fac+s(x2,by=fac)+s(x0),data=dat)
plot(b,pages=1)
summary(b)
## note that the preceding fit is the same as....
b1<-gam(y ~ s(x2,by=as.numeric(fac==1))+s(x2,by=as.numeric(fac==2))+
s(x2,by=as.numeric(fac==3))+s(x0)-1,data=dat)
## ... the `-1' is because the intercept is confounded with the
## *uncentred* smooths here.
plot(b1,pages=1)
summary(b1)
## repeat forcing all s(x2) terms to have the same smoothing param
## (not a very good idea for these data!)
b2 <- gam(y ~ fac+s(x2,by=fac,id=1)+s(x0),data=dat)
plot(b2,pages=1)
summary(b2)
## now repeat with a single reference level smooth, and
## two `difference' smooths...
dat$fac <- ordered(dat$fac)
b3 <- gam(y ~ fac+s(x2)+s(x2,by=fac)+s(x0),data=dat,method="REML")
plot(b3,pages=1)
summary(b3)
rm(dat)
## An example of a simple random effects term implemented via
## penalization of the parametric part of the model...
dat <- gamSim(1,n=400,scale=2) ## simulate 4 term additive truth
## Now add some random effects to the simulation. Response is
## grouped into one of 20 groups by `fac' and each groups has a
## random effect added....
fac <- as.factor(sample(1:20,400,replace=TRUE))
dat$X <- model.matrix(~fac-1)
b <- rnorm(20)*.5
dat$y <- dat$y + dat$X%*%b
## now fit appropriate random effect model...
PP <- list(X=list(rank=20,diag(20)))
rm <- gam(y~ X+s(x0)+s(x1)+s(x2)+s(x3),data=dat,paraPen=PP)
plot(rm,pages=1)
## Get estimated random effects standard deviation...
sig.b <- sqrt(rm$sig2/rm$sp[1]);sig.b
## a much simpler approach uses "re" terms...
rm1 <- gam(y ~ s(fac,bs="re")+s(x0)+s(x1)+s(x2)+s(x3),data=dat,method="ML")
gam.vcomp(rm1)
## Simple comparison with lme, using Rail data.
## See ?random.effects for a simpler method
require(nlme)
b0 <- lme(travel~1,data=Rail,~1|Rail,method="ML")
Z <- model.matrix(~Rail-1,data=Rail,
contrasts.arg=list(Rail="contr.treatment"))
b <- gam(travel~Z,data=Rail,paraPen=list(Z=list(diag(6))),method="ML")
b0
(b$reml.scale/b$sp)^.5 ## `gam' ML estimate of Rail sd
b$reml.scale^.5 ## `gam' ML estimate of residual sd
b0 <- lme(travel~1,data=Rail,~1|Rail,method="REML")
Z <- model.matrix(~Rail-1,data=Rail,
contrasts.arg=list(Rail="contr.treatment"))
b <- gam(travel~Z,data=Rail,paraPen=list(Z=list(diag(6))),method="REML")
b0
(b$reml.scale/b$sp)^.5 ## `gam' REML estimate of Rail sd
b$reml.scale^.5 ## `gam' REML estimate of residual sd
################################################################
## Approximate large dataset logistic regression for rare events
## based on subsampling the zeroes, and adding an offset to
## approximately allow for this.
## Doing the same thing, but upweighting the sampled zeroes
## leads to problems with smoothness selection, and CIs.
################################################################
n <- 50000 ## simulate n data
dat <- gamSim(1,n=n,dist="binary",scale=.33)
p <- binomial()$linkinv(dat$f-6) ## make 1's rare
dat$y <- rbinom(p,1,p) ## re-simulate rare response
## Now sample all the 1's but only proportion S of the 0's
S <- 0.02 ## sampling fraction of zeroes
dat <- dat[dat$y==1 | runif(n) < S,] ## sampling
## Create offset based on total sampling fraction
dat$s <- rep(log(nrow(dat)/n),nrow(dat))
lr.fit <- gam(y~s(x0,bs="cr")+s(x1,bs="cr")+s(x2,bs="cr")+s(x3,bs="cr")+
offset(s),family=binomial,data=dat,method="REML")
## plot model components with truth overlaid in red
op <- par(mfrow=c(2,2))
fn <- c("f0","f1","f2","f3");xn <- c("x0","x1","x2","x3")
for (k in 1:4) {
plot(lr.fit,select=k,scale=0)
ff <- dat[[fn[k]]];xx <- dat[[xn[k]]]
ind <- sort.int(xx,index.return=TRUE)$ix
lines(xx[ind],(ff-mean(ff))[ind]*.33,col=2)
}
par(op)
rm(dat)
## A Gamma example, by modify `gamSim' output...
dat <- gamSim(1,n=400,dist="normal",scale=1)
dat$f <- dat$f/4 ## true linear predictor
Ey <- exp(dat$f);scale <- .5 ## mean and GLM scale parameter
## Note that `shape' and `scale' in `rgamma' are almost
## opposite terminology to that used with GLM/GAM...
dat$y <- rgamma(Ey*0,shape=1/scale,scale=Ey*scale)
bg <- gam(y~ s(x0)+ s(x1)+s(x2)+s(x3),family=Gamma(link=log),
data=dat,method="REML")
plot(bg,pages=1,scheme=1)
# }
```

*Documentation reproduced from package mgcv, version 1.8-31, License: GPL (>= 2)*