Fit a vector generalized additive model (VGAM). Both 1st-generation VGAMs (based on backfitting) and 2nd-generation VGAMs (based on P-splines, with automatic smoothing parameter selection) are implemented. This is a large class of models that includes generalized additive models (GAMs) and vector generalized linear models (VGLMs) as special cases.

```
vgam(formula, family = stop("argument 'family' needs to be assigned"),
data = list(), weights = NULL, subset = NULL,
na.action = na.fail, etastart = NULL, mustart = NULL,
coefstart = NULL, control = vgam.control(...), offset = NULL,
method = "vgam.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE,
contrasts = NULL, constraints = NULL,
extra = list(), form2 = NULL, qr.arg = FALSE, smart = TRUE, ...)
```

formula

a symbolic description of the model to be fit.
The RHS of the formula is applied to each linear/additive predictor,
and should include at least one
`sm.os`

term
or `sm.ps`

term
or `s`

term.
Mixing both together is not allowed.
Different variables in each linear/additive predictor
can be chosen by specifying constraint matrices.

family

Same as for `vglm`

.

data

an optional data frame containing the variables in the model.
By default the variables are taken from
`environment(formula)`

, typically the environment from which
`vgam`

is called.

weights, subset, na.action

etastart, mustart, coefstart

Same as for `vglm`

.

control

a list of parameters for controlling the fitting process.
See `vgam.control`

for details.

method

the method to be used in fitting the model.
The default (and presently only) method `vgam.fit`

uses iteratively reweighted least squares (IRLS).

constraints, model, offset

Same as for `vglm`

.

x.arg, y.arg

logical values indicating whether the model matrix and response
vector/matrix used in the fitting process should be assigned in the
`x`

and `y`

slots. Note the model matrix is the LM model
matrix; to get the VGAM model matrix type `model.matrix(vgamfit)`

where `vgamfit`

is a `vgam`

object.

contrasts, extra, form2, qr.arg, smart

Same as for `vglm`

.

…

further arguments passed into `vgam.control`

.

For G1-VGAMs and G2-VGAMs, an object of class
`"vgam"`

or
`"pvgam"`

respectively
(see `vgam-class`

and `pvgam-class`

for further information).

For G1-VGAMs,
currently `vgam`

can only handle constraint matrices `cmat`

,
say, such that `crossprod(cmat)`

is diagonal.
It can be detected by `is.buggy`

.
VGAMs with constraint matrices that have non-orthogonal columns should
be fitted with
`sm.os`

or
`sm.ps`

terms
instead of `s`

.

See warnings in `vglm.control`

.

A vector generalized additive model (VGAM) is loosely defined
as a statistical model that is a function of \(M\) additive predictors.
The central formula is given by
$$\eta_j = \sum_{k=1}^p f_{(j)k}(x_k)$$
where \(x_k\) is the \(k\)th explanatory variable
(almost always \(x_1=1\) for the intercept term),
and
\(f_{(j)k}\) are smooth functions of \(x_k\) that are estimated
by smoothers.
The first term in the summation is just the intercept.
Currently
two types of smoothers are
implemented:
`s`

represents
the older and more traditional one, called a
*vector (cubic smoothing spline) smoother* and is
based on Yee and Wild (1996);
it is more similar to the R package gam.
The newer one is represented by
`sm.os`

and
`sm.ps`

, and these are
based on O-splines and P-splines---they allow automatic
smoothing parameter selection; it is more similar
to the R package mgcv.

In the above, \(j=1,\ldots,M\) where \(M\) is finite.
If all the functions are constrained to be linear then
the resulting model is a vector generalized linear model
(VGLM). VGLMs are best fitted with `vglm`

.

Vector (cubic smoothing spline) smoothers are represented
by `s()`

(see `s`

). Local
regression via `lo()`

is *not* supported. The
results of `vgam`

will differ from the `gam()`

(in the gam) because `vgam()`

uses a different
knot selection algorithm. In general, fewer knots are
chosen because the computation becomes expensive when
the number of additive predictors \(M\) is large.

Second-generation VGAMs are based on the
O-splines and P-splines.
The latter is due to Eilers and Marx (1996).
Backfitting is not required, and estimation is performed using IRLS.
The function `sm.os`

represents a *smart*
implementation of O-splines.
The function `sm.ps`

represents a *smart*
implementation of P-splines.
Written G2-VGAMs or P-VGAMs, this methodology should not be used
unless the sample size is reasonably large.
Usually an UBRE predictive criterion is optimized
(at each IRLS iteration)
because the
scale parameter for VGAMs is usually assumed to be known.
This search for optimal smoothing parameters does not always converge,
and neither is it totally reliable.
G2-VGAMs implicitly set `criterion = "coefficients"`

so that
convergence occurs when the change in the regression coefficients
between 2 IRLS iterations is sufficiently small.
Otherwise the search for the optimal smoothing parameters might
cause the log-likelihood to decrease between 2 IRLS iterations.
Currently *outer iteration* is implemented,
by default,
rather than *performance iteration* because the latter
is more easy to converge to a local solution; see
Wood (2004) for details.
One can use *performance iteration*
by setting `Maxit.outer = 1`

in
`vgam.control`

.

The underlying algorithm of VGAMs is IRLS.
First-generation VGAMs (called G1-VGAMs)
are estimated by modified vector backfitting
using vector splines. O-splines are used as the basis functions
for the vector (smoothing) splines, which are a lower dimensional
version of natural B-splines.
The function `vgam.fit()`

actually does the
work. The smoothing code is based on F. O'Sullivan's
BART code.

A closely related methodology based on VGAMs called
*constrained additive ordination* (CAO) first forms
a linear combination of the explanatory variables (called
*latent variables*) and then fits a GAM to these.
This is implemented in the function `cao`

for a very limited choice of family functions.

Wood, S. N. (2004).
Stable and efficient multiple smoothing parameter estimation
for generalized additive models.
*J. Amer. Statist. Assoc.*, **99**(467): 673--686.

Yee, T. W. and Wild, C. J. (1996)
Vector generalized additive models.
*Journal of the Royal Statistical Society, Series B, Methodological*,
**58**, 481--493.

Yee, T. W. (2008)
The `VGAM`

Package.
*R News*, **8**, 28--39.

Yee, T. W. (2015)
Vector Generalized Linear and Additive Models:
With an Implementation in R.
New York, USA: *Springer*.

Yee, T. W. (2016).
Comments on ``Smoothing parameter and model selection for
general smooth models''
by Wood, S. N. and Pya, N. and Safken, N.,
*J. Amer. Statist. Assoc.*, **110**(516).

`is.buggy`

,
`vgam.control`

,
`vgam-class`

,
`vglmff-class`

,
`plotvgam`

,
`summaryvgam`

,
`summarypvgam`

,
`sm.os`

,
`sm.ps`

,
`s`

,
`magic`

.
`vglm`

,
`vsmooth.spline`

,
`cao`

.

# NOT RUN { # Nonparametric proportional odds model pneumo <- transform(pneumo, let = log(exposure.time)) vgam(cbind(normal, mild, severe) ~ s(let), cumulative(parallel = TRUE), data = pneumo, trace = TRUE) # Nonparametric logistic regression hfit <- vgam(agaaus ~ s(altitude, df = 2), binomialff, data = hunua) # } # NOT RUN { plot(hfit, se = TRUE) # } # NOT RUN { phfit <- predict(hfit, type = "terms", raw = TRUE, se = TRUE) names(phfit) head(phfit$fitted) head(phfit$se.fit) phfit$df phfit$sigma # Fit two species simultaneously hfit2 <- vgam(cbind(agaaus, kniexc) ~ s(altitude, df = c(2, 3)), binomialff(multiple.responses = TRUE), data = hunua) coef(hfit2, matrix = TRUE) # Not really interpretable # } # NOT RUN { plot(hfit2, se = TRUE, overlay = TRUE, lcol = 3:4, scol = 3:4) ooo <- with(hunua, order(altitude)) with(hunua, matplot(altitude[ooo], fitted(hfit2)[ooo,], ylim = c(0, 0.8), xlab = "Altitude (m)", ylab = "Probability of presence", las = 1, main = "Two plant species' response curves", type = "l", lwd = 2)) with(hunua, rug(altitude)) # } # NOT RUN { # The 'subset' argument does not work here. Use subset() instead. set.seed(1) zdata <- data.frame(x2 = runif(nn <- 500)) zdata <- transform(zdata, y = rbinom(nn, 1, 0.5)) zdata <- transform(zdata, subS = runif(nn) < 0.7) sub.zdata <- subset(zdata, subS) # Use this instead if (FALSE) fit4a <- vgam(cbind(y, y) ~ s(x2, df = 2), binomialff(multiple.responses = TRUE), data = zdata, subset = subS) # This fails!!! fit4b <- vgam(cbind(y, y) ~ s(x2, df = 2), binomialff(multiple.responses = TRUE), data = sub.zdata) # This succeeds!!! fit4c <- vgam(cbind(y, y) ~ sm.os(x2), binomialff(multiple.responses = TRUE), data = sub.zdata) # This succeeds!!! # } # NOT RUN { par(mfrow = c(2, 2)) plot(fit4b, se = TRUE, shade = TRUE, shcol = "pink") plot(fit4c, se = TRUE, shade = TRUE, shcol = "pink") # }

Run the code above in your browser using DataCamp Workspace