VGAM (version 0.8-1)

vgam: Fitting Vector Generalized Additive Models

Description

Fit a vector generalized additive model (VGAM). This is a large class of models that includes generalized additive models (GAMs) and vector generalized linear models (VGLMs) as special cases.

Usage

vgam(formula, family, data = list(), weights = NULL, subset = NULL, 
     na.action = na.fail, etastart = NULL, mustart = NULL, 
     coefstart = NULL, control = vgam.control(...), offset = NULL, 
     method = "vgam.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE, 
     contrasts = NULL, constraints = NULL, 
     extra = list(), qr.arg = FALSE, smart = TRUE, ...)

Arguments

formula
a symbolic description of the model to be fit. The RHS of the formula is applied to each linear/additive predictor. Different variables in each linear/additive predictor can be chosen by specifying constraint matrices.
family
a function of class "vglmff" (see vglmff-class) describing what statistical model is to be fitted. This is called a "VGAM family function". See
data
an optional data frame containing the variables in the model. By default the variables are taken from environment(formula), typically the environment from which vgam is called.
weights
an optional vector or matrix of (prior) weights to be used in the fitting process. If weights is a matrix, then it must be in matrix-band form, whereby the first $M$ columns of the matrix are the diagonals, followed by the
subset
an optional logical vector specifying a subset of observations to be used in the fitting process.
na.action
a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset.
etastart, mustart, coefstart
Same as for vglm.
control
a list of parameters for controlling the fitting process. See vgam.control for details.
offset
a vector or $M$-column matrix of offset values. These are a priori known and are added to the linear/additive predictors during fitting.
method
the method to be used in fitting the model. The default (and presently only) method vgam.fit uses iteratively reweighted least squares (IRLS).
model
a logical value indicating whether the model frame should be assigned in the model slot.
x.arg, y.arg
logical values indicating whether the model matrix and response vector/matrix used in the fitting process should be assigned in the x and y slots. Note that the model matrix is the LM model matrix; to get the VGAM model matrix
contrasts
an optional list. See the contrasts.arg argument of model.matrix.default.
constraints
an optional list of constraint matrices. The components of the list must be named after the terms they correspond to, matching the term labels exactly as character strings. Each constraint matrix must have $M$ rows and be of full column rank. By default,
extra
an optional list with any extra information that might be needed by the VGAM family function.
qr.arg
logical value indicating whether the qr slot, which holds the QR decomposition of the VLM model matrix, is included in the returned object.
smart
logical value indicating whether smart prediction (smartpred) will be used.
...
further arguments passed into vgam.control.
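The requirements on constraints can be illustrated without fitting a model. The base R sketch below is hypothetical throughout: it assumes $M = 2$ and defines a helper, check_constraint(), that is not part of VGAM. It checks the two stated conditions ($M$ rows, full column rank) and then shows how a one-column matrix of ones makes a single smooth function enter both predictors, which is the kind of constraint a parallelism (proportional odds) assumption implies for a non-intercept term.

```r
# Sketch only: base R, no VGAM required. M = 2 is an assumption
# (e.g. a three-level ordinal response such as normal/mild/severe).
M <- 2

# Hypothetical helper: checks the two stated requirements,
# M rows and full column rank.
check_constraint <- function(H, M) {
  nrow(H) == M && qr(H)$rank == ncol(H)
}

H_free     <- diag(M)           # a separate function for each predictor
H_parallel <- matrix(1, M, 1)   # one function shared by all predictors
H_bad      <- matrix(1, M, 2)   # duplicated columns: rank 1, not full column rank

# Build eta (n x M) for an intercept term constrained by H_free and a
# smooth term constrained by H_parallel:
#   eta = 1 (H_free b0)^T + f(x) H_parallel^T
x   <- seq(0, 3, length.out = 5)
f_x <- sin(x)                   # stand-in for an estimated smooth f(x)
b0  <- c(-1, 1)                 # intercepts, one per predictor
eta <- matrix(1, length(x), 1) %*% t(H_free %*% b0) +
       cbind(f_x) %*% t(H_parallel)

# Both predictors share the same curve; they differ only by their intercepts
eta[, 2] - eta[, 1]
```

Because H_parallel has a single column, only one function is estimated for that term, whereas H_free leaves each predictor its own coefficient.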

Value

  • An object of class "vgam" (see vgam-class for further information).

Details

A vector generalized additive model (VGAM) is loosely defined as a statistical model that is a function of $M$ additive predictors. The central formula is given by $$\eta_j = \sum_{k=1}^p f_{(j)k}(x_k), \qquad j = 1, \ldots, M,$$ where $M$ is finite, $x_k$ is the $k$th explanatory variable (almost always $x_1=1$ for the intercept term), and the $f_{(j)k}$ are smooth functions of $x_k$ that are estimated by smoothers. The first term in the summation is just the intercept. Currently only one type of smoother is implemented: a vector (cubic smoothing spline) smoother. If all the functions are constrained to be linear, the resulting model is a vector generalized linear model (VGLM); VGLMs are best fitted with vglm.
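The formula can be evaluated directly for fixed component functions. The base R sketch below assumes $M = 2$ and two variables ($x_1 = 1$ and one covariate); the curves and the helper additive_predictors() are hypothetical stand-ins for what vgam would estimate. It also checks that when every $f_{(j)k}$ is linear, $f_{(j)k}(x_k) = \beta_{(j)k} x_k$, the predictors collapse to the VGLM form $\eta = X B$.

```r
# Sketch only: hypothetical fixed component functions, not fitted smooths.
n  <- 6
x2 <- seq(0, 1, length.out = n)
X  <- cbind(1, x2)             # column 1 is x_1 = 1 (the intercept term)

# f[[j]][[k]] plays the role of f_(j)k for predictor j and variable k
f <- list(
  list(function(x) 0.5 * x, function(x) sin(2 * pi * x)),  # eta_1
  list(function(x) -1.0 * x, function(x) x^2)              # eta_2
)

# eta_j = sum_k f_(j)k(x_k); returns an n x M matrix (one column per j)
additive_predictors <- function(X, f) {
  sapply(f, function(fj) {
    parts <- Map(function(fk, k) fk(X[, k]), fj, seq_along(fj))
    Reduce(`+`, parts)
  })
}
eta <- additive_predictors(X, f)

# Linear case: f_(j)k(x) = B[k, j] * x, so eta collapses to X %*% B (a VGLM)
B <- matrix(c(0.5, 2, -1, 3), nrow = 2)
f_lin <- lapply(1:2, function(j) lapply(1:2, function(k) {
  b <- B[k, j]
  function(x) b * x
}))
eta_lin <- additive_predictors(X, f_lin)
all.equal(eta_lin, X %*% B, check.attributes = FALSE)
```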

Vector (cubic smoothing spline) smoothers are represented by s() (see s). Local regression via lo() is not supported. The results of vgam will differ from those of the gam functions in S-PLUS and R (the latter in the gam R package) because vgam uses a different knot-selection algorithm. In general, vgam chooses fewer knots, because the computation becomes expensive when the number of additive predictors $M$ is large.
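For intuition about what a cubic smoothing spline with a reduced knot set involves, the sketch below uses base R's smooth.spline() on simulated data. It covers only the scalar case and is a stand-in: it does not call s() or vgam, VGAM's own knot-selection rule is not reproduced, and the df convention of smooth.spline() (where df = 2 corresponds to a straight line) is not assumed to match that of s(). It shows that a target number of equivalent degrees of freedom can be met with far fewer knots, which shrinks the number of spline coefficients to estimate.

```r
# Sketch only: base R's scalar cubic smoothing spline, standing in for the
# M = 1 case. It does not reproduce VGAM's vector splines or knot rule.
set.seed(42)
x <- sort(runif(200, 0, 10))
y <- sin(x) + rnorm(200, sd = 0.3)

# Every distinct x value used as a knot
fit_all <- smooth.spline(x, y, df = 5, all.knots = TRUE)
# Same target degrees of freedom with a reduced set of 15 knots
fit_few <- smooth.spline(x, y, df = 5, nknots = 15)

# Achieved equivalent degrees of freedom (both near the target of 5)
c(all = fit_all$df, few = fit_few$df)
# Number of B-spline coefficients: much smaller with fewer knots
c(all = length(fit_all$fit$coef), few = length(fit_few$fit$coef))
```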

The underlying algorithm for VGAMs is iteratively reweighted least squares (IRLS) combined with modified vector backfitting using vector splines. B-splines are used as the basis functions for the vector (smoothing) splines. vgam.fit is the function that actually does the work. The smoothing code is based on F. O'Sullivan's BART code. A closely related methodology based on VGAMs, called constrained additive ordination (CAO), first forms linear combinations of the explanatory variables (called latent variables) and then fits a GAM to these. This is implemented in the function cao for a very limited choice of family functions.
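The idea of running IRLS with a smoother in place of the weighted least-squares step can be sketched in the simplest setting. The base R code below is a minimal sketch under stated assumptions, not vgam.fit: it assumes $M = 1$, a logit link, and a single covariate (so no backfitting across terms is needed), and it substitutes base R's smooth.spline() for VGAM's vector splines. The helpers irls_logit(), linear_smoother() and spline_smoother() are hypothetical.

```r
# Sketch only: scalar (M = 1) IRLS with a pluggable smoother, logit link,
# single covariate, so no backfitting across terms is needed. Not vgam.fit.

# Hypothetical helper: `smoother(x, z, w)` must return fitted values of z at x
irls_logit <- function(x, y, smoother, maxit = 25, tol = 1e-8) {
  eta <- rep(qlogis(mean(y)), length(y))   # start at the overall log-odds
  for (iter in seq_len(maxit)) {
    mu <- plogis(eta)
    w  <- mu * (1 - mu)                    # working weights for the logit link
    z  <- eta + (y - mu) / w               # working (adjusted) response
    eta_new <- smoother(x, z, w)
    converged <- max(abs(eta_new - eta)) < tol
    eta <- eta_new
    if (converged) break
  }
  list(fitted = plogis(eta), iterations = iter, converged = converged)
}

# Weighted least squares on (1, x): plain IRLS for a logistic GLM
linear_smoother <- function(x, z, w) {
  X <- cbind(1, x)
  drop(X %*% lm.wfit(X, z, w)$coefficients)
}

# Weighted cubic smoothing spline targeting 3 equivalent df (base R stand-in)
spline_smoother <- function(x, z, w) {
  predict(smooth.spline(x, z, w = w, df = 3), x)$y
}

set.seed(1)
x <- runif(300, 0, 10)
y <- rbinom(300, 1, plogis(-2 + 0.5 * x))

fit_lin <- irls_logit(x, y, linear_smoother)
fit_glm <- glm(y ~ x, family = binomial)
all.equal(fit_lin$fitted, unname(fitted(fit_glm)), tolerance = 1e-6)

fit_spl <- irls_logit(x, y, spline_smoother)
```

With linear_smoother the loop is ordinary IRLS and should agree with glm() up to convergence tolerance; passing spline_smoother instead replaces the linear fit by a smooth one, echoing the point that a VGLM is the special case in which all component functions are linear.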

References

Yee, T. W. and Wild, C. J. (1996) Vector generalized additive models. Journal of the Royal Statistical Society, Series B, Methodological, 58, 481–493.

Yee, T. W. (2008) The VGAM Package. R News, 8, 28–39.

Documentation accompanying the VGAM package at http://www.stat.auckland.ac.nz/~yee contains further information and examples.

See Also

vgam.control, vgam-class, vglmff-class, plotvgam, vglm, s, vsmooth.spline, cao.

Examples

library(VGAM)

# Nonparametric proportional odds model
pneumo = transform(pneumo, let = log(exposure.time))
vgam(cbind(normal, mild, severe) ~ s(let), cumulative(parallel = TRUE), pneumo)

# Nonparametric logistic regression 
fit = vgam(agaaus ~ s(altitude, df=2), binomialff, hunua)
plot(fit, se=TRUE)

# Fit two species simultaneously 
fit2 = vgam(cbind(agaaus, kniexc) ~ s(altitude, df=c(2,3)),
            binomialff(mv=TRUE), hunua)
coef(fit2, mat=TRUE)   # Not really interpretable 
plot(fit2, se=TRUE, overlay=TRUE, lcol=1:2, scol=1:2)

ooo = with(hunua, order(altitude))
with(hunua, matplot(altitude[ooo], fitted(fit2)[ooo,], type="l", lwd=2,
     xlab="Altitude (m)", ylab="Probability of presence", las=1,
     main="Two plant species' response curves", ylim=c(0,.8)))
with(hunua, rug(altitude))
