Learn R Programming

VGAM (version 1.1-14)

vgam: Fitting Vector Generalized Additive Models

Description

Fit a vector generalized additive model (VGAM). Both 1st-generation VGAMs (based on backfitting) and 2nd-generation VGAMs (based on P-splines, with automatic smoothing parameter selection) are implemented. This is a large class of models that includes generalized additive models (GAMs) and vector generalized linear models (VGLMs) as special cases.

Usage

vgam(formula,
     family = stop("argument 'family' needs to be assigned"),
     data = list(), weights = NULL, subset = NULL,
     na.action, etastart = NULL, mustart = NULL,
     coefstart = NULL, control = vgam.control(...),
     offset = NULL, method = "vgam.fit", model = FALSE,
     x.arg = TRUE, y.arg = TRUE, contrasts = NULL,
     constraints = NULL, extra = list(), form2 = NULL,
     qr.arg = FALSE, smart = TRUE, ...)

Arguments

Value

For G1-VGAMs and G2-VGAMs, an object of class

"vgam" or

"pvgam"

respectively (see vgam-class

and pvgam-class

for further information).

Details

A vector generalized additive model (VGAM) is loosely defined as a statistical model that is a function of \(M\) additive predictors. The central formula is given by $$\eta_j = \sum_{k=1}^p f_{(j)k}(x_k)$$ where \(x_k\) is the \(k\)th explanatory variable (almost always \(x_1=1\) for the intercept term), and \(f_{(j)k}\) are smooth functions of \(x_k\) that are estimated by smoothers. The first term in the summation is just the intercept. Currently two types of smoothers are implemented: s represents the older and more traditional one, called a vector (cubic smoothing spline) smoother and is based on Yee and Wild (1996); it is more similar to the R package gam. The newer one is represented by sm.os and sm.ps, and these are based on O-splines and P-splines---they allow automatic smoothing parameter selection; it is more similar to the R package mgcv.

In the above, \(j=1,\ldots,M\) where \(M\) is finite. If all the functions are constrained to be linear then the resulting model is a vector generalized linear model (VGLM). VGLMs are best fitted with vglm.

Vector (cubic smoothing spline) smoothers are represented by s() (see s). Local regression via lo() is not supported. The results of vgam will differ from the gam() (in the gam) because vgam() uses a different knot selection algorithm. In general, fewer knots are chosen because the computation becomes expensive when the number of additive predictors \(M\) is large.

Second-generation VGAMs are based on the O-splines and P-splines. The latter is due to Eilers and Marx (1996). Backfitting is not required, and estimation is performed using IRLS. The function sm.os represents a smart implementation of O-splines. The function sm.ps represents a smart implementation of P-splines. Written G2-VGAMs or P-VGAMs, this methodology should not be used unless the sample size is reasonably large. Usually an UBRE predictive criterion is optimized (at each IRLS iteration) because the scale parameter for VGAMs is usually assumed to be known. This search for optimal smoothing parameters does not always converge, and neither is it totally reliable. G2-VGAMs implicitly set criterion = "coefficients" so that convergence occurs when the change in the regression coefficients between 2 IRLS iterations is sufficiently small. Otherwise the search for the optimal smoothing parameters might cause the log-likelihood to decrease between 2 IRLS iterations. Currently outer iteration is implemented, by default, rather than performance iteration because the latter is more easy to converge to a local solution; see Wood (2004) for details. One can use performance iteration by setting Maxit.outer = 1 in vgam.control.

The underlying algorithm of VGAMs is IRLS. First-generation VGAMs (called G1-VGAMs) are estimated by modified vector backfitting using vector splines. O-splines are used as the basis functions for the vector (smoothing) splines, which are a lower dimensional version of natural B-splines. The function vgam.fit() actually does the work. The smoothing code is based on F. O'Sullivan's BART code.

A closely related methodology based on VGAMs called constrained additive ordination (CAO) first forms a linear combination of the explanatory variables (called latent variables) and then fits a GAM to these. This is implemented in the function cao for a very limited choice of family functions.

References

Wood, S. N. (2004). Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Amer. Statist. Assoc., 99(467): 673--686.

Yee, T. W. and Wild, C. J. (1996). Vector generalized additive models. Journal of the Royal Statistical Society, Series B, Methodological, 58, 481--493.

Yee, T. W. (2008). The VGAM Package. R News, 8, 28--39.

Yee, T. W. (2015). Vector Generalized Linear and Additive Models: With an Implementation in R. New York, USA: Springer.

Yee, T. W. (2016). Comments on ``Smoothing parameter and model selection for general smooth models'' by Wood, S. N. and Pya, N. and Safken, N., J. Amer. Statist. Assoc., 110(516).

See Also

is.buggy, vgam.control, vgam-class, vglmff-class, plotvgam, summaryvgam, summarypvgam, sm.os, sm.ps, s, magic, vglm, vsmooth.spline, cao.

Examples

Run this code
# Nonparametric proportional odds model
pneumo <- transform(pneumo, let = log(exposure.time))
vgam(cbind(normal, mild, severe) ~ s(let),
     cumulative(parallel = TRUE), data = pneumo, trace = TRUE)

# Nonparametric logistic regression
hfit <- vgam(agaaus ~ s(altitude, df = 2), binomialff, hunua)
if (FALSE)  plot(hfit, se = TRUE) 
phfit <- predict(hfit, type = "terms", raw = TRUE, se = TRUE)
names(phfit)
head(phfit$fitted)
head(phfit$se.fit)
phfit$df
phfit$sigma

if (FALSE)    # Fit two species simultaneously
hfit2 <- vgam(cbind(agaaus, kniexc) ~ s(altitude, df = c(2, 3)),
              binomialff(multiple.responses = TRUE), data = hunua)
coef(hfit2, matrix = TRUE)  # Not really interpretable
plot(hfit2, se = TRUE, overlay = TRUE, lcol = 3:4, scol = 3:4)
ooo <- with(hunua, order(altitude))
with(hunua, matplot(altitude[ooo], fitted(hfit2)[ooo,],
      ylim = c(0, 0.8), las = 1,type = "l", lwd = 2,
     xlab = "Altitude (m)", ylab = "Probability of presence",
     main = "Two plant species' response curves"))
with(hunua, rug(altitude))

# The 'subset' argument does not work here. Use subset() instead.
set.seed(1)
zdata <- data.frame(x2 = runif(nn <- 500))
zdata <- transform(zdata, y = rbinom(nn, 1, 0.5))
zdata <- transform(zdata, subS = runif(nn) < 0.7)
sub.zdata <- subset(zdata, subS)  # Use this instead
if (FALSE)
  fit4a <- vgam(cbind(y, y) ~ s(x2, df = 2),
                binomialff(multiple.responses = TRUE),
                data = zdata, subset = subS)  # This fails!!!
fit4b <- vgam(cbind(y, y) ~ s(x2, df = 2),
              binomialff(multiple.responses = TRUE),
              data = sub.zdata)  # This succeeds!!!
fit4c <- vgam(cbind(y, y) ~ sm.os(x2),
              binomialff(multiple.responses = TRUE),
              data = sub.zdata)  # This succeeds!!!
par(mfrow = c(2, 2))
plot(fit4b, se = TRUE, shade = TRUE, shcol = "pink")
plot(fit4c, se = TRUE, shade = TRUE, shcol = "pink")

Run the code above in your browser using DataLab