cao: Fitting Constrained Additive Ordination (CAO)

Description

A constrained additive ordination (CAO) model is fitted using the reduced-rank vector generalized additive model (RR-VGAM) framework.

Usage

cao(formula, family, data = list(),
    weights = NULL, subset = NULL, na.action = na.fail,
    etastart = NULL, mustart = NULL, coefstart = NULL,
    control = cao.control(...), offset = NULL,
    method = "cao.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE,
    contrasts = NULL, constraints = NULL,
    extra = NULL, qr.arg = FALSE, smart = TRUE, ...)

Arguments

Value

An object of class "cao" (this may change to "rrvgam" in the future). Several generic functions can be applied to the object, e.g., Coef, ccoef, lvplot, summary.

Warning

CAO is very costly to compute. With version 0.7-8 it took 28 minutes on a fast machine. I hope to look at ways of speeding things up in the future.

Use set.seed just prior to calling cao() to make your results reproducible. The reason for this is finding the optimal CAO model presents a difficult optimization problem, partly because the log-likelihood function contains many local solutions. To obtain the (global) solution the user is advised to try many initial values. This can be done by setting Bestof some appropriate value (see cao.control). Trying many initial values becomes progressively more important as the nonlinear degrees of freedom of the smooths increase.

Currently the dispersion parameter for a gaussianff CAO model is estimated slightly differently and may be slightly biassed downwards (usually a little too small).

Details

The arguments of cao are a mixture of those from vgam and cqo, but with some extras in cao.control. Currently, not all of the arguments work properly.

CAO can be loosely be thought of as the result of fitting generalized additive models (GAMs) to several responses (e.g., species) against a very small number of latent variables. Each latent variable is a linear combination of the explanatory variables; the coefficients C (called $C$ below) are called constrained coefficients or canonical coefficients, and are interpreted as weights or loadings. The C are estimated by maximum likelihood estimation. It is often a good idea to apply scale to each explanatory variable first.

For each response (e.g., species), each latent variable is smoothed by a cubic smoothing spline, thus CAO is data-driven. If each smooth were a quadratic then CAO would simplify to constrained quadratic ordination (CQO; formerly called canonical Gaussian ordination or CGO). If each smooth were linear then CAO would simplify to constrained linear ordination (CLO). CLO can theoretically be fitted with cao by specifying df1.nl=0, however it is more efficient to use rrvglm.

Currently, only Rank=1 is implemented, and only Norrr = ~1 models are handled.

With binomial data, the default formula is $$logit(P[Y_s=1]) = \eta_s = f_s(\nu), \ \ \ s=1,2,\ldots,S$$ where $x_2$ is a vector of environmental variables, and $\nu=C^T x_2$ is a $R$-vector of latent variables. The $\eta_s$ is an additive predictor for species $s$, and it models the probabilities of presence as an additive model on the logit scale. The matrix $C$ is estimated from the data, as well as the smooth functions $f_s$. The argument Norrr = ~ 1 specifies that the vector $x_1$, defined for RR-VGLMs and QRR-VGLMs, is simply a 1 for an intercept. Here, the intercept in the model is absorbed into the functions. A cloglog link may be preferable over a logit link.

With Poisson count data, the formula is $$\log(E[Y_s]) = \eta_s = f_s(\nu)$$ which models the mean response as an additive models on the log scale.

The fitted latent variables (site scores) are scaled to have unit variance. The concept of a tolerance is undefined for CAO models, but the optima and maxima are defined. The generic functions Max and Opt should work for CAO objects, but note that if the maximum occurs at the boundary then Max will return a NA. Inference for CAO models is currently undeveloped.

References

Yee, T. W. (2006) Constrained additive ordination. Ecology, 87, 203--213.

Documentation accompanying the VGAM package at http://www.stat.auckland.ac.nz/~yee contains further information and examples.

Examples

Run this code

hspider[,1:6] = scale(hspider[,1:6]) # Standardized environmental vars
set.seed(149) # For reproducible results 
ap1 = cao(cbind(Pardlugu, Pardmont, Pardnigr, Pardpull) ~
          WaterCon + BareSand + FallTwig +
          CoveMoss + CoveHerb + ReflLux,
          family = poissonff, data = hspider, Rank = 1,
          df1.nl = c(Pardpull=2.7, 2.5),
          Bestof = 7, Crow1positive = FALSE)
sort(ap1@misc$deviance.Bestof) # A history of all the iterations

Coef(ap1)
ccoef(ap1)

par(mfrow=c(2,2))
plot(ap1)   # All the curves are unimodal; some quite symmetric

par(mfrow=c(1,1), las=1)
index = 1:ncol(ap1@y)
lvplot(ap1, lcol=index, pcol=index, y=TRUE)

trplot(ap1, label=TRUE, col=index)
abline(a=0, b=1, lty=2)

trplot(ap1, label=TRUE, col="blue", log="xy", whichSp=c(1,3))
abline(a=0, b=1, lty=2)

persp(ap1, col=index, lwd=2, label=TRUE)
abline(v=Opt(ap1), lty=2, col=index)
abline(h=Max(ap1), lty=2, col=index)

Run the code above in your browser using DataLab