A constrained additive ordination (CAO) model is fitted using the
*reduced-rank vector generalized additive model* (RR-VGAM)
framework.

```
cao(formula, family = stop("argument 'family' needs to be assigned"),
data = list(),
weights = NULL, subset = NULL, na.action = na.fail,
etastart = NULL, mustart = NULL, coefstart = NULL,
control = cao.control(...), offset = NULL,
method = "cao.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE,
contrasts = NULL, constraints = NULL,
extra = NULL, qr.arg = FALSE, smart = TRUE, ...)
```

formula

a symbolic description of the model to be fit. The RHS of the
formula is used to construct the latent variables, upon which the
smooths are applied. All the variables in the formula are used
for the construction of latent variables except for those specified
by the argument `noRRR`

, which is itself a formula. The LHS
of the formula contains the response variables, which should be a
matrix with each column being a response (species).

family

a function of class `"vglmff"`

(see `vglmff-class`

)
describing what statistical model is to be fitted. This is called a
``VGAM family function''. See `CommonVGAMffArguments`

for general information about many types of arguments found in this
type of function.
See `cqo`

for a list of those presently implemented.

data

an optional data frame containing the variables in the model.
By default the variables are taken from `environment(formula)`

,
typically the environment from which `cao`

is called.

weights

an optional vector or matrix of (prior) weights to be used in the
fitting process. For `cao`

, this argument currently should
not be used.

subset

an optional logical vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain
`NA`

s. The default is set by the `na.action`

setting of
`options`

, and is `na.fail`

if that is unset.
The ``factory-fresh'' default is `na.omit`

.

etastart

starting values for the linear predictors. It is a \(M\)-column
matrix. If \(M=1\) then it may be a vector. For `cao`

,
this argument currently should not be used.

mustart

starting values for the fitted values. It can be a vector or a
matrix. Some family functions do not make use of this argument.
For `cao`

, this argument currently should not be used.

coefstart

starting values for the coefficient vector. For `cao`

, this
argument currently should not be used.

control

a list of parameters for controlling the fitting process.
See `cao.control`

for details.

offset

a vector or \(M\)-column matrix of offset values. These are
*a priori* known and are added to the linear predictors during
fitting. For `cao`

, this argument currently should not be used.

method

the method to be used in fitting the model. The default
(and presently only) method `cao.fit`

uses iteratively
reweighted least squares (IRLS) within FORTRAN code called from
`optim`

.

model

a logical value indicating whether the *model frame* should be
assigned in the `model`

slot.

x.arg, y.arg

logical values indicating whether the model matrix and response
vector/matrix used in the fitting process should be assigned in the
`x`

and `y`

slots. Note the model matrix is the linear
model (LM) matrix.

contrasts

an optional list. See the `contrasts.arg`

of
`model.matrix.default`

.

constraints

an optional list of constraint matrices. For `cao`

, this
argument currently should not be used. The components of the list
must be named with the term it corresponds to (and it must match
in character format). Each constraint matrix must have \(M\)
rows, and be of full-column rank. By default, constraint matrices
are the \(M\) by \(M\) identity matrix unless arguments in the
family function itself override these values. If `constraints`

is used it must contain *all* the terms; an incomplete list is
not accepted.

extra

an optional list with any extra information that might be needed by
the family function. For `cao`

, this argument currently should
not be used.

qr.arg

For `cao`

, this argument currently should not be used.

smart

logical value indicating whether smart prediction
(`smartpred`

) will be used.

…

further arguments passed into `cao.control`

.

An object of class `"cao"`

(this may change to `"rrvgam"`

in the future).
Several generic functions can be applied to the object, e.g.,
`Coef`

, `concoef`

, `lvplot`

,
`summary`

.

CAO is very costly to compute. With version 0.7-8 it took 28 minutes on a fast machine. I hope to look at ways of speeding things up in the future.

Use `set.seed`

just prior to calling
`cao()`

to make your results reproducible.
The reason for this is finding the optimal
CAO model presents a difficult optimization problem, partly because the
log-likelihood function contains many local solutions. To obtain the
(global) solution the user is advised to try *many* initial values.
This can be done by setting `Bestof`

some appropriate value
(see `cao.control`

). Trying many initial values becomes
progressively more important as the nonlinear degrees of freedom of
the smooths increase.

The arguments of `cao`

are a mixture of those from
`vgam`

and `cqo`

, but with some extras
in `cao.control`

. Currently, not all of the
arguments work properly.

CAO can be loosely be thought of as the result of fitting generalized
additive models (GAMs) to several responses (e.g., species) against
a very small number of latent variables. Each latent variable is a
linear combination of the explanatory variables; the coefficients
**C** (called \(C\) below) are called *constrained
coefficients* or *canonical coefficients*, and are interpreted as
weights or loadings. The **C** are estimated by maximum likelihood
estimation. It is often a good idea to apply `scale`

to each explanatory variable first.

For each response (e.g., species), each latent variable is smoothed
by a cubic smoothing spline, thus CAO is data-driven. If each smooth
were a quadratic then CAO would simplify to *constrained quadratic
ordination* (CQO; formerly called *canonical Gaussian ordination*
or CGO).
If each smooth were linear then CAO would simplify to *constrained
linear ordination* (CLO). CLO can theoretically be fitted with
`cao`

by specifying `df1.nl=0`

, however it is more efficient
to use `rrvglm`

.

Currently, only `Rank=1`

is implemented, and only
`noRRR = ~1`

models are handled.

With binomial data, the default formula is
$$logit(P[Y_s=1]) = \eta_s = f_s(\nu), \ \ \ s=1,2,\ldots,S$$
where \(x_2\) is a vector of environmental variables, and
\(\nu=C^T x_2\) is a \(R\)-vector of latent variables.
The \(\eta_s\) is an additive predictor for species \(s\),
and it models the probabilities of presence as an additive model on
the logit scale. The matrix \(C\) is estimated from the data, as
well as the smooth functions \(f_s\). The argument ```
noRRR = ~
1
```

specifies that the vector \(x_1\), defined for RR-VGLMs
and QRR-VGLMs, is simply a 1 for an intercept.
Here, the intercept in the model is absorbed into the functions.
A `clogloglink`

link may be preferable over a
`logitlink`

link.

With Poisson count data, the formula is $$\log(E[Y_s]) = \eta_s = f_s(\nu)$$ which models the mean response as an additive models on the log scale.

The fitted latent variables (site scores) are scaled to have
unit variance. The concept of a tolerance is undefined for
CAO models, but the optimums and maximums are defined. The generic
functions `Max`

and `Opt`

should work for
CAO objects, but note that if the maximum occurs at the boundary then
`Max`

will return a `NA`

. Inference for CAO models
is currently undeveloped.

Yee, T. W. (2006).
Constrained additive ordination.
*Ecology*, **87**, 203--213.

`cao.control`

,
`Coef.cao`

,
`cqo`

,
`latvar`

,
`Opt`

,
`Max`

,
`calibrate.qrrvglm`

,
`persp.cao`

,
`poissonff`

,
`binomialff`

,
`negbinomial`

,
`gamma2`

,
`set.seed`

,
`gam()`

in gam,
`trapO`

.

```
# NOT RUN {
hspider[, 1:6] <- scale(hspider[, 1:6]) # Standardized environmental vars
set.seed(149) # For reproducible results
ap1 <- cao(cbind(Pardlugu, Pardmont, Pardnigr, Pardpull) ~
WaterCon + BareSand + FallTwig + CoveMoss + CoveHerb + ReflLux,
family = poissonff, data = hspider, Rank = 1,
df1.nl = c(Pardpull= 2.7, 2.5),
Bestof = 7, Crow1positive = FALSE)
sort(deviance(ap1, history = TRUE)) # A history of all the iterations
Coef(ap1)
concoef(ap1)
par(mfrow = c(2, 2))
plot(ap1) # All the curves are unimodal; some quite symmetric
par(mfrow = c(1, 1), las = 1)
index <- 1:ncol(depvar(ap1))
lvplot(ap1, lcol = index, pcol = index, y = TRUE)
trplot(ap1, label = TRUE, col = index)
abline(a = 0, b = 1, lty = 2)
trplot(ap1, label = TRUE, col = "blue", log = "xy", which.sp = c(1, 3))
abline(a = 0, b = 1, lty = 2)
persp(ap1, col = index, lwd = 2, label = TRUE)
abline(v = Opt(ap1), lty = 2, col = index)
abline(h = Max(ap1), lty = 2, col = index)
# }
```

Run the code above in your browser using DataCamp Workspace