cqo(formula, family, data = list(), weights = NULL, subset = NULL,
    na.action = na.fail, etastart = NULL, mustart = NULL,
    coefstart = NULL, control = qrrvglm.control(...), offset = NULL,
    method = "cqo.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE,
    contrasts = NULL, constraints = NULL, extra = NULL,
    smart = TRUE, ...)
"vglmff"
(see vglmff-class
)
describing what statistical model is to be fitted. This is called a
``
environment(formula)
,
typically the environment from which cqo
is called.NA
s. The default is set by the na.action
setting of
options
, and is na.fail
if that is unsqrrvglm.control
for details.cqo.fit
uses iteratively reweighted least squares (IRLS).model
slot.x
and y
slots.
Note the model matrix is the LM model matrix.contrasts.arg
of model.matrix.default
.smartpred
) will be used.qrrvglm.control
."qrrvglm"
.
Note that the slot misc
has a list component called
deviance.Bestof
which gives the history of deviances over all
the iterations.Bestof
in qrrvglm.control
.
For reproducibility of the results, it pays to set a different
random number seed before calling cqo
(the function
set.seed
does this). The function cqo
chooses initial values for C using .Init.Poisson.QO()
if Use.Init.Poisson.QO = TRUE
, else random numbers.
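As a minimal sketch (not run), assuming a user-supplied formula myform and data frame mydata (both placeholders), reproducibility and the search for the global solution can be handled like this:
set.seed(123)  # different seeds give different random starts for C
fit <- cqo(myform, family = poissonff, data = mydata,
           Bestof = 5,  # fit five models; the one with the lowest deviance is kept
           Use.Init.Poisson.QO = TRUE)  # initial C from .Init.Poisson.QO()
sort(fit@misc$deviance.Bestof)  # compare the deviances of all five fits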
Unless ITolerances = TRUE or EqualTolerances = FALSE,
CQO is computationally expensive in terms of memory and time.
It pays to keep the rank down to 1
or 2. If EqualTolerances = TRUE
and ITolerances = FALSE
then
the cost grows quickly with the number of species and sites (in terms of
memory requirements and time). The data needs to conform quite closely
to the statistical model, and the environmental range of the data should
be wide in order for the quadratics to fit the data well (bell-shaped
response surfaces). If not, RR-VGLMs will be more appropriate because
the response is linear on the transformed scale (e.g., log or logit)
and the ordination is called constrained linear ordination or CLO.
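A hedged sketch (not run) of the computationally cheapest settings, again using the placeholders myform and mydata:
fit1 <- cqo(myform, family = poissonff, data = mydata,
            Rank = 1, ITolerances = TRUE)  # cheapest: rank 1, equal ("I") tolerance matrices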
Like many regression models, CQO is sensitive to outliers (in the environmental and species data), sparse data, high leverage points, multicollinearity etc. For these reasons, it is necessary to examine the data carefully for these features and take corrective action (e.g., omitting certain species, sites, or environmental variables from the analysis, transforming certain environmental variables, etc.). Any optimum lying outside the convex hull of the site scores should not be trusted. Fitting a CAO first is recommended; then, after any transformations etc., a CQO can possibly be fitted.
For binary data, it is necessary to have `enough' data. In general,
the number of sites $n$ ought to be much larger than the number of
species $S$, e.g., at least 100 sites for two species. Compared
to count (Poisson) data, numerical problems occur more frequently
with presence/absence (binary) data. For example, if Rank = 1
and the response data for each species is a string of all absences,
then all presences, then all absences (when enumerated along the latent
variable), then infinite parameter estimates will occur. In general,
setting ITolerances = TRUE may help.
This function was formerly called cgo
. It has been renamed to
reinforce a new nomenclature described in Yee (2006).
QRR-VGLMs or constrained quadratic ordination (CQO) models are estimated here by maximum likelihood estimation. Optimal linear combinations of the environmental variables, called latent variables (latvar for $R=1$, else latvar1, latvar2, etc. in the output), are computed. Here, $R$
is the rank or the number of ordination axes. Each species'
response is then a regression of these latent variables using quadratic
polynomials on a transformed scale (e.g., log for Poisson counts, logit
for presence/absence responses). The solution is obtained iteratively
in order to maximize the log-likelihood function, or equivalently,
minimize the deviance.
The central formula (for Poisson and binomial species data) is given by
$$\eta = B_1^T x_1 + A \nu + \sum_{m=1}^{M} (\nu^T D_m \nu) e_m,$$
where $x_1$ is a vector (usually just a 1 for an intercept), $x_2$ is a vector of environmental variables, $\nu = C^T x_2$ is an $R$-vector of latent variables, and $e_m$ is a vector of 0s but with a 1 in the $m$th position. The matrices $B_1$, $A$, $C$ and the $D_m$ contain the regression coefficients and are estimated from the data. Many important CQO details are directly related to arguments in qrrvglm.control, e.g., the argument noRRR specifies which variables comprise $x_1$.
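The estimated matrices can be inspected after fitting. A brief sketch (not run), assuming fit is a fitted "qrrvglm" object such as p1ut in the examples below; see Coef.qrrvglm for the full set of slots:
cfit <- Coef(fit)
cfit@B1         # regression coefficients for x_1 (e.g., species' intercepts)
cfit@A          # additive coefficients of the latent variables
cfit@C          # constrained coefficients: nu = C' x_2
cfit@Optimum    # each species' optimum along the latent variable(s)
cfit@Tolerance  # each species' tolerance matrix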
Theoretically, the four most popular VGAM family functions to be used with cqo correspond to the Poisson, binomial,
normal, and negative binomial distributions. The latter is a
2-parameter model. All of these are implemented, as well as the
2-parameter gamma. The Poisson is or should be catered for by
quasipoissonff
and poissonff
, and the
binomial by quasibinomialff
and binomialff
.
Those beginning with "quasi"
have dispersion parameters that
are estimated for each species.
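A hedged sketch (not run) of these family choices, with myform and mydata as placeholders and a response matrix appropriate to each family:
cqo(myform, family = poissonff, data = mydata)              # counts, M = S
cqo(myform, family = binomialff(mv = TRUE), data = mydata)  # presence/absence, M = S
cqo(myform, family = negbinomial, data = mydata)            # overdispersed counts, M = 2S
cqo(myform, family = gamma2, data = mydata)                 # positive continuous data, M = 2S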
For initial values, the function .Init.Poisson.QO
should
work reasonably well if the data is Poisson with species having equal
tolerances. It can be quite good on binary data too. Otherwise the
Cinit
argument in qrrvglm.control
can be used.
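A minimal sketch (not run) of supplying starting values through Cinit; the matrix has one column per rank and one row per $x_2$ variable, and the values here are purely illustrative:
myCinit <- matrix(c(0.5, -0.2, 0.1, 0.3, -0.4, 0.2), ncol = 1)  # six x_2 variables, Rank = 1
fit <- cqo(myform, family = poissonff, data = mydata,
           Rank = 1, Cinit = myCinit)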
It is possible to relax the quadratic form to an additive model. The
result is a data-driven approach rather than a model-driven approach,
so that CQO is extended to constrained additive ordination
(CAO) when $R=1$. See cao
for more details.
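A hedged sketch (not run) of the additive counterpart, again with myform and mydata as placeholders; see cao for the actual arguments and defaults:
fit.cao <- cao(myform, family = poissonff, data = mydata, Rank = 1)
lvplot(fit.cao)  # the fitted curves need not be bell-shaped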
In this documentation, $M$ is the number of linear predictors, $S$ is the number of responses (species). Then $M=S$ for Poisson and binomial species data, and $M=2S$ for negative binomial and gamma distributed species data.
ter Braak, C. J. F. and Prentice, I. C. (1988) A theory of gradient analysis. Advances in Ecological Research, 18, 271--317.
Yee, T. W. (2006) Constrained additive ordination. Ecology, 87, 203--213.
qrrvglm.control, Coef.qrrvglm, predictqrrvglm, rcqo, cao, rrvglm, poissonff, binomialff, negbinomial, gamma2, lvplot.qrrvglm, perspqrrvglm, trplot.qrrvglm, vglm, set.seed, hspider.
Documentation accompanying the VGAM package contains further information and examples.
# Example 1: Fit an unequal tolerances model to the hunting spiders data
hspider[,1:6] <- scale(hspider[,1:6]) # Standardized environmental variables
set.seed(1234) # For reproducibility of the results
p1ut <- cqo(cbind(Alopacce, Alopcune, Alopfabr, Arctlute, Arctperi,
                  Auloalbi, Pardlugu, Pardmont, Pardnigr, Pardpull,
                  Trocterr, Zoraspin) ~
            WaterCon + BareSand + FallTwig + CoveMoss + CoveHerb + ReflLux,
            fam = poissonff, data = hspider, Crow1positive = FALSE,
            EqualTol = FALSE)
sort(p1ut@misc$deviance.Bestof) # A history of all the iterations
if (deviance(p1ut) > 1177) warning("suboptimal fit obtained")
S <- ncol(depvar(p1ut)) # Number of species
clr <- (1:(S+1))[-7] # Omits yellow
lvplot(p1ut, y = TRUE, lcol = clr, pch = 1:S, pcol = clr, las = 1) # Ordination diagram
legend("topright", leg = colnames(depvar(p1ut)), col = clr,
pch = 1:S, merge = TRUE, bty = "n", lty = 1:S, lwd = 2)
(cp <- Coef(p1ut))
(a <- cp@latvar[cp@latvar.order]) # The ordered site scores along the gradient
# Names of the ordered sites along the gradient:
rownames(cp@latvar)[cp@latvar.order]
(aa <- (cp@Optimum)[,cp@Optimum.order]) # The ordered optima along the gradient
aa <- aa[!is.na(aa)] # Delete the species that is not unimodal
names(aa) # Names of the ordered optima along the gradient
trplot(p1ut, which.species = 1:3, log = "xy", type = "b", lty = 1, lwd = 2,
       col = c("blue", "red", "green"), label = TRUE) -> ii  # Trajectory plot
legend(0.00005, 0.3, paste(ii$species[, 1], ii$species[, 2], sep = " and "),
       lwd = 2, lty = 1, col = c("blue", "red", "green"))
abline(a = 0, b = 1, lty = "dashed")
S <- ncol(depvar(p1ut)) # Number of species
clr <- (1:(S+1))[-7] # Omits yellow
persp(p1ut, col = clr, label = TRUE, las = 1) # Perspective plot
# Example 2: Fit an equal tolerances model. Less numerically fraught.
set.seed(1234)
p1et <- cqo(cbind(Alopacce, Alopcune, Alopfabr, Arctlute, Arctperi,
                  Auloalbi, Pardlugu, Pardmont, Pardnigr, Pardpull,
                  Trocterr, Zoraspin) ~
            WaterCon + BareSand + FallTwig + CoveMoss + CoveHerb + ReflLux,
            poissonff, data = hspider, Crow1positive = FALSE)
sort(p1et@misc$deviance.Bestof) # A history of all the iterations
if (deviance(p1et) > 1586) warning("suboptimal fit obtained")
S <- ncol(depvar(p1et)) # Number of species
clr <- (1:(S+1))[-7] # Omits yellow
persp(p1et, col = clr, label = TRUE, las = 1)
# Example 3: A rank-2 equal tolerances CQO model with Poisson data
# This example is numerically fraught... need IToler = TRUE too.
set.seed(555)
p2 <- cqo(cbind(Alopacce, Alopcune, Alopfabr, Arctlute, Arctperi,
                Auloalbi, Pardlugu, Pardmont, Pardnigr, Pardpull,
                Trocterr, Zoraspin) ~
          WaterCon + BareSand + FallTwig + CoveMoss + CoveHerb + ReflLux,
          poissonff, data = hspider, Crow1positive = FALSE,
          IToler = TRUE, Rank = 2, Bestof = 3, isd.latvar = c(2.1, 0.9))
sort(p2@misc$deviance.Bestof) # A history of all the iterations
if (deviance(p2) > 1127) warning("suboptimal fit obtained")
lvplot(p2, ellips = FALSE, label = TRUE, xlim = c(-3, 4),
       C = TRUE, Ccol = "brown", sites = TRUE, scol = "grey",
       pcol = "blue", pch = "+", chull = TRUE, ccol = "grey")
# Example 4: species packing model with presence/absence data
set.seed(2345)
n <- 200; p <- 5; S <- 5
mydata <- rcqo(n, p, S, fam = "binomial", hi.abundance = 4,
               eq.tol = TRUE, es.opt = TRUE, eq.max = TRUE)
myform <- attr(mydata, "formula")
set.seed(1234)
b1et <- cqo(myform, binomialff(mv = TRUE, link = "cloglog"), data = mydata)
sort(b1et@misc$deviance.Bestof) # A history of all the iterations
lvplot(b1et, y = TRUE, lcol = 1:S, pch = 1:S, pcol = 1:S, las = 1)
Coef(b1et)
# Compare the fitted model with the 'truth'
cbind(truth = attr(mydata, "concoefficients"), fitted = concoef(b1et))
# Example 5: Plot the deviance residuals for diagnostic purposes
set.seed(1234)
p1et <- cqo(cbind(Alopacce, Alopcune, Alopfabr, Arctlute, Arctperi,
                  Auloalbi, Pardlugu, Pardmont, Pardnigr, Pardpull,
                  Trocterr, Zoraspin) ~
            WaterCon + BareSand + FallTwig + CoveMoss + CoveHerb + ReflLux,
            poissonff, data = hspider, EqualTol = TRUE, trace = FALSE)
sort(p1et@misc$deviance.Bestof) # A history of all the iterations
if (deviance(p1et) > 1586) warning("suboptimal fit obtained")
S <- ncol(depvar(p1et))
par(mfrow = c(3, 4))
for (ii in 1:S) {
  tempdata <- data.frame(latvar1 = c(latvar(p1et)),
                         sppCounts = depvar(p1et)[, ii])
  tempdata <- transform(tempdata, myOffset = -0.5 * latvar1^2)
  # For species ii, refit the model to get the deviance residuals
  fit1 <- vglm(sppCounts ~ offset(myOffset) + latvar1, poissonff,
               data = tempdata, trace = FALSE)
  # For checking: this should be 0
  # print("max(abs(c(Coef(p1et)@B1[1,ii], Coef(p1et)@A[ii,1]) - coef(fit1)))")
  # print( max(abs(c(Coef(p1et)@B1[1,ii], Coef(p1et)@A[ii,1]) - coef(fit1))) )
  # Plot the deviance residuals
  devresid <- resid(fit1, type = "deviance")
  predvalues <- predict(fit1) + fit1@offset
  ooo <- with(tempdata, order(latvar1))
  with(tempdata, plot(latvar1, predvalues + devresid, col = "darkgreen",
                      xlab = "latvar1", ylab = "", main = colnames(depvar(p1et))[ii]))
  with(tempdata, lines(latvar1[ooo], predvalues[ooo], col = "blue"))
}