Random generation for constrained quadratic ordination (CQO).
rcqo(n, p, S, Rank = 1,
     family = c("poisson", "negbinomial", "binomial-poisson",
                "Binomial-negbinomial", "ordinal-poisson",
                "Ordinal-negbinomial", "gamma2"),
     eq.maximums = FALSE, eq.tolerances = TRUE, es.optimums = FALSE,
     lo.abundance = if (eq.maximums) hi.abundance else 10,
     hi.abundance = 100, sd.latvar = head(1.5/2^(0:3), Rank),
     sd.optimums = ifelse(es.optimums, 1.5/Rank, 1) *
                       ifelse(scale.latvar, sd.latvar, 1),
     sd.tolerances = 0.25, Kvector = 1, Shape = 1,
     sqrt.arg = FALSE, log.arg = FALSE, rhox = 0.5, breaks = 4,
     seed = NULL, optimums1.arg = NULL, Crow1positive = TRUE,
     xmat = NULL, scale.latvar = TRUE)A \(n\) by \(p-1+S\) data frame with components and attributes. In the following the attributes are labelled with double quotes.
The environmental variables. This makes up the
    \(n\) by \(p-1\) \(X_2\) matrix.
    Note that x1 is not present; it is effectively a
    vector of ones since it corresponds to an intercept term when
    cqo is applied to the data.
The species data. This makes up the
    \(n\) by \(S\) matrix \(Y\).
    This will be of the form described by the argument
    family.
The \(p-1\) by \(R\) matrix of constrained coefficients (or canonical coefficients). These are also known as weights or loadings.
The formula involving the species and environmental
    variable names.
    This can be used directly in the formula argument
    of cqo.
The \(S\)-vector of species' maximums, on a log scale.
    These are uniformly distributed between
    log(lo.abundance) and log(hi.abundance).
The \(n\) by \(R\) matrix of site scores.
    Each successive column (latent variable) has
    sample standard deviation
    equal to successive values of sd.latvar.
The linear/additive predictor value.
The \(S\) by \(R\) matrix of species' optimums.
The \(S\) by \(R\) matrix of species' tolerances. These are the square root of the diagonal elements of the tolerance matrices (recall that all tolerance matrices are restricted to being diagonal in this function).
Other attributes are "break",
"family", "Rank",
"lo.abundance", "hi.abundance",
"eq.tolerances", "eq.maximums",
"seed" as used.
Number of sites. It is denoted by \(n\) below.
Number of environmental variables, including an intercept term. It is denoted by \(p\) below. Must be no less than \(1+R\) in value.
Number of species. It is denoted by \(S\) below.
The rank or the number of latent variables or true dimension of the data on the reduced space. This must be either 1, 2, 3 or 4. It is denoted by \(R\).
What type of species data is to be returned.
    The first choice is the default.
    If binomial then a 0 means absence and 1 means presence.
    If ordinal then the breaks argument is passed into
    the breaks argument of cut.
    Note that either the Poisson or
    negative binomial distributions
    are used to generate binomial and
    ordinal data, and that
    an upper-case choice is used for the
    negative binomial distribution
    (this makes it easier for the user).
    If "gamma2" then this is the
    2-parameter gamma distribution.
Logical. Does each species have the same maximum?
    See arguments lo.abundance and hi.abundance.
Logical. Does each species have the
    same tolerance? If TRUE then the common
    value is 1 along
    every latent variable, i.e., all species' tolerance matrices
    are the order-\(R\) identity matrix.
Logical. Do the species have equally spaced optimums?
    If TRUE then the quantity
    \(S^{1/R}\) must be an
    integer with value 2 or more. That is, there has to be an
    appropriate number of species in total.
    This is so that a grid
    of optimum values is possible in \(R\)-dimensional
    latent variable space
    in order to place the species' optimums.
    Also see the argument sd.tolerances.
Numeric. These are recycled to a vector of length \(S\).
    The species have a maximum
    between lo.abundance and hi.abundance. That is,
    at their optimal environment, the mean abundance of each
    species is between the two componentwise values.
    If eq.maximums is TRUE then
    lo.abundance and hi.abundance
    must have the same values.
    If eq.maximums is FALSE then the
    logarithm of the maximums are uniformly distributed between
    log(lo.abundance) and log(hi.abundance).
Numeric, of length \(R\) (recycled if necessary). Site scores along each latent variable have these standard deviation values. This must be a decreasing sequence of values because the first ordination axis contains the greatest spread of the species' site scores, followed by the second axis, followed by the third axis, etc.
Numeric, of length \(R\) (recycled if necessary).
    If es.optimums = FALSE then,
    for the \(r\)th latent variable axis,
    the optimums of the species are generated from a
    normal distribution centered about 0.
    If es.optimums = TRUE then the \(S\) optimums
    are equally spaced about 0 along every
    latent variable axis.
    Regardless of the value of es.optimums, the optimums
    are then scaled to give standard deviation
    sd.optimums[r].
Logical. If eq.tolerances = FALSE then, for the
    \(r\)th latent variable, the
    species' tolerances are
    chosen from a normal distribution with mean 1 and
    standard deviation
    sd.tolerances[r].
    However, the first species y1 has its tolerance matrix
    set equal to the order-\(R\) identity matrix.
    All tolerance matrices for all species are
    diagonal in this function.
    This argument is ignored if eq.tolerances is TRUE,
    otherwise it is recycled to length \(R\) if necessary.
A vector of positive \(k\) values (recycled to length \(S\)
    if necessary) for the negative binomial distribution
    (see negbinomial for details).
    Note that a natural default value does not exist,
    however the default
    value here is probably a realistic one, and that
    for large values
    of \(\mu\) one has \(Var(Y)=\mu^2/k\)
    approximately.
A vector of positive \(\lambda\)
    values (recycled to length
    \(S\) if necessary) for the 2-parameter gamma
    distribution (see
    gamma2 for details).
    Note that a natural default value
    does not exist, however the default value
    here is probably a realistic
    one, and that
    \(Var(Y) = \mu^2 / \lambda\).
Logical. Take the square-root of the
    negative binomial counts?
    Assigning sqrt.arg = TRUE
    when family="negbinomial" means
    that the resulting species data can be
    considered very crudely to be
    approximately Poisson distributed.
    They will not integers in general but much easier
    (less numerical
    problems) to estimate using something like
    cqo(..., family="poissonff").
Logical. Take the logarithm of the gamma random variates?
    Assigning log.arg = TRUE
    when family="gamma2" means
    that the resulting species data can be
    considered very crudely to be
    approximately Gaussian distributed about its (quadratic) mean.
Numeric, less than 1 in absolute value.
    The correlation between the environmental variables.
    The correlation matrix is a matrix of 1's along the diagonal
    and rhox in the off-diagonals.
    Note that each environmental variable is normally distributed
    with mean 0. The standard deviation of each environmental
    variable is chosen so that the site scores have the determined
    standard deviation, as given by argument sd.latvar.
If family is assigned an ordinal value then this argument
    is used to define the cutpoints. It is fed into the
    breaks argument of cut.
If given, it is passed into set.seed.
    This argument can be used to obtain reproducible results.
    If set, the value is saved as the "seed"
    attribute of the returned value. The default will
    not change the random generator state, and return
    .Random.seed
    as "seed" attribute.
If assigned and Rank = 1 then these are
    the explicity optimums.
    Recycled to length S.
See qrrvglm.control for details.
The \(n\) by \(p-1\) environmental matrix can be inputted.
Logical. If FALSE the argument
   sd.latvar is ignored and no scaling of the latent variable
   values is performed.
T. W. Yee
This function generates data coming from a
  constrained quadratic
  ordination (CQO) model. In particular,
  data coming from a species packing model
  can be generated
  with this function.
  The species packing model states that species have equal
  tolerances, equal maximums, and optimums which are uniformly
  distributed over the latent variable space. This can be
  achieved by assigning the arguments es.optimums = TRUE,
  eq.maximums = TRUE, eq.tolerances = TRUE.
At present, the Poisson and negative binomial abundances
  are generated first using lo.abundance and
  hi.abundance, and if family is binomial or ordinal
  then it is converted into these forms.
In CQO theory the \(n\) by \(p\) matrix \(X\) is partitioned into two parts \(X_1\) and \(X_2\). The matrix \(X_2\) contains the `real' environmental variables whereas the variables in \(X_1\) are just for adjustment purposes; they contain the intercept terms and other variables that one wants to adjust for when (primarily) looking at the variables in \(X_2\). This function has \(X_1\) only being a matrix of ones, i.e., containing an intercept only.
Yee, T. W. (2004). A new technique for maximum-likelihood canonical Gaussian ordination. Ecological Monographs, 74, 685--701.
Yee, T. W. (2006). Constrained additive ordination. Ecology, 87, 203--213.
ter Braak, C. J. F. and Prentice, I. C. (1988). A theory of gradient analysis. Advances in Ecological Research, 18, 271--317.
cqo,
  qrrvglm.control,
  cut,
  binomialff,
  poissonff,
  negbinomial,
  gamma2.
if (FALSE) {
# Example 1: Species packing model:
n <- 100; p <- 5; S <- 5
mydata <- rcqo(n, p, S, es.opt = TRUE, eq.max = TRUE)
names(mydata)
(myform <- attr(mydata, "formula"))
fit <- cqo(myform, poissonff, mydata, Bestof = 3)  # eq.tol = TRUE
matplot(attr(mydata, "latvar"), mydata[,-(1:(p-1))], col = 1:S)
persp(fit, col = 1:S, add = TRUE)
lvplot(fit, lcol = 1:S, y = TRUE, pcol = 1:S)  # Same plot as above
# Compare the fitted model with the 'truth'
concoef(fit)  # The fitted model
attr(mydata, "concoefficients")  # The 'truth'
c(apply(attr(mydata, "latvar"), 2, sd),
  apply(latvar(fit), 2, sd))  # Both values should be approx equal
# Example 2: negative binomial data fitted using a Poisson model:
n <- 200; p <- 5; S <- 5
mydata <- rcqo(n, p, S, fam = "negbin", sqrt = TRUE)
myform <- attr(mydata, "formula")
fit <- cqo(myform, fam = poissonff, dat = mydata)  # I.tol = TRUE,
lvplot(fit, lcol = 1:S, y = TRUE, pcol = 1:S)
# Compare the fitted model with the 'truth'
concoef(fit)  # The fitted model
attr(mydata, "concoefficients")  # The 'truth'
}
Run the code above in your browser using DataLab