dc_CA: Performs (weighted) double constrained correspondence analysis (dc-CA)

Description

Double constrained correspondence analysis (dc-CA) for analyzing (multi-)trait (multi-)environment ecological data using library vegan and native R code. It has a formula interface that makes it possible to assess, for example, the importance of trait interactions in shaping ecological communities. The function dc_CA has an option to divide the abundance data of a site by the site total, giving equal site weights. This division has the advantage that the multivariate analysis corresponds with an unweighted (multi-trait) community-level analysis, instead of a weighted one (Kleyer et al. 2012, ter Braak and van Rossum, 2025).
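The effect of dividing by site totals can be illustrated in base R. This is a minimal sketch on a hypothetical toy abundance table (the matrix Y and its values are made up for illustration); it shows only the closure arithmetic, not how dc_CA implements it internally.

```r
# Hypothetical toy abundance table: 3 sites (rows) x 4 species (columns).
Y <- matrix(c(10, 0, 5, 5,
               2, 2, 0, 0,
               0, 1, 1, 8),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("site", 1:3), paste0("sp", 1:4)))

# Without closure, site weights are proportional to the site totals.
site_weights_raw <- rowSums(Y) / sum(Y)

# Closure: divide each row by its total, so that every site sums to 1.
Y_closed <- Y / rowSums(Y)

# After closure, all sites carry the same weight.
site_weights_closed <- rowSums(Y_closed) / sum(Y_closed)

print(round(site_weights_raw, 3))
print(site_weights_closed)
```

With equal site weights, a subsequent community-level regression on the environment is an unweighted one, which is the correspondence the description refers to.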

Usage

dc_CA(
  formulaEnv = NULL,
  formulaTraits = NULL,
  response = NULL,
  dataEnv = NULL,
  dataTraits = NULL,
  divideBySiteTotals = NULL,
  dc_CA_object = NULL,
  env_explain = TRUE,
  use_vegan_cca = FALSE,
  verbose = TRUE
)

Value

A list of class dcca; that is, a list with elements

CCAonTraits

a cca.object from the cca analysis of the transpose of the closed response using formula formulaTraits.

formulaTraits

the argument formulaTraits. If the formula was ~., it was changed to explicit trait names.

data

a list of Y, dataEnv and dataTraits, after removing empty rows and columns in response and after closure if divideBySiteTotals = TRUE and with the corresponding rows in dataEnv and dataTraits removed.

weights

a list of unit-sum weights of columns and rows. The names of the list are c("columns", "rows"), in that order.

Nobs

number of sites (rows).

CWMs_orthonormal_traits

Community weighted means w.r.t. orthonormalized traits.

RDAonEnv

a wrda object (from wrda) or, if the row weights are equal, a cca.object (from rda), resulting from the analysis of the column scores of the cca, which are the CWMs of orthonormalized traits, using formula formulaEnv.

formulaEnv

the argument formulaEnv. If the formula was ~., it was changed to explicit environmental variable names.

eigenvalues

the dc-CA eigenvalues (same as those of the rda analysis).

c_traits_normed0

mean, sd, VIF and (regression) coefficients of the traits that define the dc-CA axes in terms of the traits; the t-ratios are missing, as indicated by NAs in 'tval1'.

inertia

a one-column matrix with, at most, six inertias (weighted variances):

total: the total inertia.
conditionT: the inertia explained by the condition in formulaTraits if present (neglecting row constraints).
traits_explain: the trait-structured variation, i.e. the inertia explained by the traits (without constraints on the rows and conditional on the Condition in formulaTraits). This is the maximum that the row predictors could explain in dc-CA (the sum of the last two items is thus less than this value).
env_explain: the environmentally structured variation, i.e. the inertia explained by the environment (without constraints on the columns but conditional on the Condition in formulaEnv). This is the maximum that the column predictors could explain in dc-CA (the item constraintsTE is thus less than this value). The value is NA if there is collinearity in the environmental data.
conditionTE: the trait-constrained variation explained by the condition in formulaEnv.
constraintsTE: the trait-constrained variation explained by the predictors (without the row covariates).

If verbose is TRUE (or after out <- print(out) is invoked) there are four more items.

c_traits_normed: mean, sd, VIF and (regression) coefficients of the traits that define the dc-CA trait axes (composite traits), and their optimistic t-ratio.
c_env_normed: mean, sd, VIF and (regression) coefficients of the environmental variables that define the dc-CA axes in terms of the environmental variables (composite gradients), and their optimistic t-ratio.
species_axes: a list with four items
- species_scores: a list with names c("species_scores_unconstrained", "lc_traits_scores") with the matrix with species niche centroids along the dc-CA axes (composite gradients) and the matrix with linear combinations of traits.
- correlation: a matrix with inter-set correlations of the traits with their SNCs.
- b_se: a matrix with (unstandardized) regression coefficients for traits and their optimistic standard errors.
- R2_traits: a vector with coefficient of determination (R2) of the SNCs on to the traits. The square-root thereof could be called the species-trait correlation in analogy with the species-environment correlation in CCA.
sites_axes: a list with four items
- site_scores: a list with names c("site_scores_unconstrained", "lc_env_scores") with the matrix with community weighted means (CWMs) along the dc-CA axes (composite gradients) and the matrix with linear combinations of environmental variables.
- correlation: a matrix with inter-set correlations of the environmental variables with their CWMs.
- b_se: a matrix with (unstandardized) regression coefficients for environmental variables and their optimistic standard errors.
- R2_env: a vector with coefficient of determination (R2) of the CWMs on to the environmental variables. The square-root thereof has been called the species-environmental correlation in CCA.

All scores in the dcca object are in scaling "sites" (1): the scaling with Focus on Case distances.

Arguments

formulaEnv: two-sided or one-sided formula for the rows (samples) with row predictors in dataEnv. The left-hand side of the formula is ignored if the response argument is specified. Specify row covariates (if any) by adding + Condition(covariate-formula) to formulaEnv as in rda. The covariate-formula should not contain a ~ (tilde). Default: NULL for ~., i.e. all variables in dataEnv are predictor variables.
formulaTraits: two-sided or one-sided formula for the columns (species) with column predictors in dataTraits. When two-sided, the left-hand side of the formula is not used. Specify column covariates (if any) by adding + Condition(covariate-formula) to formulaTraits as in cca. The covariate-formula should not contain a ~ (tilde). Default: NULL for ~., i.e. all variables in dataTraits are predictor traits.
response: matrix or data frame of the abundance data (dimension n x m), a list with community weighted means (CWMs) from fCWM_SNC, or NULL. If NULL, the response should be at the left-hand side of formulaEnv. See Details for analyses starting from community weighted means. Rownames of response, if any, are carried through.
dataEnv: matrix or data frame of the row predictors, with rows corresponding to those in response. (dimension n x p).
dataTraits: matrix or data frame of the column predictors, with rows corresponding to the columns in response. (dimension m x q).
divideBySiteTotals: logical; whether to close the data by dividing the rows in the response by their total. The default (NULL) resolves to TRUE, except when the species totals are proportional to N2*(N-N2), with N2 the Hill number of order 2 of each species and N the number of sites. Such proportionality is taken as an indicator that the response data have been pre-processed to N2-based marginals using ipf2N2, and the default then resolves to FALSE.
dc_CA_object: optional object from an earlier run of this function. Useful if the same formula for the columns (formulaTraits), dataTraits and response are used with a new formula for the rows. If set, the data of the previous run is used and the result of its first step is taken for the new analysis and env_explain is set to FALSE.
env_explain: logical (default TRUE) for calculation of the inertia explained by the environmental variables (based on a CCA of abundance (with divideBySiteTotals, if true) on the environmental formula).
use_vegan_cca: logical; default FALSE (as in Usage).
verbose: logical for printing a simple summary (default: TRUE)
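The proportionality condition mentioned under divideBySiteTotals can be checked by hand. The sketch below computes the Hill number of order 2 per species (the inverse Simpson index of a species' distribution over sites) on a hypothetical toy table and tests whether the species totals are proportional to N2*(N-N2). How dc_CA performs this check internally, including any tolerance, is not stated in this page; the comparison used here is an assumption.

```r
# Hypothetical toy abundance table: 3 sites (rows) x 4 species (columns).
Y <- matrix(c(10, 0, 5, 5,
               2, 2, 0, 0,
               0, 1, 1, 8),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("site", 1:3), paste0("sp", 1:4)))

N <- nrow(Y)                           # number of sites

# Distribution of each species over the sites (columns sum to 1).
P <- sweep(Y, 2, colSums(Y), "/")

# Hill number of order 2 per species: 1 / sum of squared proportions.
N2 <- 1 / colSums(P^2)

# Species totals are proportional to N2 * (N - N2) if this ratio is constant.
ratio <- colSums(Y) / (N2 * (N - N2))
is_proportional <- isTRUE(all.equal(max(ratio), min(ratio)))

print(round(N2, 3))
print(is_proportional)
```

For raw abundance data such as this toy table the ratio is generally not constant, so the default would resolve to closing the data.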

Details

Empty (all zero) rows and columns in response are removed from the response and the corresponding rows from dataEnv and dataTraits. Subsequently, any columns with missing values are removed from dataEnv and dataTraits. An error ('name_of_variable' not found) is raised if variables with missing entries are specified in formulaEnv or formulaTraits.
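The order of these two clean-up steps matters, as the following base-R sketch on hypothetical toy data shows. It mimics the description above and is not the package's own code; all object names are made up for illustration.

```r
# Hypothetical toy data: site2 and species sp2 are empty.
Y <- matrix(c(3, 0, 1, 0,
              0, 0, 0, 0,
              2, 0, 0, 4),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("site", 1:3), paste0("sp", 1:4)))
dataEnv <- data.frame(Moist = c(1, 2, 5), Use = c(NA, 1, 2),
                      row.names = rownames(Y))
dataTraits <- data.frame(SLA = c(20, 25, 30, 18),
                         Height = c(0.3, NA, 0.5, 0.9),
                         row.names = colnames(Y))

# Step 1: drop empty rows and columns, and the matching predictor rows.
keep_rows <- rowSums(Y) > 0
keep_cols <- colSums(Y) > 0
Y2 <- Y[keep_rows, keep_cols, drop = FALSE]
env2 <- dataEnv[keep_rows, , drop = FALSE]
traits2 <- dataTraits[keep_cols, , drop = FALSE]

# Step 2: drop predictor variables (columns) that still contain NAs.
env2 <- env2[, colSums(is.na(env2)) == 0, drop = FALSE]
traits2 <- traits2[, colSums(is.na(traits2)) == 0, drop = FALSE]

print(dim(Y2))
print(names(env2))     # Use is dropped: site1 has a missing value
print(names(traits2))  # Height is kept: its only NA belonged to empty sp2
```

A formula that names a dropped variable (Use in this toy case) would then fail because the variable is no longer in the data.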

Computationally, dc-CA can be carried out by a single singular value decomposition (ter Braak et al. 2018), but it is here computed in two steps. In the first step, the transpose of the response is regressed on to the traits (the column predictors) using cca with formulaTraits. The column scores of this analysis (in scaling 1) are community weighted means (CWM) of the orthonormalized traits. These are then regressed on the environmental (row) predictors using wrda with formulaEnv or using rda, if site weights are equal.
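The ingredients of the two steps can be sketched in base R. This is a simplified illustration on hypothetical toy data, not the package's implementation: it orthonormalizes the traits with respect to the species weights, computes the CWMs of those orthonormalized traits, and regresses them on one environmental variable by weighted least squares. The extraction of the dc-CA axes and eigenvalues is omitted, and all object names are assumptions.

```r
# Hypothetical toy data: 4 sites x 5 species, 2 traits, 1 environmental variable.
Y <- matrix(c(4, 0, 1, 0, 2,
              1, 3, 0, 2, 0,
              0, 2, 5, 1, 1,
              2, 1, 0, 0, 6),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("site", 1:4), paste0("sp", 1:5)))
traits <- cbind(SLA = c(20, 25, 30, 18, 22),
                Height = c(0.3, 0.8, 0.5, 0.9, 0.4))
env <- data.frame(Moist = c(1, 2, 4, 5))

species_w <- colSums(Y) / sum(Y)   # unit-sum column (species) weights
site_w <- rowSums(Y) / sum(Y)      # unit-sum row (site) weights

# Center the traits by their weighted means, then orthonormalize them
# so that t(traits_orth) %*% diag(species_w) %*% traits_orth is the identity.
traits_c <- sweep(traits, 2, colSums(species_w * traits), "-")
Q <- qr.Q(qr(sqrt(species_w) * traits_c))
traits_orth <- Q / sqrt(species_w)

# Community weighted means of the orthonormalized traits, one row per site.
CWM <- (Y / rowSums(Y)) %*% traits_orth

# Weighted multivariate regression of the CWMs on the environment.
fit <- lm(CWM ~ Moist, data = env, weights = site_w)
print(coef(fit))
```

If the data were first closed by site totals, site_w would be constant and the weighted regression would reduce to an ordinary one, which matches the remark that rda is used when site weights are equal.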

A dc-CA can be carried out on what statisticians call the sufficient statistics of the method. This is useful when the abundance data are not available or cannot be made public in a paper attempting reproducible research. In this case, response should be a list with, as first element, the community weighted means with respect to the traits (e.g. list(CWM = CWMs)), together with the trait data and, optionally, further list elements for functions related to dc_CA. The minimum is list(CWM = CWMs, weight = list(columns = species_weights)), with CWM a matrix or data.frame, but then formulaEnv, formulaTraits, dataEnv and dataTraits must be specified in the call to dc_CA. The function fCWM_SNC and its example show how to set up the response for this, and the function helps to create the response from abundance data in these non-standard applications of dc-CA. Species and site weights, if not set in response$weights, can be set by a variable weight in the data frames dataTraits and dataEnv, respectively, but the formulas should then not be ~..
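The shape of the minimal response list can be sketched in base R. The toy data and object names below are hypothetical, the traits are assumed to be numeric, and the CWMs are computed here by hand only to show the structure; fCWM_SNC remains the documented route for building this list from abundance data.

```r
# Hypothetical toy data: 4 sites x 5 species and 2 numeric traits.
Y <- matrix(c(4, 0, 1, 0, 2,
              1, 3, 0, 2, 0,
              0, 2, 5, 1, 1,
              2, 1, 0, 0, 6),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("site", 1:4), paste0("sp", 1:5)))
traits <- cbind(SLA = c(20, 25, 30, 18, 22),
                Height = c(0.3, 0.8, 0.5, 0.9, 0.4))
rownames(traits) <- colnames(Y)

# Unit-sum species (column) weights.
species_weights <- colSums(Y) / sum(Y)

# Community weighted means: per site, the abundance-weighted trait averages.
CWMs <- (Y / rowSums(Y)) %*% traits

# Minimal list in the form described above.
response <- list(CWM = CWMs, weight = list(columns = species_weights))
str(response)
```

Once such a list exists, the abundance table Y itself no longer needs to be shared, which is the point of working from the sufficient statistics.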

The statistics and scores in the example dune_dcCA.r have been checked against the results in Canoco 5.15 (ter Braak & Šmilauer, 2018).

References

Kleyer, M., Dray, S., Bello, F., Lepš, J., Pakeman, R.J., Strauss, B., Thuiller, W. & Lavorel, S. (2012) Assessing species and community functional responses to environmental gradients: which multivariate methods? Journal of Vegetation Science, 23, 805-821. doi:10.1111/j.1654-1103.2012.01402.x

ter Braak, CJF, Šmilauer P, and Dray S. (2018). Algorithms and biplots for double constrained correspondence analysis. Environmental and Ecological Statistics, 25(2), 171-197. doi:10.1007/s10651-017-0395-x

ter Braak C.J.F. and P. Šmilauer (2018). Canoco reference manual and user's guide: software for ordination (version 5.1x). Microcomputer Power, Ithaca, USA, 536 pp.

ter Braak, C.J.F. and van Rossum, B. (2025). Linking Multivariate Trait Variation to the Environment: Advantages of Double Constrained Correspondence Analysis with the R Package Douconca. Ecological Informatics, 88. doi:10.1016/j.ecoinf.2025.103143

Oksanen, J., et al. (2024). vegan: Community Ecology Package. R package version 2.6-6.1. https://CRAN.R-project.org/package=vegan.

Examples

data("dune_trait_env")

# rownames are carried forward in results
rownames(dune_trait_env$comm) <- dune_trait_env$comm$Sites
abun <- dune_trait_env$comm[, -1]  # must delete "Sites"
mod <- dc_CA(formulaEnv = abun ~ A1 + Moist + Mag + Use + Manure,
             formulaTraits = ~ SLA + Height + LDMC + Seedmass + Lifespan,
             dataEnv = dune_trait_env$envir,
             dataTraits = dune_trait_env$traits,
             verbose = FALSE)

print(mod) # same output as with verbose = TRUE (the default of verbose).
anova(mod, by = "axis")
# For more demo on testing, see demo dune_test.r

mod_scores <- scores(mod)
# correlation of axes with a variable that is not in the model
scores(mod, display = "cor", scaling = "sym", which_cor = list(NULL, "X_lot"))

cat("head of unconstrained site scores, with meaning\n")
print(head(mod_scores$sites))

mod_scores_tidy <- scores(mod, tidy = TRUE)
print("names of the tidy scores")
print(names(mod_scores_tidy))
cat("\nThe levels of the tidy scores\n")
print(levels(mod_scores_tidy$score))

cat("\nFor illustration: a dc-CA model with a trait covariate\n")
mod2 <- dc_CA(formulaEnv = abun ~ A1 + Moist + Mag + Use + Manure,
              formulaTraits = ~ SLA + Height + LDMC + Lifespan + Condition(Seedmass),
              dataEnv = dune_trait_env$envir,
              dataTraits = dune_trait_env$traits)

cat("\nFor illustration: a dc-CA model with both environmental and trait covariates\n")
mod3 <- dc_CA(formulaEnv = abun ~ A1 + Moist + Use + Manure + Condition(Mag),
              formulaTraits = ~ SLA + Height + LDMC + Lifespan + Condition(Seedmass),
              dataEnv = dune_trait_env$envir,
              dataTraits = dune_trait_env$traits,
              verbose = FALSE)

cat("\nFor illustration: same model but using dc_CA_object = mod2 for speed, ", 
    "as the trait model and data did not change\n")
mod3B <- dc_CA(formulaEnv = abun ~ A1 + Moist + Use + Manure + Condition(Mag),
               dataEnv = dune_trait_env$envir,
               dc_CA_object = mod2,
               verbose = FALSE)
cat("\ncheck on equality of mod3 (from data) and mod3B (from a dc_CA_object)\n",
    "the expected difference is in the component 'call'\n ")

print(all.equal(mod3[-c(5,12)], mod3B[-c(5,12)])) #  only the component call differs
print(mod3$inertia[-c(3,5),]/mod3B$inertia) #        and mod3 has two more inertia items
