lvsplot: Plot the latent variables from a boral model

Description

Construct a 1-D index plot or 2-D scatterplot of the latent variables from a fitted boral model, optionally together with their corresponding coefficients (i.e., a biplot).

Usage

lvsplot(x, jitter = FALSE, biplot = TRUE, ind.spp = NULL, alpha = 0.5, 
	main = NULL, est = "median", which.lvs = c(1,2), return.vals = FALSE, ...)

Arguments

x

An object of class "boral".

jitter

If jitter = TRUE, then some jittering is applied so that points on the plots do not overlap exactly (which can often occur with discrete data, small sample sizes, and if some sites are identical in terms of species co-occurrence). Please see jitter for its implementation. Defaults to FALSE.
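As a rough illustration of what jittering does, the sketch below applies base R's jitter() to a small, hypothetical matrix of latent variable scores in which two sites coincide. It is not lvsplot's internal code, and the amount of noise lvsplot actually adds may differ.

```r
## Hypothetical latent variable scores for four sites (rows) on two axes (columns);
## sites 1 and 2 have identical scores and would overlap exactly on the plot
lv <- matrix(c(0.5, 0.5, -1.0,  1.2,
               0.3, 0.3,  0.8, -0.4), ncol = 2)

set.seed(1)                          ## make the random noise reproducible
lv_jittered <- apply(lv, 2, jitter)  ## jitter each ordination axis separately

## The overlapping sites are now distinct but remain close to their original positions
print(round(lv_jittered, 3))
```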

biplot

If biplot = TRUE, then a biplot is constructed such that both the latent variables and their corresponding coefficients are plotted. Otherwise, only the latent variable scores are plotted. Defaults to TRUE.

ind.spp

Controls the number of latent variable coefficients to plot if biplot = TRUE. If ind.spp is an integer, then only the first ind.spp "most important" latent variable coefficients are included in the biplot, where "most important" means the latent variable coefficients with the largest L2-norms. Defaults to NULL, in which case all latent variable coefficients are included in the biplot.
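The "most important" ranking can be sketched in base R as follows. The coefficient matrix and species names are hypothetical, and the code only illustrates ranking by L2-norm; it is not lvsplot's own implementation.

```r
## Hypothetical latent variable coefficients: one row per species, one column per latent variable
lv_coefs <- matrix(c(1.0, -0.2, 0.1,  2.0,
                     0.5,  0.1, 0.0, -1.5),
                   ncol = 2,
                   dimnames = list(c("sppA", "sppB", "sppC", "sppD"), NULL))

ind.spp <- 2  ## keep only the two "most important" species

## L2-norm of each species' coefficient vector
l2_norms <- sqrt(rowSums(lv_coefs^2))

## Rank species by decreasing L2-norm and keep the first ind.spp of them
top_spp <- names(sort(l2_norms, decreasing = TRUE))[seq_len(ind.spp)]
print(top_spp)  ## [1] "sppD" "sppA"
```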

alpha

A numeric scalar between 0 and 1 that is used to control the relative scaling of the latent variables and their coefficients when constructing a biplot. Defaults to 0.5, and we typically recommend a value between 0.45 and 0.55 so that the latent variables and their coefficients are on roughly the same scale.

main

Title for the resulting ordination plot. Defaults to NULL, in which case a "standard" title is used.

est

A choice of either the posterior median (est = "median") or the posterior mean (est = "mean"), which is then treated as the point estimate on which the ordination is based. Defaults to the posterior median.
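To illustrate the two choices of est, the sketch below collapses a hypothetical array of MCMC draws into one point estimate per site and latent variable. The array layout (draws x sites x latent variables) is an assumption made for illustration only; boral stores its posterior samples in its own format.

```r
set.seed(42)
## Hypothetical MCMC output: 1000 draws of 2 latent variables for 5 sites
n_draws <- 1000; n_sites <- 5; num_lv <- 2
lv_draws <- array(rnorm(n_draws * n_sites * num_lv),
                  dim = c(n_draws, n_sites, num_lv))

## Summarise over the draws dimension, keeping the site and latent variable dimensions
lv_median <- apply(lv_draws, c(2, 3), median)  ## analogous to est = "median"
lv_mean   <- apply(lv_draws, c(2, 3), mean)    ## analogous to est = "mean"

dim(lv_median)  ## [1] 5 2: one row per site, one column per latent variable
```

For skewed posteriors the two estimates can differ noticeably, which in turn shifts the plotted positions of the sites.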

which.lvs

A vector of length two, indicating which latent variables (ordination axes) to plot when x is an object with two or more latent variables. The argument is ignored if x contains only one latent variable. Defaults to which.lvs = c(1,2).

return.vals

If TRUE, then the scaled latent variable scores and corresponding scaled coefficients are returned (based on the value of alpha used). This is useful if the user wants to construct their own custom model-based ordination plots. Defaults to FALSE.

...

Additional graphical options to be included in the plot. These include values for cex, cex.lab, cex.axis, cex.main, lwd, and so on.

Details

This function constructs an ordination plot based on either the posterior medians or the posterior means of the latent variables, depending on the choice of est. The latent variables are labeled using the row index of the response matrix y. If the fitted model contains more than two latent variables, then the which.lvs argument specifies which latent variables (i.e., ordination axes) to plot. This can be useful for checking whether certain sites are outliers on one particular ordination axis.
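A minimal base R sketch of such a 2-D ordination, assuming a hypothetical matrix of posterior medians for three latent variables and which.lvs = c(1, 3). Only base graphics are used, and lvsplot's own plotting choices (titles, colours, margins) may differ.

```r
set.seed(3)
## Hypothetical posterior medians of three latent variables for 10 sites
lv_median <- matrix(rnorm(30), nrow = 10, ncol = 3)
which.lvs <- c(1, 3)  ## plot the first and third ordination axes

## Keep only the two requested axes
chosen <- lv_median[, which.lvs, drop = FALSE]

## Empty plot, then label each point with the row index of y (i.e., the site number)
plot(chosen, type = "n",
     xlab = paste("Latent variable", which.lvs[1]),
     ylab = paste("Latent variable", which.lvs[2]))
text(chosen, labels = seq_len(nrow(chosen)))
```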

If the fitted model did not contain any covariates, the ordination plot can be interpreted in exactly the same manner as unconstrained ordination plots constructed from methods such as Nonmetric Multi-dimensional Scaling (NMDS, Kruskal, 1964) and Correspondence Analysis (CA, Hill, 1974). With multivariate abundance data, for instance, where the response matrix y consists of \(n\) sites and \(p\) species, the ordination plots can be studied to look for possible clustering of sites, location and/or dispersion effects, an arch pattern indicative of some sort of species succession over an environmental gradient, and so on.

If the fitted model did include covariates, then a "residual ordination" plot is produced, which can be interpreted as offering a graphical representation of the (main patterns of) residual covariation, i.e., covariation after accounting for the covariates. With multivariate abundance data, for instance, these residual ordination plots could represent residual species co-occurrence due to phylogeny, species competition and facilitation, missing covariates, and so on (Warton et al., 2015).

If biplot = TRUE, then a biplot is constructed so that both the latent variables and their corresponding coefficients are included in the plot (Gabriel, 1971). The latent variable coefficients are shown in red, and are indexed by the column names of y. The number of latent variable coefficients to plot is controlled by ind.spp. In ecology, for example, we are often only interested in the "indicator" species, e.g., the species that best represent a particular set of sites or the species with the strongest covariation (see Chapter 9, Legendre and Legendre, 2012, for additional discussion). In such cases, we can biplot only the ind.spp "most important" species, as indicated by the L2-norm of their latent variable coefficients.
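The basic layout of such a biplot can be sketched with base graphics: site scores labeled by row index, and coefficients labeled by species name in red. The (already scaled) matrices below are hypothetical, no alpha scaling is performed, and this is not lvsplot's own plotting code.

```r
## Hypothetical, already scaled site scores (rows = sites) and species coefficients
site_scores <- matrix(c(-1.2, -0.8, 0.1,  0.9,  1.1,
                         0.4, -0.6, 0.9, -0.2, -0.5), ncol = 2)
spp_coefs <- matrix(c(1.0, -0.9,  0.2,
                      0.3,  0.6, -1.1), ncol = 2,
                    dimnames = list(c("sppA", "sppB", "sppC"), NULL))

## Common axis limits so that neither set of points is clipped
lims <- range(site_scores, spp_coefs)
plot(site_scores, type = "n", xlim = lims, ylim = lims,
     xlab = "Latent variable 1", ylab = "Latent variable 2")
text(site_scores, labels = seq_len(nrow(site_scores)))       ## sites: row index of y
text(spp_coefs, labels = rownames(spp_coefs), col = "red")   ## species: column names of y
```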

As with correspondence analysis, the relative scaling of the latent variables and the coefficients in a biplot is essentially arbitrary, and can be adjusted to focus on the sites, the species, or to put equal weight on both (see Section 9.4, Legendre and Legendre, 2012). In lvsplot, this relative scaling is controlled by the alpha argument, which essentially works by raising the latent variables to the power alpha and the latent variable coefficients to the power 1-alpha.
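One standard way to implement this kind of scaling, used for classical biplots (Gabriel, 1971), is to take a singular value decomposition (SVD) of the site-by-species matrix formed from the latent variables and their coefficients, and then split the singular values d between the two sides as d^alpha and d^(1-alpha). The sketch below assumes this SVD-based approach (it is not claimed to be lvsplot's exact code) and uses hypothetical random inputs. It checks that every alpha gives the same product, which is why the choice of scaling is arbitrary.

```r
set.seed(1)
n <- 8; p <- 5; num_lv <- 2
## Hypothetical point estimates: latent variables (sites x LVs), coefficients (species x LVs)
lvs <- matrix(rnorm(n * num_lv), n, num_lv)
lv_coefs <- matrix(rnorm(p * num_lv), p, num_lv)

## Site-by-species matrix of latent variable contributions (rank num_lv), and its SVD
fitted_lv <- lvs %*% t(lv_coefs)
s <- svd(fitted_lv, nu = num_lv, nv = num_lv)
d <- s$d[seq_len(num_lv)]

## Assumed scaling: sites get d^alpha, species get d^(1 - alpha)
scale_biplot <- function(alpha) {
  list(scores = s$u %*% diag(d^alpha),
       coefs  = s$v %*% diag(d^(1 - alpha)))
}

## Whatever alpha is chosen, scores %*% t(coefs) reproduces the same matrix
for (alpha in c(0, 0.5, 1)) {
  sc <- scale_biplot(alpha)
  stopifnot(isTRUE(all.equal(sc$scores %*% t(sc$coefs), fitted_lv)))
}
```

Only the relative stretching of the two point clouds changes with alpha: alpha = 1 puts all of the singular values on the sites, alpha = 0 puts them all on the species.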

For latent variable models, we are generally interested in "symmetric plots" that place the latent variables and their coefficients on the same scale. In principle, this is achieved by setting alpha = 0.5, the default value, although sometimes this needs to be tweaked slightly to a value between 0.45 and 0.55 (see also the corresp function in the MASS package, which also produces symmetric plots, as well as Section 5.4, Borcard et al., 2011, for more details on scaling).
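Under the same assumed SVD-based scaling as in classical biplots, the sketch below shows in what sense alpha = 0.5 is "symmetric": because the singular vectors have unit length, each ordination axis then carries the same sum of squares (the corresponding singular value) for the sites and for the species. The inputs are hypothetical, and any centering or further adjustments lvsplot may apply are not reproduced.

```r
set.seed(2)
## Hypothetical rank-2 site-by-species matrix built from latent variables and coefficients
lvs <- matrix(rnorm(10 * 2), 10, 2)
lv_coefs <- matrix(rnorm(6 * 2), 6, 2)
s <- svd(lvs %*% t(lv_coefs), nu = 2, nv = 2)
d <- s$d[1:2]

alpha <- 0.5
scores <- s$u %*% diag(d^alpha)        ## scaled site scores
coefs  <- s$v %*% diag(d^(1 - alpha))  ## scaled species coefficients

## With alpha = 0.5 both sets have per-axis sums of squares equal to d
colSums(scores^2)
colSums(coefs^2)
```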

References

Borcard et al. (2011). Numerical Ecology with R. Springer.
Gabriel, K. R. (1971). The biplot graphic display of matrices with application to principal component analysis. Biometrika, 58, 453-467.
Hill, M. O. (1974). Correspondence analysis: a neglected multivariate method. Applied Statistics, 23, 340-354.
Kruskal, J. B. (1964). Nonmetric multidimensional scaling: a numerical method. Psychometrika, 29, 115-129.
Legendre, P. and Legendre, L. (2012). Numerical Ecology, Volume 20. Elsevier.
Warton et al. (2015). So Many Variables: Joint Modeling in Community Ecology. Trends in Ecology and Evolution, to appear.

Examples


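# NOT RUN {
## NOTE: The values below MUST NOT be used in a real application;
## they are only used here to make the examples run quickly!
example_mcmc_control <- list(n.burnin = 10, n.iteration = 100,
     n.thin = 1)

library(boral)   ## Provides boral() and lvsplot() (model fitting requires JAGS)
library(mvabund) ## Load a dataset from the mvabund package
data(spider)
y <- spider$abun
n <- nrow(y)  ## number of sites
p <- ncol(y)  ## number of species

## Negative binomial model with two latent variables and fixed row effects
spiderfit_nb <- boral(y, family = "negative.binomial", lv.control = list(num.lv = 2),
    row.eff = "fixed", mcmc.control = example_mcmc_control)

lvsplot(spiderfit_nb)  ## Biplot of the latent variables and their coefficients
# }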
