adiv_functions: Alpha Diversity Metrics

Description

Alpha Diversity Metrics

Usage

ace(counts, cutoff = 10L, margin = 1L, cpus = n_cpus())
berger(counts, norm = "percent", margin = 1L, cpus = n_cpus())
brillouin(counts, margin = 1L, cpus = n_cpus())
chao1(counts, margin = 1L, cpus = n_cpus())
faith(counts, tree = NULL, margin = 1L, cpus = n_cpus())
fisher(counts, digits = 3L, margin = 1L, cpus = n_cpus())
inv_simpson(counts, norm = "percent", margin = 1L, cpus = n_cpus())
margalef(counts, margin = 1L, cpus = n_cpus())
mcintosh(counts, margin = 1L, cpus = n_cpus())
menhinick(counts, margin = 1L, cpus = n_cpus())
observed(counts, margin = 1L, cpus = n_cpus())
shannon(counts, norm = "percent", margin = 1L, cpus = n_cpus())
simpson(counts, norm = "percent", margin = 1L, cpus = n_cpus())
squares(counts, margin = 1L, cpus = n_cpus())

Value

A numeric vector with one diversity value per sample.

Arguments

counts

A numeric matrix of count data where each column is a feature, and each row is a sample. Any object coercible with as.matrix() can be given here, as well as phyloseq, rbiom, SummarizedExperiment, and TreeSummarizedExperiment objects. For optimal performance with very large datasets, see the guide in vignette('performance').

cutoff

The maximum number of observations to consider "rare". Default: 10.

margin

If your samples are in the matrix's rows, set to 1L. If your samples are in columns, set to 2L. Ignored when counts is a phyloseq, rbiom, SummarizedExperiment, or TreeSummarizedExperiment object. Default: 1L
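As a rough illustration of the margin convention, the base R sketch below counts nonzero features per sample. The richness() helper and the by_row matrix are made up for this example and are not part of the package; the sketch only assumes that margin selects rows (1L) or columns (2L) the way apply() does.

```r
# Hypothetical helper (not part of the package): number of nonzero
# features per sample, with `margin` following the convention above.
richness <- function(counts, margin = 1L) {
  counts <- as.matrix(counts)
  # apply() iterates over rows when margin = 1 and columns when margin = 2.
  apply(counts, margin, function(x) sum(x > 0))
}

# Made-up data: 2 samples (rows) x 3 features (columns).
by_row <- matrix(
  c(5, 0, 2,
    1, 1, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("s1", "s2"), c("otu1", "otu2", "otu3"))
)

richness(by_row, margin = 1L)     # samples in rows
richness(t(by_row), margin = 2L)  # same data, samples in columns
```

Both calls should give the same per-sample result, because only the orientation of the matrix differs.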

cpus

How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.

norm

Normalize the incoming counts. Options are:

norm = "percent" -

Relative abundance (sample abundances sum to 1).

norm = "binary" -

Unweighted presence/absence (each count is either 0 or 1).

norm = "clr" -

Centered log ratio.

norm = "none" -

No transformation.

Default: 'percent', which is the expected input for these formulas.
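The base R sketch below shows what three of these options do to a single sample, and why 'percent' suits a formula written in terms of proportions. The counts and the gini_simpson() helper are made up for illustration and are not the package's implementation. "clr" is left out because its handling of zero counts is not described here.

```r
# Made-up counts for one sample.
x <- c(otu1 = 6, otu2 = 3, otu3 = 0, otu4 = 1)

# norm = "percent": relative abundances that sum to 1.
pct <- x / sum(x)

# norm = "binary": 1 where the feature is present, 0 where absent.
bin <- as.numeric(x > 0)

# norm = "none": counts are used unchanged.
raw <- x

# Hypothetical helper: the Gini-Simpson formula, 1 - sum(P_i^2),
# which is written in terms of proportions.
gini_simpson <- function(p) 1 - sum(p^2)

gini_simpson(pct)  # lies in [0, 1), as the formula intends
gini_simpson(raw)  # negative here, because raw counts are not proportions
```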

tree

A phylo-class object representing the phylogenetic tree for the OTUs in counts. The OTU identifiers given by colnames(counts) must be present in tree. Can be omitted if a tree is embedded with the counts object or as attr(counts, 'tree').

digits

Precision of the returned values, in number of decimal places. E.g. the default digits=3 could return 6.392.

Formulas

Prerequisite: all counts are whole numbers.

Given:

\(n\) : The number of features (e.g. species, OTUs, ASVs, etc).
\(X_i\) : Integer count of the \(i\)-th feature.
\(X_T\) : Total of all counts (i.e. sequencing depth). \(X_T = \sum_{i=1}^{n} X_i\)
\(P_i\) : Proportional abundance of the \(i\)-th feature. \(P_i = X_i / X_T\)
\(F_1\) : Number of features where \(X_i = 1\) (i.e. singletons).
\(F_2\) : Number of features where \(X_i = 2\) (i.e. doubletons).


Abundance-based Coverage Estimator (ACE) `ace()`	See below.
Berger-Parker Index `berger()`	\(\max(P_i)\)
Brillouin Index `brillouin()`	\(\displaystyle \frac{\ln{[(\sum_{i = 1}^{n} X_i)!]} - \sum_{i = 1}^{n} \ln{(X_i!)}}{\sum_{i = 1}^{n} X_i}\)
Chao1 `chao1()`	\(\displaystyle n + \frac{(F_1)^2}{2 F_2}\)
Faith's Phylogenetic Diversity `faith()`	See below.
Fisher's Alpha (\(\alpha\)) `fisher()`	\(\displaystyle \frac{n}{\alpha} = \ln{\left(1 + \frac{X_T}{\alpha}\right)}\) The value of \(\alpha\) must be solved for iteratively.
Gini-Simpson Index `simpson()`	\(1 - \sum_{i = 1}^{n} P_i^2\)
Inverse Simpson Index `inv_simpson()`	\(1 / \sum_{i = 1}^{n} P_i^2\)
Margalef's Richness Index `margalef()`	\(\displaystyle \frac{n - 1}{\ln{X_T}}\)
McIntosh Index `mcintosh()`	\(\displaystyle \frac{X_T - \sqrt{\sum_{i = 1}^{n} (X_i)^2}}{X_T - \sqrt{X_T}}\)
Menhinick's Richness Index `menhinick()`	\(\displaystyle \frac{n}{\sqrt{X_T}}\)
Observed Features `observed()`	\(n\)
Shannon Diversity Index `shannon()`	\(-\sum_{i = 1}^{n} P_i \times \ln(P_i)\)
Squares Richness Estimator `squares()`	\(\displaystyle n + \frac{(F_1)^2 \sum_{i=1}^{n} (X_i)^2}{X_T^2 - nF_1}\)
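The closed-form entries in the table can be checked by hand in base R. The following is a minimal sketch for a single sample, not the package's implementation. It assumes that \(n\) counts only features with a nonzero count, uses made-up data, and does not handle edge cases such as \(F_2 = 0\) in Chao1. Fisher's \(\alpha\) is solved with uniroot(), which is one of several possible iterative approaches.

```r
# Made-up integer counts for one sample. Zero-count features are dropped
# on the assumption that n counts only the features observed in the sample.
x <- c(10, 5, 2, 2, 1, 1, 1, 0)
x <- x[x > 0]

n   <- length(x)    # number of observed features
X_T <- sum(x)       # sequencing depth
P   <- x / X_T      # proportional abundances
F1  <- sum(x == 1)  # singletons
F2  <- sum(x == 2)  # doubletons

metrics <- c(
  berger      = max(P),
  brillouin   = (lfactorial(X_T) - sum(lfactorial(x))) / X_T,
  chao1       = n + F1^2 / (2 * F2),
  simpson     = 1 - sum(P^2),
  inv_simpson = 1 / sum(P^2),
  margalef    = (n - 1) / log(X_T),
  mcintosh    = (X_T - sqrt(sum(x^2))) / (X_T - sqrt(X_T)),
  menhinick   = n / sqrt(X_T),
  observed    = n,
  shannon     = -sum(P * log(P)),
  squares     = n + (F1^2 * sum(x^2)) / (X_T^2 - n * F1)
)

# Fisher's alpha: find the root of n/alpha - ln(1 + X_T/alpha) = 0.
fisher_alpha <- uniroot(
  function(a) n / a - log(1 + X_T / a),
  interval = c(1e-6, 1e6), tol = 1e-10
)$root

round(c(metrics, fisher = fisher_alpha), 3)
```

With these counts, \(n = 7\), \(X_T = 22\), \(F_1 = 3\) and \(F_2 = 2\), so Chao1 works out to \(7 + 9/4 = 9.25\).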

Abundance-based Coverage Estimator (ACE)

Given:

\(n\) : The number of features (e.g. species, OTUs, ASVs, etc).
\(r\) : Rare cutoff. Features with \(\le r\) counts are considered rare.
\(X_i\) : Integer count of the \(i\)-th feature.
\(F_i\) : Number of features with exactly \(i\) counts.
\(F_1\) : Number of features where \(X_i = 1\) (i.e. singletons).
\(F_{rare}\) : Number of rare features where \(X_i \le r\).
\(F_{abund}\) : Number of abundant features where \(X_i > r\).
\(X_{rare}\) : Total counts belonging to rare features.
\(C_{ace}\) : The sample abundance coverage estimator, defined below.
\(\gamma_{ace}^2\) : The estimated coefficient of variation, defined below.
\(D_{ace}\) : Estimated number of features in the sample.

\(\displaystyle C_{ace} = 1 - \frac{F_1}{X_{rare}}\)

\(\displaystyle \gamma_{ace}^2 = \max\left[\frac{F_{rare} \sum_{i=1}^{r}i(i-1)F_i}{C_{ace}X_{rare}(X_{rare} - 1)} - 1, 0\right]\)

\(\displaystyle D_{ace} = F_{abund} + \frac{F_{rare}}{C_{ace}} + \frac{F_1}{C_{ace}}\gamma_{ace}^2 \)
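The three ACE equations can be traced step by step in base R. The ace_sketch() helper below is hypothetical and is not the package's implementation. It assumes zero-count features are excluded, and it does not guard against the degenerate case where every rare feature is a singleton (which makes \(C_{ace} = 0\)).

```r
# Hypothetical helper (not the package's implementation): ACE for one sample.
ace_sketch <- function(x, r = 10L) {
  x <- x[x > 0]                     # assume only observed features count
  rare    <- x[x <= r]              # counts of the rare features
  F_rare  <- length(rare)
  F_abund <- sum(x > r)
  X_rare  <- sum(rare)
  F1      <- sum(x == 1)

  # F_i for i = 1..r: number of features with exactly i counts.
  F_i <- tabulate(rare, nbins = r)
  i   <- seq_len(r)

  C_ace  <- 1 - F1 / X_rare
  gamma2 <- max(
    (F_rare * sum(i * (i - 1) * F_i)) / (C_ace * X_rare * (X_rare - 1)) - 1,
    0
  )
  F_abund + F_rare / C_ace + (F1 / C_ace) * gamma2
}

x <- c(25, 12, 4, 3, 2, 2, 1, 1, 1, 0)  # made-up counts
ace_sketch(x, r = 10L)
```

For these counts, \(F_{rare} = 7\), \(F_{abund} = 2\), \(X_{rare} = 14\) and \(F_1 = 3\), giving \(C_{ace} = 11/14\) and \(\gamma_{ace}^2 = 1/13\).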

Faith's Phylogenetic Diversity (Faith's PD)

Given \(n\) branches (here \(n\) counts tree branches, not features) with lengths \(L_i\) and, for each branch, an indicator \(A_i\) coded as 1 if the branch is present in the sample or 0 if absent:

\(\sum_{i = 1}^{n} L_i A_i\)
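The sum above can be sketched in base R without a tree package. The toy tree, the tips list, and the faith_sketch() helper are all hypothetical. The sketch assumes a branch is "present" when at least one tip descending from it has a nonzero count, and it says nothing about how the package treats the root branch.

```r
# Hypothetical toy tree: one entry per branch. `L` holds branch lengths and
# `tips` lists the features (tree tips) that descend from each branch.
L    <- c(0.5, 0.2, 0.3, 0.4, 0.1)
tips <- list(
  c("otu1", "otu2"),  # internal branch above otu1 and otu2
  "otu1",
  "otu2",
  "otu3",
  "otu4"
)

# Hypothetical helper (not the package's implementation).
faith_sketch <- function(x, L, tips) {
  present <- names(x)[x > 0]
  # A_i = 1 if any tip below branch i is present in the sample, else 0.
  A <- vapply(tips, function(t) as.numeric(any(t %in% present)), numeric(1))
  sum(L * A)
}

x <- c(otu1 = 7, otu2 = 0, otu3 = 2, otu4 = 0)  # made-up counts
faith_sketch(x, L, tips)
```

Here otu1 and otu3 are present, so the branches counted are the internal branch (0.5), otu1's branch (0.2), and otu3's branch (0.4), for a total of 1.1.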

Examples


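    # Example counts matrix (transposed for display)
    t(ex_counts)
    
    # Abundance-based Coverage Estimator, using the default cutoff of 10
    ace(ex_counts)
    
    # Chao1 richness estimate
    chao1(ex_counts)
    
    # Squares richness estimate
    squares(ex_counts)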
