Alpha Diversity Metrics
ace(counts, cutoff = 10L, margin = 1L, cpus = n_cpus())
berger(counts, norm = "percent", margin = 1L, cpus = n_cpus())
brillouin(counts, margin = 1L, cpus = n_cpus())
chao1(counts, margin = 1L, cpus = n_cpus())
faith(counts, tree = NULL, margin = 1L, cpus = n_cpus())
fisher(counts, digits = 3L, margin = 1L, cpus = n_cpus())
inv_simpson(counts, norm = "percent", margin = 1L, cpus = n_cpus())
margalef(counts, margin = 1L, cpus = n_cpus())
mcintosh(counts, margin = 1L, cpus = n_cpus())
menhinick(counts, margin = 1L, cpus = n_cpus())
observed(counts, margin = 1L, cpus = n_cpus())
shannon(counts, norm = "percent", margin = 1L, cpus = n_cpus())
simpson(counts, norm = "percent", margin = 1L, cpus = n_cpus())
squares(counts, margin = 1L, cpus = n_cpus())
Each function returns a numeric vector.
counts : A numeric matrix of count data where each column is a
feature, and each row is a sample. Any object coercible with
as.matrix() can be given here, as well as phyloseq, rbiom,
SummarizedExperiment, and TreeSummarizedExperiment objects. For
optimal performance with very large datasets, see the guide in
vignette('performance').
cutoff : The maximum number of observations to consider "rare".
Default: 10.
margin : If your samples are in the matrix's rows, set to 1L. If
your samples are in columns, set to 2L. Ignored when counts is a
phyloseq, rbiom, SummarizedExperiment, or
TreeSummarizedExperiment object. Default: 1L.
cpus : How many parallel processing threads should be used. The
default, n_cpus(), will use all logical CPU cores.
norm : Normalize the incoming counts. Options are:
norm = "percent" - Relative abundance (sample abundances sum to 1).
norm = "binary" - Unweighted presence/absence (each count is either 0 or 1).
norm = "clr" - Centered log ratio.
norm = "none" - No transformation.
Default: "percent", which is the expected input for these formulas.
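The normalization options can be illustrated for a single sample in base R. This is a minimal sketch that does not use the package's own code; the vectors `x` and `y` are made-up values, and how zeros are handled under "clr" is not stated here, so the CLR line is applied only to a strictly positive vector.

```r
# One sample's counts across four features (made-up values).
x <- c(4, 0, 6, 10)

# "percent": relative abundances that sum to 1.
percent <- x / sum(x)

# "binary": presence/absence, each value is 0 or 1.
binary <- as.numeric(x > 0)

# "clr": centered log ratio, log of each value minus the mean of the logs.
# Assumption: shown on a strictly positive vector, since log(0) is -Inf.
y   <- c(4, 1, 6, 10)
clr <- log(y) - mean(log(y))

# "none": the counts are used unchanged.
none <- x

print(percent)
print(binary)
print(clr)
```

By construction the CLR values of a sample sum to zero, which is a quick sanity check on any implementation.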
tree : A phylo-class object representing the phylogenetic tree for
the OTUs in counts. The OTU identifiers given by colnames(counts)
must be present in tree. Can be omitted if a tree is embedded with
the counts object or as attr(counts, 'tree').
digits : Precision of the returned values, in number of decimal
places. E.g. the default digits = 3 could return 6.392.
Prerequisite: all counts are whole numbers.
Given:
\(n\) : The number of features (e.g. species, OTUs, ASVs, etc).
\(X_i\) : Integer count of the \(i\)-th feature.
\(X_T\) : Total of all counts (i.e. sequencing depth). \(X_T = \sum_{i=1}^{n} X_i\)
\(P_i\) : Proportional abundance of the \(i\)-th feature. \(P_i = X_i / X_T\)
\(F_1\) : Number of features where \(X_i = 1\) (i.e. singletons).
\(F_2\) : Number of features where \(X_i = 2\) (i.e. doubletons).
Metric | Function | Formula
Abundance-based Coverage Estimator (ACE) | ace() | See below.
Berger-Parker Index | berger() | \(\max(P_i)\)
Brillouin Index | brillouin() | \(\displaystyle \frac{\ln{[(\sum_{i = 1}^{n} X_i)!]} - \sum_{i = 1}^{n} \ln{(X_i!)}}{\sum_{i = 1}^{n} X_i}\)
Chao1 | chao1() | \(\displaystyle n + \frac{(F_1)^2}{2 F_2}\)
Faith's Phylogenetic Diversity | faith() | See below.
Fisher's Alpha (\(\alpha\)) | fisher() | \(\displaystyle \frac{n}{\alpha} = \ln{\left(1 + \frac{X_T}{\alpha}\right)}\) The value of \(\alpha\) must be solved for iteratively.
Gini-Simpson Index | simpson() | \(1 - \sum_{i = 1}^{n} P_i^2\)
Inverse Simpson Index | inv_simpson() | \(1 / \sum_{i = 1}^{n} P_i^2\)
Margalef's Richness Index | margalef() | \(\displaystyle \frac{n - 1}{\ln{X_T}}\)
McIntosh Index | mcintosh() | \(\displaystyle \frac{X_T - \sqrt{\sum_{i = 1}^{n} (X_i)^2}}{X_T - \sqrt{X_T}}\)
Menhinick's Richness Index | menhinick() | \(\displaystyle \frac{n}{\sqrt{X_T}}\)
Observed Features | observed() | \(n\)
Shannon Diversity Index | shannon() | \(-\sum_{i = 1}^{n} P_i \times \ln(P_i)\)
Squares Richness Estimator | squares() | \(\displaystyle n + \frac{(F_1)^2 \sum_{i=1}^{n} (X_i)^2}{X_T^2 - nF_1}\)
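The closed-form formulas above can be evaluated directly in base R for one sample. The following is a sketch for checking the arithmetic by hand, not the package's implementation; the count vector is made up, and dropping zero counts so that \(n\) counts only observed features is an assumption. Fisher's \(\alpha\) is solved numerically with `uniroot()`, which is one of several possible iterative approaches.

```r
# Made-up integer counts for one sample; the zero is a feature absent from it.
x <- c(10, 5, 2, 2, 1, 1, 1, 0)

x   <- x[x > 0]      # assumption: only observed features contribute to n
n   <- length(x)     # number of features
X_T <- sum(x)        # total of all counts (sequencing depth)
P   <- x / X_T       # proportional abundances
F_1 <- sum(x == 1)   # singletons
F_2 <- sum(x == 2)   # doubletons

indices <- c(
  observed    = n,
  berger      = max(P),
  brillouin   = (lfactorial(X_T) - sum(lfactorial(x))) / X_T,
  chao1       = n + F_1^2 / (2 * F_2),
  simpson     = 1 - sum(P^2),
  inv_simpson = 1 / sum(P^2),
  margalef    = (n - 1) / log(X_T),
  mcintosh    = (X_T - sqrt(sum(x^2))) / (X_T - sqrt(X_T)),
  menhinick   = n / sqrt(X_T),
  shannon     = -sum(P * log(P)),
  squares     = n + (F_1^2 * sum(x^2)) / (X_T^2 - n * F_1)
)

# Fisher's alpha: find the root of alpha * ln(1 + X_T / alpha) - n,
# which is the table's equation multiplied through by alpha.
fisher_eq <- function(a) a * log(1 + X_T / a) - n
alpha     <- uniroot(fisher_eq, interval = c(1e-6, 1e6), tol = 1e-9)$root

print(round(indices, 3))
print(round(alpha, 3))
```

Note that the Chao1 formula as written divides by \(F_2\), so a sample with no doubletons needs separate handling that this sketch does not attempt.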
For the Abundance-based Coverage Estimator (ACE), given:
\(n\) : The number of features (e.g. species, OTUs, ASVs, etc).
\(r\) : Rare cutoff. Features with \(\le r\) counts are considered rare.
\(X_i\) : Integer count of the \(i\)-th feature.
\(F_i\) : Number of features with exactly \(i\) counts.
\(F_1\) : Number of features where \(X_i = 1\) (i.e. singletons).
\(F_{rare}\) : Number of rare features where \(X_i \le r\).
\(F_{abund}\) : Number of abundant features where \(X_i > r\).
\(X_{rare}\) : Total counts belonging to rare features.
\(C_{ace}\) : The sample abundance coverage estimator, defined below.
\(\gamma_{ace}^2\) : The estimated coefficient of variation, defined below.
\(D_{ace}\) : Estimated number of features in the sample.
\(\displaystyle C_{ace} = 1 - \frac{F_1}{X_{rare}}\)
\(\displaystyle \gamma_{ace}^2 = \max\left[\frac{F_{rare} \sum_{i=1}^{r}i(i-1)F_i}{C_{ace}X_{rare}(X_{rare} - 1)} - 1, 0\right]\)
\(\displaystyle D_{ace} = F_{abund} + \frac{F_{rare}}{C_{ace}} + \frac{F_1}{C_{ace}}\gamma_{ace}^2 \)
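The three ACE equations can be followed step by step in base R. This is a minimal sketch under stated assumptions rather than the package's code: the function name `ace_sketch` and the example counts are hypothetical, zero counts are dropped before counting rare features, and the degenerate case where every rare feature is a singleton (so \(C_{ace} = 0\)) is not handled.

```r
ace_sketch <- function(x, r = 10) {
  x       <- x[x > 0]                    # assumption: ignore unobserved features
  rare    <- x[x <= r]                   # counts of the rare features
  F_rare  <- length(rare)                # number of rare features
  F_abund <- sum(x > r)                  # number of abundant features
  X_rare  <- sum(rare)                   # total counts belonging to rare features
  F_1     <- sum(x == 1)                 # singletons
  i       <- seq_len(r)
  F_i     <- tabulate(rare, nbins = r)   # F_i[k]: features with exactly k counts

  C_ace  <- 1 - F_1 / X_rare
  gamma2 <- max(
    F_rare * sum(i * (i - 1) * F_i) / (C_ace * X_rare * (X_rare - 1)) - 1,
    0
  )
  F_abund + F_rare / C_ace + (F_1 / C_ace) * gamma2
}

# Made-up counts: two abundant features, seven rare ones, one absent.
x <- c(25, 12, 5, 3, 2, 2, 1, 1, 1, 0)
print(ace_sketch(x))
```

For these counts the intermediate values are \(F_{rare} = 7\), \(X_{rare} = 15\), \(C_{ace} = 0.8\), and \(\gamma_{ace}^2 = 0.25\), which makes the result easy to verify by hand.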
For Faith's Phylogenetic Diversity, given \(n\) branches with lengths \(L\) and a sample's abundances \(A\) on each of those branches, coded as 1 for present or 0 for absent:
\(\sum_{i = 1}^{n} L_i A_i\)
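The sum above can be worked through on a toy tree in base R. This sketch avoids any tree library: the three-tip tree `((a,b),c)`, its branch lengths, and the branch-by-feature matrix `desc` are all hypothetical, and whether a root branch is counted is a convention this example does not address.

```r
# Hypothetical branch lengths: tip branches a, b, c and internal branch ab.
L <- c(a = 0.2, b = 0.3, ab = 0.5, c = 0.7)

# desc[branch, feature] is 1 when the feature descends from that branch.
desc <- matrix(
  c(1, 0, 0,
    0, 1, 0,
    1, 1, 0,
    0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("a", "b", "ab", "c"), c("a", "b", "c"))
)

# Made-up counts for one sample: only feature a was observed.
counts  <- c(a = 3, b = 0, c = 0)
present <- as.numeric(counts > 0)

# A branch is present (A = 1) if any observed feature descends from it.
A <- as.numeric(desc %*% present > 0)

faith_pd <- sum(L * A)
print(faith_pd)
```

Here only the tip branch `a` and the internal branch `ab` lead to an observed feature, so the result is the sum of those two lengths.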
# Example counts matrix, transposed with t()
t(ex_counts)

# Richness estimators applied to the example counts
ace(ex_counts)
chao1(ex_counts)
squares(ex_counts)