Beta Diversity Wrapper Function
beta_div(
counts,
metric,
norm = "percent",
power = 1.5,
pseudocount = NULL,
alpha = 0.5,
tree = NULL,
pairs = NULL,
margin = 1L,
cpus = n_cpus()
)
A numeric vector.
counts
A numeric matrix of count data where each column is a
feature, and each row is a sample. Any object coercible with
as.matrix() can be given here, as well as phyloseq, rbiom,
SummarizedExperiment, and TreeSummarizedExperiment objects. For
optimal performance with very large datasets, see the guide in
vignette('performance').
metric
The name of a beta diversity metric. One of c('aitchison',
'bhattacharyya', 'bray', 'canberra', 'chebyshev', 'chord', 'clark',
'divergence', 'euclidean', 'generalized_unifrac', 'gower', 'hamming',
'hellinger', 'horn', 'jaccard', 'jensen', 'jsd', 'lorentzian',
'manhattan', 'matusita', 'minkowski', 'morisita', 'motyka',
'normalized_unifrac', 'ochiai', 'psym_chisq', 'soergel', 'sorensen',
'squared_chisq', 'squared_chord', 'squared_euclidean', 'topsoe',
'unweighted_unifrac', 'variance_adjusted_unifrac', 'wave_hedges',
'weighted_unifrac'). Flexible matching is supported (see below).
The same list is available programmatically via list_metrics('beta').
norm
How to normalize the incoming counts. Options are:
norm = "percent" - Relative abundance (sample abundances sum to 1).
norm = "binary" - Unweighted presence/absence (each count is either 0 or 1).
norm = "clr" - Centered log ratio.
norm = "none" - No transformation.
Default: 'percent', which is the expected input for these formulas.
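To make the first two options concrete, here is a minimal base R sketch of what "percent" and "binary" normalization mean for a samples-in-rows matrix. The toy matrix and variable names are made up for illustration, and this is not necessarily how the function implements the step internally.

```r
# Toy counts: 2 samples (rows) x 3 features (columns); values are made up.
counts <- matrix(
  c(10, 0, 30,
     5, 5,  0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("s1", "s2"), c("f1", "f2", "f3"))
)

# norm = "percent": divide each row by its total so every sample sums to 1.
percent <- counts / rowSums(counts)

# norm = "binary": presence/absence, 1 where a feature was observed.
binary <- (counts > 0) * 1

rowSums(percent)  # each sample sums to 1
binary
```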
power
Scaling factor for the magnitude of differences between
communities (\(p\)). Default: 1.5
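The text above does not say which metric uses power. Assuming it is the exponent \(p\) of the Minkowski distance (an assumption, not confirmed here), the following base R sketch shows how the exponent changes the result. The function minkowski_sketch and the two abundance vectors are hypothetical.

```r
# Hypothetical helper: textbook Minkowski distance between two samples.
minkowski_sketch <- function(x, y, power = 1.5) {
  sum(abs(x - y)^power)^(1 / power)
}

# Two made-up samples, already expressed as relative abundances.
x <- c(0.25, 0.00, 0.75)
y <- c(0.50, 0.50, 0.00)

minkowski_sketch(x, y, power = 1)  # same as Manhattan distance
minkowski_sketch(x, y, power = 2)  # same as Euclidean distance
minkowski_sketch(x, y)             # default power = 1.5, in between
```

Larger exponents give more weight to the features with the biggest differences.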
pseudocount
The value to add to all counts in counts to prevent
taking log(0) for unobserved features. The default, NULL, selects
the smallest non-zero value in counts.
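As a rough illustration, the default pseudocount rule and a centered log ratio can be written in base R as below. The toy matrix is made up, and the exact order of operations inside the function is an assumption.

```r
# Toy counts: 2 samples (rows) x 3 features (columns); values are made up.
counts <- matrix(c(10, 0, 30, 5, 5, 0), nrow = 2, byrow = TRUE)

# Default rule described above: the smallest non-zero value in counts.
pseudocount <- min(counts[counts > 0])

# Centered log ratio: log of each shifted count minus the sample's mean log.
logged <- log(counts + pseudocount)
clr    <- logged - rowMeans(logged)

rowSums(clr)  # CLR values within each sample sum to (numerically) zero
```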
alpha
How much weight to give to relative abundances; a value
between 0 and 1, inclusive. Setting alpha=1 is equivalent to
normalized_unifrac(). Default: 0.5
tree
A phylo-class object representing the phylogenetic tree for
the OTUs in counts. The OTU identifiers given by colnames(counts)
must be present in tree. Can be omitted if a tree is embedded with
the counts object or as attr(counts, 'tree').
pairs
The combinations of samples for which distances should be
calculated. The default value (NULL) calculates all-vs-all.
Provide a numeric or logical vector specifying positions in the
distance matrix to calculate. See examples.
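The sketch below shows one way to work out which position belongs to which sample pair. It assumes positions follow the ordering of a base R dist object (lower triangle, column by column), which the text above does not state explicitly. The sample names are hypothetical.

```r
samples <- c("A", "B", "C", "D")   # hypothetical sample names

# A dist object stores the lower triangle column by column;
# combn() enumerates the pairs in that same order.
pair_index <- t(combn(samples, 2))
colnames(pair_index) <- c("sample_1", "sample_2")
pair_index
# position 1 = A-B, 2 = A-C, 3 = A-D, 4 = B-C, 5 = B-D, 6 = C-D

# Every pair involving sample "A", as a logical vector of length 6 ...
involves_a <- pair_index[, 1] == "A" | pair_index[, 2] == "A"

# ... or as the equivalent numeric positions.
which(involves_a)
```

Under that assumption, either involves_a or which(involves_a) could be passed as pairs to restrict the calculation to comparisons against sample A.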
margin
If your samples are in the matrix's rows, set to 1L. If
your samples are in columns, set to 2L. Ignored when counts is a
phyloseq, rbiom, SummarizedExperiment, or
TreeSummarizedExperiment object. Default: 1L
cpus
How many parallel processing threads should be used. The
default, n_cpus(), will use all logical CPU cores.
List of Beta Diversity Metrics
| Option / Function Name | Metric Name |
|---|---|
| aitchison | Aitchison distance |
| bhattacharyya | Bhattacharyya distance |
| bray | Bray-Curtis dissimilarity |
| canberra | Canberra distance |
| chebyshev | Chebyshev distance |
| chord | Chord distance |
| clark | Clark's divergence distance |
| divergence | Divergence |
| euclidean | Euclidean distance |
| generalized_unifrac | Generalized UniFrac (GUniFrac) |
| gower | Gower distance |
| hamming | Hamming distance |
| hellinger | Hellinger distance |
| horn | Horn-Morisita dissimilarity |
| jaccard | Jaccard distance |
| jensen | Jensen-Shannon distance |
| jsd | Jensen-Shannon divergence (JSD) |
| lorentzian | Lorentzian distance |
| manhattan | Manhattan distance |
| matusita | Matusita distance |
| minkowski | Minkowski distance |
| morisita | Morisita dissimilarity |
| motyka | Motyka dissimilarity |
| normalized_unifrac | Normalized Weighted UniFrac |
| ochiai | Otsuka-Ochiai dissimilarity |
| psym_chisq | Probabilistic Symmetric Chi-Squared distance |
| soergel | Soergel distance |
| sorensen | Dice-Sorensen dissimilarity |
| squared_chisq | Squared Chi-Squared distance |
| squared_chord | Squared Chord distance |
| squared_euclidean | Squared Euclidean distance |
| topsoe | Topsoe distance |
| unweighted_unifrac | Unweighted UniFrac |
| variance_adjusted_unifrac | Variance-Adjusted Weighted UniFrac (VAW-UniFrac) |
| wave_hedges | Wave Hedges distance |
| weighted_unifrac | Weighted UniFrac |
Flexible name matching
Matching is case-insensitive and accepts partial names. Any runs of
non-alpha characters are converted to underscores. E.g.
metric = 'Weighted UniFrac' selects weighted_unifrac.
UniFrac names can be shortened to the first letter plus "unifrac". E.g.
uunifrac, w_unifrac, or V UniFrac. These also support partial matching.
Finished code should always use the full primary option name to avoid ambiguity with future additions to the metrics list.
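The matching rules above can be approximated in a few lines of base R. This is only an illustrative sketch: match_metric_sketch is a hypothetical helper, the metrics vector is a small subset of the real list, and the UniFrac short forms are not handled.

```r
# A subset of the option names, for illustration only.
metrics <- c("bray", "euclidean", "jaccard", "squared_euclidean",
             "weighted_unifrac", "unweighted_unifrac")

# Hypothetical helper mimicking the described rules.
match_metric_sketch <- function(name, choices = metrics) {
  key <- tolower(name)                 # case-insensitive
  key <- gsub("[^a-z]+", "_", key)     # runs of non-alpha -> underscore
  choices[pmatch(key, choices)]        # exact match first, then unique prefix
}

match_metric_sketch("Weighted UniFrac")  # "weighted_unifrac"
match_metric_sketch("Bray")              # "bray"
match_metric_sketch("euc")               # "euclidean"
```

In this sketch an ambiguous prefix returns NA, which is one reason to spell out the full option name in finished code.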
# Example counts matrix
ex_counts
# Bray-Curtis distances
beta_div(ex_counts, 'bray')
# Generalized UniFrac distances
beta_div(ex_counts, 'GUniFrac', tree = ex_tree)