bdiv_table: Distance / dissimilarity between samples.

Description

Distance / dissimilarity between samples.

Usage

bdiv_table(
  biom,
  bdiv = "bray",
  weighted = NULL,
  normalized = NULL,
  tree = NULL,
  md = ".all",
  within = NULL,
  between = NULL,
  delta = ".all",
  norm = "none",
  pseudocount = NULL,
  power = 1.5,
  alpha = 0.5,
  transform = "none",
  ties = "random",
  seed = 0,
  cpus = n_cpus(),
  ...
)
bdiv_matrix(
  biom,
  bdiv = "bray",
  weighted = NULL,
  normalized = NULL,
  tree = NULL,
  within = NULL,
  between = NULL,
  norm = "none",
  pseudocount = NULL,
  power = 1.5,
  alpha = 0.5,
  transform = "none",
  ties = "random",
  seed = 0,
  cpus = n_cpus()
)
bdiv_distmat(
  biom,
  bdiv = "bray",
  weighted = NULL,
  normalized = NULL,
  tree = NULL,
  within = NULL,
  between = NULL,
  norm = "none",
  pseudocount = NULL,
  power = 1.5,
  alpha = 0.5,
  transform = "none",
  ties = "random",
  seed = 0,
  cpus = n_cpus()
)

Value

bdiv_matrix() -

An R matrix of samples x samples.

bdiv_distmat() -

A dist-class distance matrix.

bdiv_table() -

A tibble data.frame with columns named .sample1, .sample2, .bdiv, .distance, and any fields requested by md. Numeric metadata fields will be returned as abs(x - y); categorical metadata fields as "x", "y", or "x vs y".
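
As a rough illustration of how a metadata column could be summarized for one pair of samples, the base-R sketch below uses an invented metadata frame. `pair_field()` is a hypothetical helper, not an rbiom function, and the alphabetical ordering of the two labels is an assumption.

```r
# Toy metadata for two samples (values invented for illustration).
md <- data.frame(
  Age  = c(10, 12),
  Sex  = c("Male", "Female"),
  Site = c("Stool", "Stool"),
  row.names = c("S1", "S2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: summarize one metadata field for a pair of samples.
pair_field <- function(x, y) {
  if (is.numeric(x)) return(abs(x - y))     # numeric: absolute difference
  if (identical(x, y)) return(x)            # same category: "x"
  paste(sort(c(x, y)), collapse = " vs ")   # different categories: "x vs y"
}

age_delta  <- pair_field(md["S1", "Age"],  md["S2", "Age"])
sex_label  <- pair_field(md["S1", "Sex"],  md["S2", "Sex"])
site_label <- pair_field(md["S1", "Site"], md["S2", "Site"])
```

Here `age_delta` holds the absolute age difference, `site_label` the shared category, and `sex_label` a combined "x vs y" label.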

Arguments

biom

An rbiom object, or any value accepted by as_rbiom().

bdiv

Beta diversity distance algorithm(s) to use. Options are: c("aitchison", "bhattacharyya", "bray", "canberra", "chebyshev", "chord", "clark", "sorensen", "divergence", "euclidean", "generalized_unifrac", "gower", "hamming", "hellinger", "horn", "jaccard", "jensen", "jsd", "lorentzian", "manhattan", "matusita", "minkowski", "morisita", "motyka", "normalized_unifrac", "ochiai", "psym_chisq", "soergel", "squared_chisq", "squared_chord", "squared_euclidean", "topsoe", "unweighted_unifrac", "variance_adjusted_unifrac", "wave_hedges", "weighted_unifrac"). For the UniFrac family, a phylogenetic tree must be present in biom or explicitly provided via tree=. Supports partial matching. Multiple values are allowed for functions which return a table or plot. Default: "bray"
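
For orientation, the default "bray" metric can be sketched in base R using the textbook Bray-Curtis formula. The count vectors are invented, and whether rbiom applies any preprocessing before this step is not covered here.

```r
# Invented counts for two samples across three taxa.
x <- c(10, 0, 30)
y <- c(5, 5, 0)

# Textbook Bray-Curtis dissimilarity: sum of absolute differences
# divided by the total abundance across both samples.
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

bray     <- bray_curtis(x, y)
same     <- bray_curtis(x, x)                  # identical samples
disjoint <- bray_curtis(c(4, 0), c(0, 7))      # no shared taxa
```

Identical samples give 0 and samples with no taxa in common give 1, which is why the metric is often read as a proportion of unshared abundance.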

weighted

(Deprecated - weighting is now inherent in bdiv metric name.) Take relative abundances into account. When weighted=FALSE, only presence/absence is considered. Multiple values allowed. Default: NULL

normalized

(Deprecated - normalization is now inherent in bdiv metric name.) Only changes the "Weighted UniFrac" calculation. Divides result by the total branch weights. Default: NULL

tree

A phylo object representing the phylogenetic relationships of the taxa in biom. Only required when computing UniFrac distances. Default: biom$tree

md

Dataset field(s) to include in the output data frame, or '.all' to include all metadata fields. Default: '.all'

within, between

Dataset field(s) for intra- or inter- sample comparisons. Alternatively, dataset field names given elsewhere can be prefixed with '==' or '!=' to assign them to within or between, respectively. Default: NULL

delta

For numeric metadata, report the absolute difference in values for the two samples, for instance 2 instead of "10 vs 12". Default: ".all"

norm

Normalize the incoming counts. Options are:

'none': No transformation.
'percent': Relative abundance (sample abundances sum to 1).
'binary': Unweighted presence/absence (each count is either 0 or 1).
'clr': Centered log ratio.

Default: 'none'.

pseudocount

Value added to counts to handle zeros when norm = 'clr'. Ignored for other normalization methods. Default: NULL (emits a warning).
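
The norm options can be approximated in base R as shown below. This is a sketch on an invented count matrix using the usual definitions. The pseudocount of 1 and the point at which it is added are assumptions, not rbiom's confirmed behavior.

```r
# Toy count matrix: rows are taxa, columns are samples (values invented).
counts <- matrix(
  c(10, 0, 30,
     5, 5,  0),
  nrow = 3,
  dimnames = list(c("OTU1", "OTU2", "OTU3"), c("S1", "S2"))
)

# 'percent': each sample's abundances sum to 1.
pct <- sweep(counts, 2, colSums(counts), "/")

# 'binary': presence/absence, each value is 0 or 1.
bin <- (counts > 0) * 1

# 'clr': log of each value minus the sample's mean log value.
# Zeros have no logarithm, hence the pseudocount (1 here is an assumption).
pseudocount <- 1
logged <- log(counts + pseudocount)
clr <- sweep(logged, 2, colMeans(logged), "-")
```

After the centered log ratio, each sample's values sum to zero, which is a quick way to check the transformation.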

power

Scaling factor for the magnitude of differences between communities (p) when bdiv = 'minkowski'. Ignored for other beta diversity metrics. Default: 1.5
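
To show what the power term does, the sketch below computes the standard Minkowski distance in base R for invented relative abundances and cross-checks it against stats::dist(). It assumes rbiom follows the same textbook definition.

```r
# Invented relative abundances for two samples.
x <- c(0.2, 0.5, 0.3)
y <- c(0.4, 0.1, 0.5)

# Standard Minkowski distance with exponent p.
minkowski <- function(x, y, p) sum(abs(x - y)^p)^(1 / p)

d_manhattan <- minkowski(x, y, p = 1)    # p = 1 is Manhattan distance
d_default   <- minkowski(x, y, p = 1.5)  # the documented default power
d_euclidean <- minkowski(x, y, p = 2)    # p = 2 is Euclidean distance

# Cross-check against base R's implementation.
d_check <- as.numeric(dist(rbind(x, y), method = "minkowski", p = 1.5))
```

Larger values of p give more weight to the largest single difference, so the distance shrinks from the Manhattan value toward the Chebyshev value as p grows.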

alpha

The alpha term to use in Generalized UniFrac. How much weight to give to relative abundances; a value between 0 and 1, inclusive. Setting alpha=1 is equivalent to Normalized UniFrac. Default: 0.5

transform

Transformation to apply to calculated values. Options are: c("none", "rank", "log", "log1p", "sqrt", "percent"). "rank" is useful for correcting for non-normal distributions before applying regression statistics. Default: "none"

ties

When transform="rank", how to rank identical values. Options are: c("average", "first", "last", "random", "max", "min"). See rank() for details. Default: "random"
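
The effect of ties can be seen directly with base R's rank(), which the description above refers to. The distances below are invented, and using set.seed() to stand in for the seed argument is an assumption.

```r
# Invented distances with one tied pair (positions 1 and 3).
distances <- c(0.30, 0.10, 0.30, 0.70)

r_avg   <- rank(distances, ties.method = "average")  # tied values share 2.5
r_min   <- rank(distances, ties.method = "min")      # tied values both get 2
r_max   <- rank(distances, ties.method = "max")      # tied values both get 3
r_first <- rank(distances, ties.method = "first")    # earlier position ranks lower

# "random" breaks ties arbitrarily, so fix the seed for reproducibility.
set.seed(0)
r_rand <- rank(distances, ties.method = "random")
```

With "random", the two tied values receive ranks 2 and 3 in an order that depends on the random number generator state.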

seed

Random seed for permutations. Must be a non-negative integer. Default: 0

cpus

The number of CPUs to use. Set to NULL to use all available, or to 1 to disable parallel processing. Default: n_cpus()

...

Not used.

Metadata Comparisons

Prefix metadata fields with == or != to limit comparisons to within or between groups, respectively. For example, md = '==Sex' will run calculations only for intra-group comparisons, returning "Male" and "Female", but NOT "Female vs Male". Similarly, setting md = '!=Body Site' will only show the inter-group comparisons, such as "Saliva vs Stool", "Anterior nares vs Buccal mucosa", and so on.

The same effect can be achieved by using the within and between parameters. md = '==Sex' is equivalent to md = 'Sex', within = 'Sex'.
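
Conceptually, the == and != prefixes act as a filter on the list of sample pairs. The base-R sketch below mimics that filter on invented sample names and metadata. It illustrates the idea only and does not call rbiom.

```r
# Invented metadata: one Sex value per sample.
sex <- c(A = "Male", B = "Male", C = "Female")

# All unordered pairs of samples.
pairs <- as.data.frame(t(combn(names(sex), 2)), stringsAsFactors = FALSE)
names(pairs) <- c(".sample1", ".sample2")

# TRUE when both samples in a pair share the same Sex value.
same_group <- unname(sex[pairs$.sample1] == sex[pairs$.sample2])

within_pairs  <- pairs[same_group, ]   # analogous to '==Sex'
between_pairs <- pairs[!same_group, ]  # analogous to '!=Sex'
```

Of the three possible pairs, only A and B fall in the same group, so the within filter keeps one pair and the between filter keeps the other two.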

Examples

    library(rbiom)
    
    # Subset to four samples
    biom <- hmp50$clone()
    biom$counts <- biom$counts[,c("HMP18", "HMP19", "HMP20", "HMP21")]
    
    # Return in long format with metadata
    bdiv_table(biom, 'w_unifrac', md = ".all")
    
    # Only look at distances within the same body site (e.g. among the stool samples)
    bdiv_table(biom, 'w_unifrac', md = c("==Body Site", "Sex"))
    
    # Or between males and females
    bdiv_table(biom, 'w_unifrac', md = c("Body Site", "!=Sex"))
    
    # All-vs-all matrix
    bdiv_matrix(biom, 'w_unifrac')
    
    # All-vs-all distance matrix
    dm <- bdiv_distmat(biom, 'w_unifrac')
    dm
    plot(hclust(dm))