OmicFlow (version 1.5.0)

jaccard: Compute Jaccard Dissimilarity from a Dense or Sparse Matrix.

Description

Computes the pairwise Jaccard dissimilarity between the columns of a matrix.

Usage

jaccard(x, weighted = TRUE, threads = 1)

Value

A column x column dist object.
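For readers unfamiliar with dist objects, the following base R sketch shows how such a result can be inspected. The matrix values and sample names (S1 to S3) are made up for illustration and are not OmicFlow output.

```r
# Toy symmetric dissimilarity matrix for three hypothetical samples.
m <- matrix(c(0.0, 0.5, 0.8,
              0.5, 0.0, 0.3,
              0.8, 0.3, 0.0),
            nrow = 3,
            dimnames = list(c("S1", "S2", "S3"), c("S1", "S2", "S3")))

d <- as.dist(m)          # stores only the lower triangle, class "dist"
class(d)
attr(d, "Size")          # number of samples (columns of the input)

full <- as.matrix(d)     # expand back to a full square matrix
full["S1", "S3"]         # look up a single pairwise dissimilarity

hc <- hclust(d)          # dist objects can be passed directly to hclust()
```

A dist object returned by jaccard() should be usable in the same way, for example as input to hclust() or cmdscale().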

Arguments

x

A matrix, sparseMatrix or Matrix.

weighted

A logical value: use abundances (weighted = TRUE) or presence/absence (weighted = FALSE). Default: TRUE.

threads

A whole number giving the number of threads to use in setThreadOptions (default: 1).

Details

The weighted Jaccard dissimilarity between two samples \(A\) and \(B\), each of length \(n\), is defined as:

\(d(A,B) = 1 - \frac{ \sum_{i=1}^{n} \min(A_i, B_i) }{ \sum_{i=1}^{n} \max(A_i, B_i) }\)

where \(A_i\) and \(B_i\) are the abundances of the \(i\)-th feature in samples \(A\) and \(B\), respectively. When weighted is set to FALSE, abundances are converted to presence/absence (present features are set to 1), which gives the classical Jaccard dissimilarity for binary data.
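The formula can be checked by hand with a minimal base R sketch. The helper jaccard_pair and the vectors A and B are hypothetical and only illustrate the definition; this is not OmicFlow's implementation, and treating every nonzero abundance as "present" is an assumption.

```r
# Hypothetical helper illustrating the formula; not part of OmicFlow.
jaccard_pair <- function(a, b, weighted = TRUE) {
  if (!weighted) {
    # Assumption: any nonzero abundance counts as presence (1).
    a <- as.numeric(a > 0)
    b <- as.numeric(b > 0)
  }
  # Note: returns NaN when both vectors are all zero (0 / 0).
  1 - sum(pmin(a, b)) / sum(pmax(a, b))
}

A <- c(3, 0, 2, 5)
B <- c(1, 4, 2, 0)

# Weighted: sum of minima = 3, sum of maxima = 14
d_weighted <- jaccard_pair(A, B)                    # 1 - 3/14

# Unweighted: 2 shared features out of 4 present in either sample
d_binary <- jaccard_pair(A, B, weighted = FALSE)    # 1 - 2/4
```

Here the two samples share features 1 and 3, so the presence/absence version gives 0.5, while the weighted version is larger because the shared features differ in abundance.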

References

Jaccard, P. (1912) The distribution of the flora in the alpine zone. New Phytologist, 11(2), 37–50.

Examples

library("OmicFlow")

# Locate the example files shipped with the package
metadata_file <- system.file("extdata", "metadata.tsv", package = "OmicFlow")
counts_file <- system.file("extdata", "counts.tsv", package = "OmicFlow")
features_file <- system.file("extdata", "features.tsv", package = "OmicFlow")
tree_file <- system.file("extdata", "tree.newick", package = "OmicFlow")

# Build the metagenomics object from the example data
taxa <- metagenomics$new(
  metaData = metadata_file,
  countData = counts_file,
  featureData = features_file,
  treeData = tree_file
)

# Keep bacterial features only, then normalize the counts
taxa$feature_subset(Kingdom == "Bacteria")
taxa$normalize()

# Pairwise (weighted) Jaccard dissimilarity between samples
jaccard(taxa$countData)