Calculates the Jaccard dissimilarity between all pairs of columns of a matrix.
jaccard(x, weighted = TRUE, threads = 1)
A column x column dist object.
x: A matrix, sparseMatrix or Matrix.
weighted: A logical value; use abundances (weighted = TRUE) or presence/absence (weighted = FALSE). Default: TRUE.
threads: A whole number; the number of threads to use in setThreadOptions. Default: 1.
The weighted Jaccard dissimilarity between two samples \(A\) and \(B\), each of length \(n\), is defined as:
\(d(A,B) = 1 - \frac{ \sum_{i=1}^{n} \min(A_i, B_i) }{ \sum_{i=1}^{n} \max(A_i, B_i) }\)
where \(A_i\) and \(B_i\) are the abundances of the \(i\)-th feature in samples \(A\) and \(B\), respectively. When weighted is set to FALSE, abundances are converted to presence/absence (present features are set to 1), which gives the classical Jaccard dissimilarity for binary data.
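The definition can be checked by hand on a small matrix. The sketch below is a plain base-R reference, not OmicFlow's implementation: `jaccard_reference` and the `counts` matrix are hypothetical names introduced for illustration, and two choices in it are assumptions (any nonzero abundance counts as presence, and two all-zero columns get a dissimilarity of 0).

```r
# Hypothetical helper, not part of OmicFlow: pairwise Jaccard dissimilarity
# between the columns of a dense numeric matrix, following the formula above.
jaccard_reference <- function(x, weighted = TRUE) {
  x <- as.matrix(x)
  if (!weighted) {
    # Assumption: any nonzero abundance counts as presence.
    x <- (x != 0) * 1
  }
  n <- ncol(x)
  d <- matrix(0, n, n, dimnames = list(colnames(x), colnames(x)))
  for (a in seq_len(n)) {
    for (b in seq_len(n)) {
      num <- sum(pmin(x[, a], x[, b]))  # sum of feature-wise minima
      den <- sum(pmax(x[, a], x[, b]))  # sum of feature-wise maxima
      # Assumption: two all-zero columns are treated as identical.
      d[a, b] <- if (den == 0) 0 else 1 - num / den
    }
  }
  as.dist(d)
}

# Two samples (columns A and B) measured on three features (rows).
counts <- matrix(
  c(4, 0, 2,
    1, 3, 2),
  nrow = 3,
  dimnames = list(c("f1", "f2", "f3"), c("A", "B"))
)

jaccard_reference(counts)                    # weighted: 1 - 3/9
jaccard_reference(counts, weighted = FALSE)  # presence/absence: 1 - 2/3
```

For samples A = (4, 0, 2) and B = (1, 3, 2), the minima sum to 3 and the maxima to 9, so the weighted dissimilarity is 1 - 3/9. After conversion to presence/absence, the sums are 2 and 3, giving 1 - 2/3. The double loop is written for readability and is only practical for small matrices.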
Jaccard, P. (1912) The distribution of the flora in the alpine zone. New Phytologist, 11(2), 37–50.
library("OmicFlow")
metadata_file <- system.file("extdata", "metadata.tsv", package = "OmicFlow")
counts_file <- system.file("extdata", "counts.tsv", package = "OmicFlow")
features_file <- system.file("extdata", "features.tsv", package = "OmicFlow")
tree_file <- system.file("extdata", "tree.newick", package = "OmicFlow")
taxa <- metagenomics$new(
  metaData = metadata_file,
  countData = counts_file,
  featureData = features_file,
  treeData = tree_file
)
taxa$feature_subset(Kingdom == "Bacteria")
taxa$normalize()
jaccard(taxa$countData)