micropan (version 1.1.2)

distJaccard: Computing Jaccard distances between genomes

Description

Computes the Jaccard distances between all pairs of genomes.

Usage

distJaccard(pan.matrix)

Arguments

pan.matrix
A Panmat object, see panMatrix for details.

Value

A dist object (see dist) containing all pairwise Jaccard distances between genomes.

Details

The Jaccard index between two sets is defined as the size of the intersection of the sets divided by the size of their union. The Jaccard distance is simply 1 minus the Jaccard index.
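As a minimal sketch of the definition above, the hypothetical helper jaccard_distance() below (not part of micropan) computes the Jaccard distance between two sets of gene cluster identifiers using base R set operations. The cluster names are made up for illustration.

```r
# Hypothetical helper, not part of micropan: Jaccard distance between two
# sets of gene cluster identifiers, i.e. 1 - |A intersect B| / |A union B|.
jaccard_distance <- function(a, b) {
  a <- unique(a)                          # sets: ignore repeated identifiers
  b <- unique(b)
  n.shared <- length(intersect(a, b))     # size of the intersection
  n.union  <- length(union(a, b))         # size of the union
  1 - n.shared / n.union                  # NaN if both sets are empty
}

genome.A <- c("clst1", "clst2", "clst3", "clst4")
genome.B <- c("clst3", "clst4", "clst5")
jaccard_distance(genome.A, genome.B)
```

Here the two genomes share 2 of 5 distinct clusters, so the Jaccard index is 0.4 and the distance is 0.6.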

The Jaccard distance between two genomes describes their degree of overlap with respect to gene cluster content. A Jaccard distance of 0.0 means the two genomes contain identical sets of gene clusters; a distance of 1.0 means they share no gene clusters at all. Genomic fluidity (see fluidity) and the Jaccard distance are closely related, since both measure overlap between genomes. The difference is that fluidity is computed for the population by averaging over many pairs of genomes, whereas a Jaccard distance is computed for each individual pair. Note that only the presence/absence of gene clusters is considered, not multiple occurrences.
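The pairwise computation can be sketched in base R as below. This is a hypothetical re-implementation for illustration, not micropan's source code. It assumes, as in a Panmat object, that genomes are rows and gene clusters are columns, and it converts counts to presence/absence so that multiple occurrences are ignored. The result is a dist object built with as.dist().

```r
# Hypothetical sketch, not micropan's implementation.
# Assumes rows = genomes, columns = gene clusters, entries = counts.
jaccard_dist_matrix <- function(pan.matrix) {
  pa <- (as.matrix(pan.matrix) > 0) * 1     # presence/absence: any count > 0 becomes 1
  shared <- pa %*% t(pa)                    # shared[i, j] = clusters present in both i and j
  n <- rowSums(pa)                          # number of clusters in each genome
  union.size <- outer(n, n, "+") - shared   # |A union B| = |A| + |B| - |A intersect B|
  as.dist(1 - shared / union.size)          # keep only the lower triangle as a dist object
}

# Toy matrix with made-up genome and cluster names
toy <- matrix(c(2, 1, 1, 0,
                1, 0, 1, 1,
                1, 1, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("GID1", "GID2", "GID3"), paste0("clst", 1:4)))
jaccard_dist_matrix(toy)
```

GID1 and GID3 contain the same clusters (the count of 2 for clst1 in GID1 is treated as presence), so their distance is 0.0, while each of them shares 2 of 4 clusters with GID2, giving a distance of 0.5.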

The input pan.matrix is typically constructed by panMatrix.

See Also

panMatrix, fluidity, dist.

Examples

# Loading two Panmat objects in the micropan package
data(list=c("Mpneumoniae.blast.panmat","Mpneumoniae.domain.panmat"),package="micropan")

# Jaccard distances based on a BLAST clustering Panmat object
Jdist.blast <- distJaccard(Mpneumoniae.blast.panmat)

# Jaccard distances based on domain sequence clustering Panmat object
Jdist.domains <- distJaccard(Mpneumoniae.domain.panmat) 
