fluidity: Computing genomic fluidity for a pan-genome

Description

Computes the genomic fluidity, which is a measure of population diversity.

Usage

fluidity(pan.matrix, n.sim = 10)

Arguments

pan.matrix

A Panmat object, see panMatrix for details.

n.sim

An integer specifying the number of random genome pairs to sample in the computations.

Value

A list with two elements, the mean fluidity and its sample standard deviation over the n.sim computed values.

Details

The genomic fluidity between two genomes is defined as the number of gene families found in only one of the two genomes, divided by the sum of the number of gene families in each genome (Kislyuk et al., 2011). This is averaged over n.sim random pairs of genomes to obtain a population estimate.
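Written as a formula, the definition above might look as follows. The symbols are introduced here only for illustration and are not taken from this help page: U_k is the number of gene families found in genome k but not in genome l, M_k is the total number of gene families in genome k, and (k_i, l_i) is the i-th randomly drawn genome pair.

```latex
\varphi_{kl} = \frac{U_k + U_l}{M_k + M_l},
\qquad
\hat{\varphi} = \frac{1}{n_{\mathrm{sim}}} \sum_{i=1}^{n_{\mathrm{sim}}} \varphi_{k_i l_i}
```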

The genomic fluidity between two genomes describes their degree of overlap with respect to gene cluster content. If the fluidity is 0.0, the two genomes contain identical gene clusters. If it is 1.0, the two genomes are non-overlapping. Genomic fluidity is closely related to the Jaccard distance (see distJaccard): both measure overlap between genomes, but fluidity is computed for the population by averaging over many pairs, while Jaccard distances are computed for every pair. Note that only the presence or absence of gene clusters is considered, not multiple occurrences.
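To make the pairwise computation and the averaging concrete, here is a minimal sketch in base R. It is not the package's implementation: pair_fluidity and the toy matrix are hypothetical, and the sketch assumes a plain numeric matrix with one row per genome and one column per gene cluster.

```r
# Hypothetical helper (not part of micropan): fluidity for one pair of genomes.
# g1 and g2 are numeric vectors of gene cluster counts for two genomes.
pair_fluidity <- function(g1, g2) {
  p1 <- g1 > 0                                   # presence/absence only,
  p2 <- g2 > 0                                   # copy numbers are ignored
  n.unique <- sum(p1 & !p2) + sum(!p1 & p2)      # clusters in exactly one genome
  n.unique / (sum(p1) + sum(p2))                 # divided by the summed cluster counts
}

# Toy matrix (assumed layout): rows are genomes, columns are gene clusters
toy <- matrix(c(1, 2, 0, 1,
                1, 0, 1, 1,
                0, 0, 1, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), paste0("cluster", 1:4)))

# Genomes A and B each hold 3 clusters and 2 clusters occur in only one of them,
# so this pair has fluidity 2 / (3 + 3)
pair_fluidity(toy["A", ], toy["B", ])

# Population estimate: average over n.sim randomly drawn pairs
set.seed(1)
n.sim <- 10
sims <- replicate(n.sim, {
  idx <- sample(nrow(toy), 2)                    # two distinct genomes
  pair_fluidity(toy[idx[1], ], toy[idx[2], ])
})
mean(sims)                                       # mean fluidity
sd(sims)                                         # sample standard deviation
```

Because the pairs are drawn at random, the estimate varies between runs unless a seed is set, and a larger n.sim gives a more stable mean.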

The input pan.matrix is typically constructed by panMatrix.

References

Kislyuk, A.O., Haegeman, B., Bergman, N.H., Weitz, J.S. (2011). Genomic fluidity: an integrative view of gene diversity within microbial populations. BMC Genomics, 12:32.

Examples


# Loading two Panmat objects in the micropan package
data(list=c("Mpneumoniae.blast.panmat","Mpneumoniae.domain.panmat"),package="micropan")

# Fluidity based on a BLAST clustering Panmat object
fluid.blast <- fluidity(Mpneumoniae.blast.panmat)

# Fluidity based on domain sequence clustering Panmat object
fluid.domains <- fluidity(Mpneumoniae.domain.panmat)
