TreeDist (version 1.1.1)

MeilaVariationOfInformation: Use variation of clustering information to compare pairs of splits

Description

Compare a pair of splits viewed as clusterings of taxa, using the variation of clustering information proposed by Meilă (2007).

Usage

MeilaVariationOfInformation(split1, split2)

MeilaMutualInformation(split1, split2)

Arguments

split1, split2

Logical vectors listing leaves in a consistent order, identifying each leaf as a member of the ingroup (TRUE) or outgroup (FALSE) of the split in question.

Value

MeilaVariationOfInformation() returns the variation of (clustering) information, measured in bits.

MeilaMutualInformation() returns the mutual information, measured in bits.

Details

This is equivalent to the mutual clustering information (Vinh et al. 2010). For the total information content, multiply the VoI by the number of leaves.

References

Meila2007TreeDist

Vinh2010TreeDist

Examples

Run this code
# NOT RUN {
# Maximum variation = information content of each split separately
A <- TRUE
B <- FALSE
MeilaVariationOfInformation(c(A, A, A, B, B, B), c(A, A, A, A, A, A))
Entropy(c(3, 3) / 6) + Entropy(c(0, 6) / 6)

# Minimum variation = 0
MeilaVariationOfInformation(c(A, A, A, B, B, B), c(A, A, A, B, B, B))

# Not always possible for two evenly-sized splits to reach maximum
# variation of information
Entropy(c(3, 3) / 6) * 2  # = 2
MeilaVariationOfInformation(c(A, A, A,B ,B, B), c(A, B, A, B, A, B)) # < 2

# Phylogenetically uninformative groupings contain spliting information
Entropy(c(1, 5) / 6)
MeilaVariationOfInformation(c(B, A, A, A, A, A), c(A, A, A, A, A, B))
# }

Run the code above in your browser using DataLab