TreeSearch (version 0.3.2)

MutualArborealInfo: Information-based generalized Robinson-Foulds distance between two trees

Description

Functions reporting the distances or similarities between pairs of trees, based on information-theoretic concepts.

Usage

MutualArborealInfo(tree1, tree2, reportMatching = FALSE)

VariationOfArborealInfo(tree1, tree2, reportMatching = FALSE)

MutualClusterInfo(tree1, tree2, reportMatching = FALSE, bestMatchOnly = TRUE)

MutualArborealInfoSplits(splits1, splits2, reportMatching = FALSE)

VariationOfArborealInfoSplits(splits1, splits2, reportMatching = FALSE)

MutualClusterInfoSplits(splits1, splits2, reportMatching = FALSE, bestMatchOnly = TRUE, partitionQualityIndex = SplitPairingInformationIndex(dim(splits1)[1]))

Arguments

tree1, tree2

Trees of class phylo, with tips labelled identically, or lists of such trees to undergo pairwise comparison.

reportMatching

Logical specifying whether to return the clade matchings as an attribute of the score.

bestMatchOnly

Logical specifying whether to return how informative each split is about its best match only (TRUE) or how informative each split is about each other split (FALSE).

splits1, splits2

Logical matrices where each row corresponds to a terminal, either listed in the same order or bearing identical names (in any sequence), and each column corresponds to a bipartition split, such that each terminal is identified as a member of the ingroup (TRUE) or outgroup (FALSE) of the respective bipartition split.
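
As a minimal sketch of this layout, the matrix below encodes three bipartition splits of eight taxa by hand in base R. The object names `taxa` and `splits`, the column labels, and the choice of which side counts as the ingroup are illustrative assumptions, not output produced by the package.

```r
# Hypothetical hand-built splits matrix for the taxa a..h:
# one row per terminal, one column per bipartition split.
taxa <- letters[1:8]
splits <- matrix(FALSE, nrow = length(taxa), ncol = 3,
                 dimnames = list(taxa, c("ab", "abc", "gh")))
splits[c("a", "b"), "ab"] <- TRUE        # split ab | cdefgh
splits[c("a", "b", "c"), "abc"] <- TRUE  # split abc | defgh
splits[c("g", "h"), "gh"] <- TRUE        # split gh | abcdef

dim(splits)[1]   # number of terminals (rows)
colSums(splits)  # number of ingroup terminals in each split
```

Naming the rows lets two such matrices be compared even when their terminals are listed in a different sequence.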

partitionQualityIndex

Output of SplitPairingInformationIndex for n taxa; calculated automatically if not specified, but passing a cached value may improve performance.

Value

If reportMatching = FALSE, the functions return a numeric vector specifying the requested similarities or differences.

If reportMatching = TRUE, the functions additionally return the clade matchings as an attribute of the score.

Functions

  • VariationOfArborealInfo: Variation of phylogenetic information between two trees

  • MutualClusterInfo: Mutual clustering information between two trees

  • MutualArborealInfoSplits: Takes splits instead of trees

  • VariationOfArborealInfoSplits: Calculate variation of arboreal information from splits

  • MutualClusterInfoSplits: Takes splits instead of trees

Details

Each partition in a tree can be viewed either as

  • (a) a statement that the 'true' tree is one of those that split the taxa as specified; or

  • (b) a statement that the taxa are subdivided into the two groups specified.

The former view corresponds to phylogenetic information: the information content of a pair of partitions relates to the proportion of phylogenetic trees that are consistent with both partitions. This gives rise to the Mutual Arboreal Information similarity measure (MutualArborealInfo) and the complementary Variation of Arboreal Information distance metric (VariationOfArborealInfo).
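
To make the phylogenetic view concrete, the sketch below computes the information content of a single partition as the negative log of the proportion of trees consistent with it. It assumes unrooted binary trees, of which there are (2n - 5)!! on n labelled tips, and that (2a - 3)!! (2b - 3)!! of these contain a given split of a and b tips. The helpers `DoubleFactorial` and `SplitInfoBits` are hypothetical names introduced for illustration; they are not functions of the package, and the sketch covers one partition rather than the information shared by a pair.

```r
# Double factorial n!! for odd n (returns 1 when n <= 1).
DoubleFactorial <- function(n) if (n <= 1) 1 else prod(seq(n, 1, by = -2))

# Illustrative helper (not part of the package): information content, in
# bits, of a split that divides nIn + nOut taxa into groups of nIn and nOut.
SplitInfoBits <- function(nIn, nOut) {
  nTaxa <- nIn + nOut
  # Number of unrooted binary trees on nTaxa labelled tips
  allTrees <- DoubleFactorial(2 * nTaxa - 5)
  # Number of those trees that contain the split
  consistent <- DoubleFactorial(2 * nIn - 3) * DoubleFactorial(2 * nOut - 3)
  -log2(consistent / allTrees)
}

SplitInfoBits(2, 6)  # uneven split of eight taxa: log2(11) bits
SplitInfoBits(4, 4)  # even split: consistent with fewer trees, so more bits
```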

The latter sees the information content of a pair of partitions as relating to the proportion of all possible pairings that are at least as similar (measured using Meila's (2007) variation of information) as the pairing in question, giving rise to the Mutual Clustering Information similarity measure (MutualClusterInfo).
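
The variation of information between two partitions, treated as two-group clusterings of the same terminals, can be sketched in base R as follows. This shows only the pairwise similarity measure, not the step that counts how many possible pairings are at least as similar. `Entropy` and `SplitVariationOfInfo` are hypothetical helper names, not functions of the package, and each partition is assumed to be supplied as a logical vector over an identically ordered set of terminals.

```r
# Entropy, in bits, of a vector of probabilities (zero entries contribute 0).
Entropy <- function(p) -sum(p[p > 0] * log2(p[p > 0]))

# Illustrative helper (not part of the package): variation of information
# between two splits, each a logical vector over the same terminals.
SplitVariationOfInfo <- function(split1, split2) {
  # Joint distribution of group membership across the two splits
  joint <- table(factor(split1, c(FALSE, TRUE)),
                 factor(split2, c(FALSE, TRUE))) / length(split1)
  h1 <- Entropy(rowSums(joint))
  h2 <- Entropy(colSums(joint))
  h12 <- Entropy(as.vector(joint))
  mutualInfo <- h1 + h2 - h12
  h1 + h2 - 2 * mutualInfo
}

ab  <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
abc <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
SplitVariationOfInfo(ab, ab)   # zero: the splits are identical
SplitVariationOfInfo(ab, abc)  # positive: the splits differ in one terminal
```

Because only group membership matters, swapping the TRUE and FALSE labels of a split leaves the value unchanged.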

A tree similarity measure is generated by finding an optimal matching that maximises the total information in common between a partition on one tree and its pair on a second, considering all possible ways to pair partitions between trees (including leaving a partition unpaired).
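
The matching step can be illustrated with a brute-force search over every way of pairing partitions. This is a sketch under stated assumptions, not the solver the package uses: the score matrix `shared` contains made-up values, `Permutations` is a hypothetical helper, and a pair with a score of zero is treated as equivalent to leaving those partitions unpaired. Exhaustive search is practical only for a handful of partitions.

```r
# Hypothetical pairwise scores: entry [i, j] is the information shared by
# partition i of the first tree and partition j of the second (made-up values).
shared <- matrix(c(5, 4, 0,
                   4, 0, 0,
                   0, 0, 1), nrow = 3, byrow = TRUE)

# All orderings of 1..n, one per row (suitable for very small n only).
Permutations <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  smaller <- Permutations(n - 1)
  do.call(rbind, lapply(seq_len(n), function(i) {
    # Put i first, then relabel the remaining entries to skip over i
    cbind(i, ifelse(smaller >= i, smaller + 1L, smaller), deparse.level = 0)
  }))
}

perms <- Permutations(nrow(shared))
# Total shared information for each candidate matching
totals <- apply(perms, 1, function(p) sum(shared[cbind(seq_along(p), p)]))
best <- perms[which.max(totals), ]
best         # partition i of tree 1 is paired with partition best[i] of tree 2
max(totals)  # total shared information under the best matching
```

In this example, greedily taking the largest single score (5) would force a zero-scoring pair and a total of 6, whereas the best matching pairs partitions 1 and 2 crosswise for a total of 9.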

The returned tree similarity measures state the amount of information, in bits, that the partitions in two trees hold in common when they are optimally matched, following Smith (forthcoming). The complementary tree distance measures state how much information is different in the partitions of two trees, under an optimal matching.

References

  • Meila2007TreeSearch

  • SmithDistTreeSearch

  • Vinh2010TreeSearch

Examples

# NOT RUN {
{
  library('TreeSearch') # provides the tree comparison functions used below
  tree1 <- ape::read.tree(text='((((a, b), c), d), (e, (f, (g, h))));')
  tree2 <- ape::read.tree(text='(((a, b), (c, d)), ((e, f), (g, h)));')
  tree3 <- ape::read.tree(text='((((h, b), c), d), (e, (f, (g, a))));')
  
  # Best possible score is obtained by matching a tree with itself
  VariationOfArborealInfo(tree1, tree1) # 0, by definition
  MutualArborealInfo(tree1, tree1)
  
  # Best possible score is a function of tree shape; the partitions within
  # balanced trees are more independent and thus contain less information
  MutualArborealInfo(tree2, tree2)
  
  # How similar are two trees?
  MutualArborealInfo(tree1, tree2)
  VariationOfArborealInfo(tree1, tree2)
  VariationOfArborealInfo(tree2, tree1) # Identical, by symmetry
  
  
  # Maximum possible score for Cluster information is independent
  # of tree shape, as every possible pairing is considered
  MutualClusterInfo(tree1, tree1)
  MutualClusterInfo(tree2, tree2)
  
  # It is thus easier to interpret the value of MutualClusterInfo, although
  # it may not be possible to find a tree pair with zero mutual cluster
  # information
  MutualClusterInfo(tree1, tree2)
  
  # Every partition in tree1 is contradicted by every partition in tree3
  # Non-arboreal matches contain clustering, but not phylogenetic, information
  MutualArborealInfo(tree1, tree3) # = 0
  MutualClusterInfo(tree1, tree3) # > 0
  
}

# }