JaccardRobinsonFoulds: Jaccard-Robinson-Foulds metric

Description

Calculate the Jaccard-Robinson-Foulds metric (Böcker et al. 2013), a Generalized Robinson-Foulds metric.

Usage

JaccardRobinsonFoulds(
  tree1,
  tree2 = tree1,
  k = 1L,
  allowConflict = TRUE,
  similarity = FALSE,
  normalize = FALSE,
  reportMatching = FALSE
)
JaccardSplitSimilarity(
  splits1,
  splits2,
  nTip = attr(splits1, "nTip"),
  k = 1L,
  allowConflict = TRUE,
  reportMatching = FALSE
)

Arguments

tree1

Trees of class phylo, with leaves labelled identically, or lists of such trees to undergo pairwise comparison.

tree2

Trees of class phylo, with leaves labelled identically, or lists of such trees to undergo pairwise comparison.

k

An arbitrary exponent to which to raise the Jaccard index. Integer values greater than one are anticipated by Böcker et al. (2013). The Nye et al. (2006) metric uses k = 1. As k increases towards infinity, the metric converges to the Robinson-Foulds metric.
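The effect of the exponent can be sketched in base R. This is a simplified illustration on plain leaf sets, not the package's internal split scoring; `JaccardIndex()` and the leaf sets are hypothetical names introduced here. A partial overlap scores below one, so raising it to a larger k pushes it towards zero, whereas an exact match stays at one; in the limit only identical splits contribute, as in the Robinson-Foulds metric.

```r
# Hypothetical helper: Jaccard index of two leaf sets, |A n B| / |A u B|
JaccardIndex <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

setA <- c("t1", "t2", "t3", "t4")
setB <- c("t1", "t2", "t3", "t5")  # three shared leaves, five distinct leaves

partial <- JaccardIndex(setA, setB)  # partial overlap
exact <- JaccardIndex(setA, setA)    # identical sets

# Raise both indices to a range of exponents
k <- c(1, 2, 4, 16, 64)
partialScores <- partial ^ k
exactScores <- exact ^ k
print(rbind(k, partialScores, exactScores))
```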

allowConflict

Logical specifying whether to allow conflicting splits to be paired. If FALSE, such pairings will be allocated a similarity score of zero.
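As a minimal sketch of what "conflicting" means here, assuming the standard definition of split compatibility: two splits conflict when neither side of one is contained in a side of the other, that is, when all four intersections of their ingroups and outgroups are non-empty. `SplitsConflict()` is a hypothetical helper written for this illustration, not a function of the package.

```r
# Hypothetical helper: TRUE if the two splits cannot occur on the same tree.
# Each split is a logical vector over the same leaves (TRUE = ingroup).
SplitsConflict <- function(split1, split2) {
  all(
    any(split1 & split2),
    any(split1 & !split2),
    any(!split1 & split2),
    any(!split1 & !split2)
  )
}

leaves <- paste0("t", 1:6)
ab  <- setNames(leaves %in% c("t1", "t2"), leaves)        # t1 t2 | t3 t4 t5 t6
abc <- setNames(leaves %in% c("t1", "t2", "t3"), leaves)  # t1 t2 t3 | t4 t5 t6
bc  <- setNames(leaves %in% c("t2", "t3"), leaves)        # t2 t3 | t1 t4 t5 t6

nested <- SplitsConflict(ab, abc)      # ab is nested within abc
overlapping <- SplitsConflict(ab, bc)  # ab and bc overlap only in t2
print(c(nested = nested, overlapping = overlapping))
```

With allowConflict = FALSE, a pairing such as `ab` with `bc` would be given a similarity of zero, whatever its Jaccard index.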

similarity

Logical specifying whether to report the result as a tree similarity, rather than a difference.

normalize

If a numeric value is provided, this will be used as a maximum value against which to rescale results. If TRUE, results will be rescaled against a maximum value calculated from the specified tree sizes and topology, as specified in the 'Normalization' section below. If FALSE, results will not be rescaled.

reportMatching

Logical specifying whether to return the clade matchings as an attribute of the score.

splits1

Logical matrices where each row corresponds to a leaf, either listed in the same order or bearing identical names (in any sequence), and each column corresponds to a split, such that each leaf is identified as a member of the ingroup (TRUE) or outgroup (FALSE) of the respective split.
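The layout described above can be sketched in base R as follows. The leaf names and the three splits are invented for illustration, and the sketch shows only the shape of the matrix; whether a hand-built matrix like this is accepted directly by the package functions is not confirmed here.

```r
leaves <- paste0("t", 1:6)

# One column per split; TRUE marks the leaves in the ingroup of that split
splits <- cbind(
  s1 = leaves %in% c("t1", "t2"),
  s2 = leaves %in% c("t1", "t2", "t3"),
  s3 = leaves %in% c("t5", "t6")
)
rownames(splits) <- leaves  # one row per leaf

print(splits)
print(colSums(splits))  # number of leaves in the ingroup of each split
```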

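splits2

Logical matrices in the same format as splits1, describing the splits against which splits1 is to be compared.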

nTip

(Optional) Integer specifying the number of leaves in each split.

Value

JaccardRobinsonFoulds() returns an array of numerics providing the distances between each pair of trees in tree1 and tree2, or splits1 and splits2.

Normalization

If normalize = TRUE, then results will be rescaled from zero to one by dividing by the maximum possible value for trees of the given topologies, which is equal to the sum of the number of splits in each tree. You may wish to normalize instead against the maximum number of splits present in a pair of trees with n leaves, by specifying normalize = n - 3.
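The two rescaling options amount to the following arithmetic, sketched here with made-up numbers. `rawDistance` and the split counts are hypothetical values chosen for illustration, not output from the package.

```r
nLeaves <- 10
splitsInTree1 <- 7  # a fully resolved tree with n leaves has n - 3 non-trivial splits
splitsInTree2 <- 5  # a less resolved tree has fewer
rawDistance <- 6.4  # hypothetical unnormalized score

# normalize = TRUE: divide by the summed number of splits in the two trees
byTopology <- rawDistance / (splitsInTree1 + splitsInTree2)

# normalize = n - 3: divide by a user-supplied maximum
byCustom <- rawDistance / (nLeaves - 3)

print(c(byTopology = byTopology, byCustom = byCustom))
```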

Details

In short, the Jaccard-Robinson-Foulds metric is a generalized Robinson-Foulds metric: it finds the optimal matching that pairs each split in one tree with a similar split in the second. Matchings are scored according to the size of the largest split that is consistent with both of them, normalized against the Jaccard index, and raised to an arbitrary exponent. A more detailed explanation is provided in the vignettes.
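The idea of an optimal matching can be sketched in base R on a small, invented similarity matrix. This sketch enumerates every possible pairing, which is practical only for a handful of splits; it is a swapped-in technique for illustration, not the package's method, and a practical implementation would be expected to use a linear assignment solver instead. `Permutations()` and the `similarity` values are hypothetical.

```r
# Hypothetical helper: all orderings of a vector (tiny inputs only)
Permutations <- function(x) {
  if (length(x) <= 1L) return(list(x))
  out <- list()
  for (i in seq_along(x)) {
    for (rest in Permutations(x[-i])) {
      out[[length(out) + 1L]] <- c(x[i], rest)
    }
  }
  out
}

# Invented pairwise similarities: rows = splits of tree 1, columns = tree 2
similarity <- matrix(c(1.0, 0.2, 0.0,
                       0.2, 0.5, 0.6,
                       0.0, 0.6, 0.1),
                     nrow = 3, byrow = TRUE)

# Score every one-to-one pairing of rows with columns
candidates <- Permutations(seq_len(ncol(similarity)))
totals <- vapply(candidates, function(p) {
  sum(similarity[cbind(seq_along(p), p)])
}, numeric(1))

# The optimal matching is the pairing with the greatest total similarity
best <- candidates[[which.max(totals)]]
print(best)         # column paired with each row
print(max(totals))  # total similarity of the optimal matching
```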

By default, conflicting splits may be paired.

Note that the settings k = 1, allowConflict = TRUE, similarity = TRUE give the similarity metric of Nye et al. (2006); a slightly faster implementation of this metric is available as NyeSimilarity().

The examples section below details how to visualize matchings with non-default parameter values.

References

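Böcker S, Canzar S, Klau GW (2013). "The generalized Robinson-Foulds metric." In Darling A, Stoye J (eds.), Algorithms in Bioinformatics. WABI 2013. Lecture Notes in Computer Science, vol 8126. Springer, Berlin, Heidelberg. doi:10.1007/978-3-642-40453-5_13.

Nye TMW, Liò P, Gilks WR (2006). "A novel algorithm and web-based tool for comparing two alternative phylogenetic trees." Bioinformatics, 22(1), 117-119. doi:10.1093/bioinformatics/bti720.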

Examples

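library("TreeDist")  # provides JaccardRobinsonFoulds() and VisualizeMatching()

# Two random ten-leaf trees; the seed makes the example reproducible
set.seed(2)
tree1 <- ape::rtree(10)
tree2 <- ape::rtree(10)

# Squared Jaccard index, without and then with pairing of conflicting splits
JaccardRobinsonFoulds(tree1, tree2, k = 2, allowConflict = FALSE)
JaccardRobinsonFoulds(tree1, tree2, k = 2, allowConflict = TRUE)

# Wrapper that fixes the non-default parameters for use in VisualizeMatching()
JRF2 <- function(tree1, tree2, ...) {
  JaccardRobinsonFoulds(tree1, tree2, k = 2, allowConflict = FALSE, ...)
}

VisualizeMatching(JRF2, tree1, tree2, matchZeros = FALSE)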
