Learn R Programming

castor (version 1.6.1)

tree_distance: Calculate the distance between two trees.

Description

Given two rooted phylogenetic trees with identical tips, calculate their difference using a distance metric.

Usage

tree_distance(treeA, treeB, tipsA2B=NULL, metric="RF", normalized=FALSE)

Arguments

treeA

A rooted tree of class "phylo". The root is assumed to be the unique node with no incoming edge.

treeB

A rooted tree of class "phylo", with the same number of tips as treeA. The root is assumed to be the unique node with no incoming edge.

tipsA2B

Optional integer vector of size Ntips, mapping treeA tip indices to treeB tip indices (i.e. tipsA2B[a] is the tip index in treeB corresponding to tip index a in treeA). The mapping must be one-to-one. If left unspecified, it is determined by matching tip labels between the two trees (this assumes that the same tip labels are used in both trees).

metric

Character, specifying the distance metric to be used. Currently only the Robinson-Foulds ("RF") metric is implemented, which is the number of clusters (sets of tips descending from a node) in either of the trees but not shared by both trees (Robinson and Foulds, 1981; Day, 1985). The Robinson-Foulds metric does not take into account branch lengths.

normalized

Logical, specifying whether the calculated distance should be normalized to be between 0 and 1. For the Robinson-Foulds metric, the distance will be normalized by dividing it by the total number of nodes in the two trees.

Value

A single non-negative number, representing the distance between the two trees.

Details

If the trees differ in theis tips, they must be pruned down to their common set of tips. If tips have different labels in the two trees, but are nevertheless equivalent, the mapping between the two trees must be provided using tipsA2B. The trees may include multi-furcations as well as mono-furcations (i.e. nodes with only one child).

Note that under some Robinson-Foulds variants the trees can be unrooted; in this present implementation trees must be rooted and the placement of the root influences the distance, following the definition by Day (1985).

The asymptotic average time complexity of this function is O(Nedges*log(Nedges)*log(log(Nedges))) for a balanced bifurcating tree.

References

Robinson, D. R., Foulds, L. R. (1981). Comparison of phylogenetic trees. Mathematical Biosciences. 53: 131-147.

Day, W. H. E. (1985). Optimal algorithms for comparing trees with labeled leaves. Journal of Classification. 2:7-28.

See Also

congruent_divergence_times

Examples

Run this code
# NOT RUN {
# generate a random tree
Ntips = 1000
treeA = generate_random_tree(list(birth_rate_intercept=1),
                            max_tips=Ntips)$tree
                            
# create a second tree with slightly different topology
treeB = treeA
shuffled_tips = sample.int(Ntips, size=Ntips/10, replace=FALSE)
treeB$tip.label[shuffled_tips] = treeB$tip.label[sample(shuffled_tips)]

# calculate Robinson-Foulds distance between trees
distance = tree_distance(treeA, treeB, metric="RF")
# }

Run the code above in your browser using DataLab