
bioregion (version 1.3.0)

cut_tree: Cut a hierarchical tree

Description

This function cuts a hierarchical tree at user-selected heights. It accepts either the output of hclu_hierarclust() or an hclust object. The tree can be cut to obtain one or several chosen numbers of clusters, or at one or several specified heights. When a number of clusters is requested, the function can also determine the corresponding cutting height automatically.

Usage

cut_tree(
  tree,
  n_clust = NULL,
  cut_height = NULL,
  find_h = TRUE,
  h_max = 1,
  h_min = 0,
  dynamic_tree_cut = FALSE,
  dynamic_method = "tree",
  dynamic_minClusterSize = 5,
  dissimilarity = NULL,
  show_hierarchy = FALSE,
  verbose = TRUE,
  ...
)

Value

If tree is an output from hclu_hierarclust(), the same object is returned with updated content (i.e., args and clusters). If tree is an hclust object, a data.frame containing the clusters is returned.
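The two return types can be compared side by side. The sketch below reuses the calls from the Examples section on a small random matrix; the object names (dissim, tree_cut, clusters_df) and the seed are illustrative assumptions, and the exact cluster assignments will depend on the random data.

```r
library(bioregion)

set.seed(1)  # arbitrary seed, toy data for illustration only
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)
dissim <- similarity_to_dissimilarity(similarity(comat, metric = "all"))

# Input from hclu_hierarclust(): the same kind of object comes back,
# with its args and clusters elements updated
tree <- hclu_hierarclust(dissim, n_clust = 5)
tree_cut <- cut_tree(tree, n_clust = 3)
head(tree_cut$clusters)

# Input of class hclust: a plain data.frame of clusters comes back
clusters_df <- cut_tree(tree$algorithm$final.tree, n_clust = 3)
is.data.frame(clusters_df)
```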

Arguments

tree

A bioregion.hierar.tree or an hclust object.

n_clust

An integer vector or a single integer indicating the number of clusters to be obtained from the hierarchical tree, or the output from bioregionalization_metrics(). This should not be used concurrently with cut_height.

cut_height

A numeric vector specifying the height(s) at which the tree should be cut. This should not be used concurrently with n_clust.

find_h

A boolean indicating whether the cutting height should be determined for the requested n_clust.

h_max

A numeric value indicating the maximum possible tree height for determining the cutting height when find_h = TRUE.

h_min

A numeric value specifying the minimum possible height in the tree for determining the cutting height when find_h = TRUE.

dynamic_tree_cut

A boolean indicating whether the dynamic tree cut method should be used. If TRUE, n_clust and cut_height are ignored.

dynamic_method

A character string specifying the method to be used for dynamically cutting the tree: either "tree" (clusters searched only within the tree) or "hybrid" (clusters searched in both the tree and the dissimilarity matrix).

dynamic_minClusterSize

An integer indicating the minimum cluster size for the dynamic tree cut method (see dynamicTreeCut::cutreeDynamic()).

dissimilarity

Relevant only if dynamic_method = "hybrid". Provide the dissimilarity data.frame used to build the tree.

show_hierarchy

A boolean specifying if the hierarchy of clusters should be identifiable in the outputs (FALSE by default).

verbose

A boolean indicating whether to display progress messages. Set to FALSE to suppress these messages.

...

Additional arguments passed to dynamicTreeCut::cutreeDynamic() to customize the dynamic tree cut method.
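To illustrate how find_h, h_min and h_max relate to one another, here is a minimal sketch of a bisection search for a cutting height that yields a requested number of clusters. It uses only base R (hclust() and cutree()); the helper find_height() is hypothetical and is not necessarily the procedure implemented in cut_tree().

```r
# Toy data: 20 points in 2 dimensions (hypothetical, for illustration only)
set.seed(1)
x <- matrix(rnorm(40), ncol = 2)
tree <- hclust(dist(x), method = "average")

# Hypothetical helper: bisection between h_min and h_max for a height
# at which cutting the tree gives exactly n clusters
find_height <- function(tree, n, h_min = 0, h_max = max(tree$height),
                        max_iter = 100) {
  for (i in seq_len(max_iter)) {
    h <- (h_min + h_max) / 2
    k <- length(unique(cutree(tree, h = h)))
    if (k == n) return(h)
    # Too many clusters: the cut is too low, so raise the lower bound;
    # too few clusters: the cut is too high, so lower the upper bound
    if (k > n) h_min <- h else h_max <- h
  }
  NA_real_  # no height found, e.g. because of tied merge heights
}

h5 <- find_height(tree, n = 5)
table(cutree(tree, h = h5))
```

In this sketch the search can only succeed if the target height lies between h_min and h_max, which is why the upper bound is set to the tallest merge in the tree.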

Author

Pierre Denelle (pierre.denelle@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Boris Leroy (leroy.boris@gmail.com)

Details

The function supports two main methods for cutting the tree. First, the tree can be cut at a uniform height (specified by cut_height or determined automatically for the requested n_clust). Second, the dynamic tree cut method (Langfelder et al., 2008) can be applied, which adapts to the shape of branches in the tree, cutting at varying heights based on cluster positions.

The dynamic tree cut method has two variants:

  • The tree-based variant (dynamic_method = "tree") uses a top-down approach, relying solely on the tree and the order of clustered objects.

  • The hybrid variant (dynamic_method = "hybrid") employs a bottom-up approach, leveraging both the tree and the dissimilarity matrix to identify clusters based on dissimilarity among sites. This approach is useful for detecting outliers within clusters.
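The two variants can be tried directly with dynamicTreeCut::cutreeDynamic(), the function referenced in the Arguments section. The sketch below runs on hypothetical toy data built with base R rather than on a bioregion object; the data, seed and object names are assumptions made for illustration.

```r
set.seed(42)
# Hypothetical toy data: three groups of 20 points in two dimensions
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 4), ncol = 2),
           matrix(rnorm(40, mean = 8), ncol = 2))
d <- dist(x)
tree <- hclust(d, method = "average")

# Tree-based variant: uses only the dendrogram
cl_tree <- dynamicTreeCut::cutreeDynamic(tree, method = "tree",
                                         minClusterSize = 5, verbose = 0)

# Hybrid variant: also needs the dissimilarities, as a full matrix
cl_hybrid <- dynamicTreeCut::cutreeDynamic(tree, method = "hybrid",
                                           distM = as.matrix(d),
                                           minClusterSize = 5, verbose = 0)

# Cluster sizes under each variant
table(cl_tree)
table(cl_hybrid)
```

Both calls return one label per object; in dynamicTreeCut, the label 0 marks objects left unassigned.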

References

Langfelder P, Zhang B & Horvath S (2008) Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics 24, 719-720.

See Also

For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_1_hierarchical_clustering.html.

Associated functions: hclu_hierarclust

Examples

comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)

simil <- similarity(comat, metric = "all")
dissimilarity <- similarity_to_dissimilarity(simil)

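# Build a tree with a user-defined number of clusters
tree1 <- hclu_hierarclust(dissimilarity,
                          n_clust = 5)

# Re-cut the tree at given height(s) or for given number(s) of clusters
tree2 <- cut_tree(tree1, cut_height = .05)
tree3 <- cut_tree(tree1, n_clust = c(3, 5, 10))
tree4 <- cut_tree(tree1, cut_height = c(.05, .1, .15, .2, .25))
# find_h = FALSE: do not determine the cutting heights
tree5 <- cut_tree(tree1, n_clust = c(3, 5, 10), find_h = FALSE)

# cut_tree() also accepts a plain hclust object
hclust_tree <- tree2$algorithm$final.tree
clusters_2 <- cut_tree(hclust_tree, n_clust = 10)

# Dynamic tree cut (n_clust and cut_height are ignored)
cluster_dynamic <- cut_tree(tree1, dynamic_tree_cut = TRUE,
                            dissimilarity = dissimilarity)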
