align_dendro: Reorder or group observations based on hierarchical clustering

Description

This function aligns observations within the layout according to a hierarchical clustering tree, enabling reordering or grouping of elements based on clustering results.

Usage

align_dendro(
  mapping = aes(),
  ...,
  distance = "euclidean",
  method = "complete",
  use_missing = "pairwise.complete.obs",
  reorder_dendrogram = FALSE,
  merge_dendrogram = FALSE,
  reorder_group = FALSE,
  k = NULL,
  h = NULL,
  cutree = NULL,
  plot_dendrogram = TRUE,
  plot_cut_height = NULL,
  root = NULL,
  center = FALSE,
  type = "rectangle",
  size = NULL,
  data = NULL,
  no_axes = NULL,
  active = NULL,
  free_guides = deprecated(),
  free_spaces = deprecated(),
  plot_data = deprecated(),
  theme = deprecated(),
  free_labs = deprecated(),
  set_context = deprecated(),
  order = deprecated(),
  name = deprecated()
)

Value

An "AlignDendro" object.

Arguments

mapping: Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
...: <dyn-dots> Additional arguments passed to geom_segment().
distance: A string naming the distance measure to use. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". A correlation coefficient can also be used, namely "pearson", "spearman" or "kendall"; in that case, 1 - cor is used as the distance. You can also provide a dist object directly, or a function that returns a dist object. Use NULL if you don't want to calculate the distance.
method: A string naming the agglomeration method to use. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). You can also provide a function that accepts the calculated distance (or the input matrix if distance is NULL) and returns an hclust object. Alternatively, you can supply an object that can be coerced to hclust.
use_missing: An optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs". Only used when distance is a correlation coefficient string.
reorder_dendrogram: A single boolean value indicating whether to reorder the dendrogram based on the means. Alternatively, you can provide a custom function that accepts an hclust object and the data used to generate the tree, returning either an hclust or dendrogram object. Default is FALSE.
merge_dendrogram: A single boolean value indicating whether to merge multiple dendrograms. Only used when previous groups have been established. Default is FALSE.
reorder_group: A single boolean value indicating whether to perform hierarchical clustering between groups. Only used when previous groups have been established. Default is FALSE.
k: An integer scalar indicating the desired number of groups.
h: A numeric scalar indicating the height at which the tree should be cut.
cutree: A function used to cut the hclust tree. It should accept four arguments: the hclust tree object, distance (only applicable when method is a string or a function for performing hierarchical clustering), k (the number of clusters), and h (the height at which to cut the tree). By default, cutree() is used.
plot_dendrogram: A boolean value indicating whether to plot the dendrogram tree.
plot_cut_height: A boolean value indicating whether to plot the cut height.
root: A length-one string or numeric indicating the root branch.
center: A boolean value. If TRUE, nodes are plotted centered with respect to the leaves in the branch. Otherwise (default), they are plotted in the middle of all direct child nodes.
type: A string indicating the plot type, either "rectangle" or "triangle".
size: The relative size of the plot; can be specified as a unit.
data: A matrix-like object. By default, it inherits from the layout matrix.
no_axes: Logical; if TRUE, removes axis elements for the alignment axis using theme_no_axes(). By default, this is controlled by the option "ggalign.align_no_axes".
active: An active() object that defines the context settings when added to a layout.
free_guides: Please use plot_align() function instead.
free_spaces: Please use plot_align() function instead.
plot_data: Please use plot_data() function instead.
theme: Please use plot_theme() function instead.
free_labs: Please use plot_align() function instead.
set_context: Please use active argument instead.
order: Please use active argument instead.
name: Please use active argument instead.
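
The clustering-related arguments map onto familiar base R steps. The following is a minimal sketch using only the stats package, assuming the behaviour described above; it is not ggalign's internal code, and the use of row means as reorder weights is an assumption made for illustration.

```r
# Base R sketch of the steps described by distance, method, k and
# reorder_dendrogram. This is an illustration, not ggalign's implementation.
set.seed(1)
mat <- matrix(rnorm(81), nrow = 9) # rows are the observations

# distance = "euclidean" (the default): dist() measures distances between rows
d_euclid <- stats::dist(mat, method = "euclidean")

# distance = "pearson": 1 - cor is used as the distance.
# cor() correlates columns, so transpose to correlate the observations (rows).
d_cor <- stats::as.dist(
    1 - stats::cor(t(mat), method = "pearson", use = "pairwise.complete.obs")
)

# method = "complete": the agglomeration method handed to hclust()
hc <- stats::hclust(d_euclid, method = "complete")

# k = 3: cut the tree into three groups (h would cut at a height instead)
groups <- stats::cutree(hc, k = 3)

# reorder_dendrogram = TRUE reorders the tree "based on the means";
# using rowMeans(mat) as the weights is an assumption of this sketch.
dend <- stats::reorder(
    stats::as.dendrogram(hc),
    wts = rowMeans(mat), agglo.FUN = mean
)
leaf_order <- stats::order.dendrogram(dend)
```

Reordering changes only the left-to-right order of the leaves; the tree topology and the group memberships from cutree() stay the same.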

ggplot2 specification

align_dendro() initializes a ggplot object with default data and mapping.

Internally, it always uses a default mapping of aes(x = .data$x, y = .data$y).

The default ggplot data is the node coordinates, with the edge data attached in the ggalign attribute. In addition, a geom_segment layer whose data is the edge coordinates is added.

The node and edge (tree segment) coordinates contain the following columns:

index: the original index in the tree for the current node.
label: the node label text.
x and y: the x-axis and y-axis coordinates of the current node, or of the start node of the current edge.
xend and yend: the x-axis and y-axis coordinates of the terminal node of the current edge.
branch: the branch the current node or edge belongs to. You can use this column to color different groups.
panel: the panel the current node is in. If the plot is split into panels using facet_grid, this column shows which panel the current node or edge comes from. Note: some nodes may fall outside any panel (between two panels), so this column may contain NA values.
.panel: similar to the panel column, but always gives the correct branch for use with ggplot facets.
panel1 and panel2: these have the same role as panel, but are specific to the edge data and correspond to the two nodes of each edge.
leaf: a logical value indicating whether the current node is a leaf.
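
As a hedged illustration of the branch column, the sketch below maps it to the color aesthetic so that each group produced by k = 3L is drawn in its own color. It assumes the ggalign and ggplot2 packages are installed and reuses only the functions shown in the Usage and Examples sections; the object name p is arbitrary.

```r
library(ggplot2)
library(ggalign)

set.seed(1)
p <- ggheatmap(matrix(rnorm(81), nrow = 9)) +
    anno_top() +
    # `branch` is one of the documented columns of the node and edge data;
    # cutting the tree with k = 3L yields distinct branches to color.
    align_dendro(aes(color = branch), k = 3L)
p
```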

Axis Alignment for Observations

It is important to note that we consider rows as observations, meaning vec_size(data)/NROW(data) must match the number of observations along the axis used for alignment (x-axis for a vertical stack layout, y-axis for a horizontal stack layout).

quad_layout()/ggheatmap(): For column annotation, the layout matrix will be transposed before use (if data is a function, it is applied to the transposed matrix), as column annotation uses columns as observations but alignment requires rows.
stack_layout(): The layout matrix is used as is, aligning all plots along a single axis.
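
The effect of the transposition can be checked with base R alone. This sketch assumes a 9 x 5 matrix and uses stats::hclust() directly rather than ggalign; it only shows which dimension supplies the observations in each case.

```r
# Rows are observations: clustering the matrix as is gives 9 leaves,
# clustering the transposed matrix (the column annotation case) gives 5.
set.seed(1)
mat <- matrix(rnorm(45), nrow = 9, ncol = 5)

hc_rows <- stats::hclust(stats::dist(mat))    # matrix used as is
hc_cols <- stats::hclust(stats::dist(t(mat))) # transposed for column annotation

n_row_obs <- length(hc_rows$order) # number of leaves when rows are observations
n_col_obs <- length(hc_cols$order) # number of leaves after transposing
```

For a column annotation, NROW(t(mat)) is therefore the number that must match the observations along the alignment axis.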

Examples


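library(ggalign)

# Heatmap with a dendrogram as the top annotation
ggheatmap(matrix(rnorm(81), nrow = 9)) +
    anno_top() +
    align_dendro()

# Same layout, with the tree cut into three groups
ggheatmap(matrix(rnorm(81), nrow = 9)) +
    anno_top() +
    align_dendro(k = 3L)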
