align_dendro: Reorder or Group layout based on hierarchical clustering

Description

Reorder or Group layout based on hierarchical clustering

Usage

align_dendro(
  mapping = aes(),
  ...,
  distance = "euclidean",
  method = "complete",
  use_missing = "pairwise.complete.obs",
  reorder_dendrogram = FALSE,
  merge_dendrogram = FALSE,
  reorder_group = FALSE,
  k = NULL,
  h = NULL,
  plot_dendrogram = TRUE,
  plot_cut_height = NULL,
  root = NULL,
  center = FALSE,
  type = "rectangle",
  size = NULL,
  free_guides = waiver(),
  free_spaces = waiver(),
  plot_data = waiver(),
  theme = waiver(),
  free_labs = waiver(),
  data = NULL,
  set_context = NULL,
  order = NULL,
  name = NULL
)

Value

A new Align object.

Arguments

mapping

Additional default list of aesthetic mappings to use for plot.

...

<dyn-dots> Additional arguments passed to geom_segment().

distance

A string naming the distance measure to use. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". A correlation coefficient can also be used: "pearson", "spearman" or "kendall"; in that case, 1 - cor is used as the distance. You can also provide a dist object directly, or a function that returns a dist object. Use NULL if you don't want to calculate the distance.
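
As a rough illustration of the three kinds of input described above, the following base R sketch builds a built-in distance, a correlation-based distance, and a function that returns a dist object. It is not the package's internal code: the transpose in cor(t(mat)) assumes that rows are the observations being compared, and manhattan_dist is a hypothetical helper name.

```r
set.seed(1)
mat <- matrix(rnorm(81), nrow = 9)

# Built-in measure: dist() computes distances between rows
d_euclid <- dist(mat, method = "euclidean")

# Correlation-based distance: 1 - cor between rows.
# cor() correlates columns, so the matrix is transposed first (assumption).
d_cor <- as.dist(1 - cor(t(mat), method = "pearson"))

# A user-supplied function returning a dist object (hypothetical helper)
manhattan_dist <- function(x) dist(x, method = "manhattan")
d_fun <- manhattan_dist(mat)
```

Per the description above, a dist object such as d_cor or a function such as manhattan_dist could then be given as the distance argument.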

method

A string naming the agglomeration method to use. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). You can also provide a function that accepts the distance and returns an hclust object. Alternatively, you can supply an object that can be coerced to hclust.
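
The options above can be sketched with stats::hclust(): a full method name, an unambiguous abbreviation, and a custom function that takes the distance and returns an hclust object. The name ward_fun is hypothetical and only serves the illustration.

```r
set.seed(1)
mat <- matrix(rnorm(81), nrow = 9)
d <- dist(mat)

# Full method name
hc_complete <- hclust(d, method = "complete")

# Unambiguous abbreviation of "average"
hc_average <- hclust(d, method = "ave")

# A function that accepts the distance and returns an hclust object
# (hypothetical helper)
ward_fun <- function(dist_obj) hclust(dist_obj, method = "ward.D2")
hc_ward <- ward_fun(d)
```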

use_missing

An optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs". Only used when distance is a correlation coefficient string.
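
To see why this setting matters, here is a minimal base R sketch that passes two of the listed strings to the use argument of stats::cor() on data containing a missing value. How the package forwards use_missing internally is an assumption here; the sketch only shows the behaviour of cor() itself.

```r
set.seed(1)
mat <- matrix(rnorm(81), nrow = 9)
mat[1, 2] <- NA  # introduce a single missing value

# "everything": the NA propagates into the correlation matrix
cor_all <- cor(t(mat), use = "everything")

# "pairwise.complete.obs": each pair of rows uses the values present in both
cor_pair <- cor(t(mat), use = "pairwise.complete.obs")

# Only the pairwise version yields a usable distance for clustering
d <- as.dist(1 - cor_pair)
hc <- hclust(d)
```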

reorder_dendrogram

A single boolean value indicating whether to reorder the dendrogram based on the means. Default: FALSE.
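
One plausible reading of "based on the means" is a reordering weighted by row means, which base R offers through reorder() for dendrograms. The sketch below is an assumption about the idea, not a statement of what the package does internally.

```r
set.seed(1)
mat <- matrix(rnorm(81), nrow = 9)

dend <- as.dendrogram(hclust(dist(mat)))

# Rotate branches so that leaves are arranged by their row means;
# the tree structure itself is unchanged, only the leaf order differs
dend_reordered <- reorder(dend, wts = rowMeans(mat), agglo.FUN = mean)

leaf_order <- order.dendrogram(dend_reordered)
```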

merge_dendrogram

A single boolean value indicating whether to merge multiple dendrograms. Only used when previous groups have been established. Default: FALSE.

reorder_group

A single boolean value indicating whether to perform hierarchical clustering between groups. Only used when previous groups have been established. Default: FALSE.

k

An integer scalar indicating the desired number of groups.

h

A numeric scalar indicating the height at which the tree should be cut.
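
The relationship between k and h can be illustrated with stats::cutree(), which accepts the same two kinds of cut. This is a base R sketch under the assumption that the package cuts the tree in a comparable way; cut_height is a value chosen here so that both cuts agree.

```r
set.seed(1)
mat <- matrix(rnorm(81), nrow = 9)
hc <- hclust(dist(mat), method = "complete")

# Cut by number of groups
groups_k <- cutree(hc, k = 3)

# Cut by height: any height between the 2nd and 3rd largest merge
# heights leaves exactly three branches
top_heights <- sort(hc$height, decreasing = TRUE)
cut_height <- mean(top_heights[2:3])
groups_h <- cutree(hc, h = cut_height)
```

Both calls return one group label per observation, so either argument describes the same partition when the height is chosen to match.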

plot_dendrogram

A boolean value indicating whether to plot the dendrogram tree.

plot_cut_height

A boolean value indicating whether to plot the cut height.

root

A length-one string or numeric indicating the root branch.

center

A boolean value. If TRUE, nodes are plotted centered with respect to the leaves in the branch. Otherwise (the default), they are plotted in the middle of all direct child nodes.

type

A string indicating the plot type, either "rectangle" or "triangle".

size

Plot size; can be a unit object.

free_guides

Override the guides argument specified in the layout for a plot. Options include:

waiver(): inherits behavior from the layout.
NULL: no guide legends will be collected for the plot.
A string containing one or more of "t", "l", "b", and "r", indicating which sides of the guide legends should be collected for the plot.

free_spaces

A string with one or more of "t", "l", "b", and "r" indicating which border spaces should be removed. Defaults to waiver(), which inherits from the parent layout. If no parent, the default is NULL, meaning no spaces are removed.

plot_data

A function to transform plot data before rendering. Defaults to waiver(), which inherits from the parent layout. If no parent layout, the default is NULL, meaning the data won't be modified.

Used to modify the data, which should be a data frame, after the layout has been created but before it is handed off to ggplot2 for rendering. Use this hook if you need to change the default data for all geoms.

theme

Default plot theme. One of:

waiver(): will inherit from the parent layout.
NULL: Use the default theme.
theme(): will be added with the parent layout theme.

Note: The axis title and labels parallel to the layout axis will always be removed by default. For vertical stack layouts, this refers to the x-axis, and for horizontal stack layouts, this refers to the y-axis. If you want to display the axis title or labels, you should manually add theme() elements for the parallel axis title or labels.

free_labs

A string with one or more of "t", "l", "b", and "r" indicating which axis titles should be free from alignment. Defaults to waiver(), which inherits from the parent layout. If no parent layout, no axis titles will be aligned. If NULL, all axis titles will be aligned.

data

A matrix, data frame, or a simple vector. If an atomic vector is provided, it will be converted into a one-column matrix. When data = NULL, the internal layout data will be used by default. Additionally, data can be a function (including purrr-like lambdas), which will be applied to the layout data.

Note that rows are treated as the observations. This means NROW(data) must equal the number of observations along the relevant layout axis (the x-axis for a vertical stack layout, or the y-axis for a horizontal stack layout).

heatmap_layout(): for column annotation, the layout data will be transposed before use (if data is a function, it will be applied to the transposed matrix). This is necessary because column annotation uses heatmap columns as observations, but we need rows.
stack_layout(): the layout data will be used as it is since we place all plots along a single axis.
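
The row-as-observation convention and the transposition for column annotation can be sketched in base R. This is only an analogue of the behaviour described above, not the package's own code.

```r
set.seed(1)
mat <- matrix(rnorm(9 * 5), nrow = 9, ncol = 5)

# Rows as observations: the tree has one leaf per row
hc_rows <- hclust(dist(mat))

# Column annotation: transpose first so that columns become the observations
hc_cols <- hclust(dist(t(mat)))

n_row_leaves <- length(hc_rows$order)
n_col_leaves <- length(hc_cols$order)
```

Here n_row_leaves matches NROW(mat) and n_col_leaves matches NROW(t(mat)), which is the size check the data argument has to satisfy.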

set_context

A single boolean value indicating whether to set the active context to the current plot. If TRUE, all subsequent ggplot elements will be added to this plot.

order

A single integer for the plot area order.

name

A string giving the plot name. Used to switch the active context in hmanno() or stack_active().

ggplot2 specification

align_dendro() initializes the ggplot data and mapping.

Internally, a default mapping of aes(x = .data$x, y = .data$y) is always used.

The default ggplot data is the node coordinates. In addition, a geom_segment layer is added whose data holds the edge coordinates of the tree segments.

The node and edge coordinate data contain the following columns:

index: the original index in the tree for the current node.
label: the node label text.
x and y: the x-axis and y-axis coordinates of the current node, or of the start node of the current edge.
xend and yend: the x-axis and y-axis coordinates of the terminal node of the current edge.
branch: the branch the current node or edge belongs to. You can use this column to color different groups.
panel: the panel the current node belongs to. If the plot is split into panels using facet_grid, this column shows which panel the current node or edge comes from. Note: some nodes may fall outside any panel (between two panels), so this column may contain NA values.
.panel: similar to the panel column, but always gives the correct branch for use with ggplot facets.
panel1 and panel2: the same as panel, but specific to the edge data; they correspond to the two nodes of each edge.
leaf: a logical value indicating whether the current node is a leaf.

Examples

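library(ggalign)

# Add a dendrogram to the top annotation of a heatmap
ggheatmap(matrix(rnorm(81), nrow = 9)) +
    hmanno("top") +
    align_dendro()

# Cut the tree into 3 groups
ggheatmap(matrix(rnorm(81), nrow = 9)) +
    hmanno("top") +
    align_dendro(k = 3L)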