dbrobust (version 1.0.0)

plot_heatmap: Visualize a Distance or Similarity Matrix as a Heatmap with Clustering

Description

Creates a heatmap from a square distance matrix. The function expects distances: a similarity matrix must be converted to a distance matrix by the user before it is passed in. The function supports hierarchical clustering, group annotations, row/column sampling (random or stratified), and various customization options.
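Since similarities have to be converted by the user, the following minimal sketch shows one common conversion. It assumes similarities scaled to [0, 1] with ones on the diagonal; `sim_mat` is an illustrative correlation-based matrix and is not part of dbrobust. Other transformations may suit a given similarity measure better.

```r
# Illustrative similarity matrix (assumption): absolute correlations
# between the four numeric iris variables
sim_mat <- abs(stats::cor(iris[, 1:4]))

# Convert similarity to distance; assumes values in [0, 1] and a unit diagonal
dist_mat <- 1 - sim_mat

# Sanity checks: square, symmetric, (numerically) zero diagonal
stopifnot(nrow(dist_mat) == ncol(dist_mat))
stopifnot(isSymmetric(dist_mat))
stopifnot(all(abs(diag(dist_mat)) < 1e-12))
```

The resulting `dist_mat` is a square numeric matrix, which is the input form described for the dist_mat argument.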

Usage

plot_heatmap(
  dist_mat,
  max_n = 50,
  group = NULL,
  stratified_sampling = FALSE,
  main_title = NULL,
  palette = "YlOrRd",
  clustering_method = "complete",
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  fontsize_row = 10,
  fontsize_col = 10,
  show_rownames = TRUE,
  show_colnames = TRUE,
  border_color = "grey60",
  annotation_legend = TRUE,
  seed = 123
)

Value

Invisibly returns the pheatmap object, allowing further customization if assigned.

Arguments

dist_mat

A square distance matrix (numeric matrix) or a dist object.

max_n

Integer. Maximum number of observations (rows/columns) to display. If the matrix has more than max_n observations, a subset of max_n observations is selected. Default is 50.

group

Optional vector or factor providing group labels for rows/columns, used for color annotation. Default is NULL.

stratified_sampling

Logical. If TRUE and group is provided, sampling is stratified by group. Each group will contribute at least one observation if possible. Default is FALSE.

main_title

Optional character string specifying the main title of the heatmap. Default is NULL.

palette

Character string specifying the RColorBrewer palette for heatmap cells. Default is "YlOrRd".

clustering_method

Character string specifying the hierarchical clustering method, as accepted by hclust (e.g., "complete", "average", "ward.D2"). Default is "complete".
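To see how the choice of method can affect the result, the sketch below calls hclust directly on a small subset of iris. It only illustrates the hclust methods named above; how plot_heatmap passes the method on internally is not shown here.

```r
# Euclidean distances on a small, evenly spaced subset of iris (15 rows)
d <- stats::dist(iris[seq(1, 150, by = 10), 1:4])

# These method strings are ones that hclust accepts
hc_complete <- stats::hclust(d, method = "complete")
hc_average  <- stats::hclust(d, method = "average")
hc_ward     <- stats::hclust(d, method = "ward.D2")

# Leaf ordering of each dendrogram; the orderings may differ between methods
hc_complete$order
hc_average$order
hc_ward$order
```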

cluster_rows

Logical, whether to perform hierarchical clustering on rows. Default is TRUE.

cluster_cols

Logical, whether to perform hierarchical clustering on columns. Default is TRUE.

fontsize_row

Integer specifying the font size of row labels. Default is 10.

fontsize_col

Integer specifying the font size of column labels. Default is 10.

show_rownames

Logical, whether to display row names. Default is TRUE.

show_colnames

Logical, whether to display column names. Default is TRUE.

border_color

Color of the cell borders in the heatmap. Default is "grey60".

annotation_legend

Logical, whether to display the legend for group annotations. Default is TRUE.

seed

Integer. Random seed used when sampling rows/columns, which happens when the number of observations exceeds max_n. Default is 123.

Details

The function works as follows:

  • Converts dist objects to matrices automatically.

  • Samples rows/columns if the matrix is larger than max_n. Sampling can be random or stratified by group.

  • In stratified sampling mode, each group contributes at least one observation if possible.

  • Supports row annotations for groups and automatically assigns colors.

  • Uses pheatmap for plotting with customizable clustering, labels, fonts, and colors.
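The conversion and sampling steps listed above can be sketched in base R. This is an illustration under stated assumptions, not the package's actual implementation: the helper `stratified_sample_idx` is hypothetical, and the exact allocation rule used by plot_heatmap may differ.

```r
# Hypothetical helper, not part of dbrobust: pick max_n indices so that
# every group contributes at least one observation when max_n allows it
stratified_sample_idx <- function(group, max_n, seed = 123) {
  set.seed(seed)  # note: this sets the global RNG state
  group <- as.factor(group)
  idx_by_group <- split(seq_along(group), group, drop = TRUE)
  n <- length(group)
  if (n <= max_n) return(seq_len(n))  # nothing to sample

  # One randomly chosen index per group
  one_each <- vapply(
    idx_by_group,
    function(ix) ix[sample.int(length(ix), 1)],
    integer(1)
  )
  # If there are more groups than slots, keep a random subset of them
  if (length(one_each) > max_n) {
    one_each <- one_each[sample.int(length(one_each), max_n)]
  }
  # Fill the remaining slots at random from the unused observations
  remaining <- setdiff(seq_len(n), one_each)
  extra <- remaining[sample.int(length(remaining), max_n - length(one_each))]
  sort(unname(c(one_each, extra)))
}

# A dist object is converted to a full matrix before subsetting
d_mat <- as.matrix(stats::dist(iris[, 1:4]))
keep <- stratified_sample_idx(iris$Species, max_n = 10)
d_sub <- d_mat[keep, keep]  # same indices for rows and columns
table(iris$Species[keep])
```

Subsetting rows and columns with the same index vector keeps the reduced matrix square and symmetric, so it remains a valid distance matrix.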

This function is used internally by visualize_distances() but can be called directly for advanced usage.

See Also

hclust for hierarchical clustering methods. pheatmap for additional heatmap customization options. brewer.pal for available color palettes.

Examples

# Example: Euclidean distance heatmap on iris
eucli_dist <- stats::dist(iris[, 1:4])
dbrobust::plot_heatmap(
  dist_mat = eucli_dist,
  max_n = 10,
  group = iris$Species,
  stratified_sampling = TRUE,
  main_title = "Euclidean Distance Heatmap",
  palette = "YlOrRd",
  clustering_method = "complete"
)

# Example: GGower distances on a small subset of the data
data("Data_HC_contamination", package = "dbrobust")
Data_small <- Data_HC_contamination[1:50, ]
cont_vars <- c("V1", "V2", "V3", "V4")
cat_vars  <- c("V5", "V6", "V7")
bin_vars  <- c("V8", "V9")
w <- Data_small$w_loop
dist_sq_ggower <- dbrobust::robust_distances(
  data = Data_small,
  cont_vars = cont_vars,
  bin_vars  = bin_vars,
  cat_vars  = cat_vars,
  w = w,
  alpha = 0.10,
  method = "ggower"
)
# Label each observation as "Normal" or "Outlier" using the "outlier_idx" attribute
group_vec <- rep("Normal", nrow(dist_sq_ggower))
group_vec[attr(dist_sq_ggower, "outlier_idx")] <- "Outlier"
group_factor <- factor(group_vec, levels = c("Normal", "Outlier"))
dbrobust::plot_heatmap(
  dist_mat = sqrt(dist_sq_ggower),
  max_n = 20,
  group = group_factor,
  main_title = "GGower Heatmap with Outliers",
  palette = "YlOrRd",
  clustering_method = "complete",
  annotation_legend = TRUE,
  stratified_sampling = TRUE,
  seed = 123
)
