Data.cluster.cut: Cut and Annotate Dendrogram Based on a Specified Cut-Off Height

Description

This function processes clustering outputs from Data.cluster to update dendrogram plots. Depending on the user’s preference, it either automatically determines the optimal number of clusters via silhouette analysis or uses a user-specified cut-off height.

Usage

Data.cluster.cut(
  cluster_outputs,
  cut_height,
  cut_height_dist = 0.2,
  font_size = 0.2,
  auto_cutree = FALSE
)

Value

An object of class MicrobTiSDA.clusterCut with two elements:

cluster_results: A list of clustering objects for each group.
cluster_figures: A list of ggplot2 objects containing the annotated dendrogram plots for each group.

Arguments

cluster_outputs: The output object of Data.cluster.
cut_height: A numeric value specifying the cut-off height for cutting the dendrogram when auto_cutree is FALSE.
cut_height_dist: A numeric value used to adjust the vertical distance of the cut-off line annotation in the dendrogram plot (default: 0.2).
font_size: A numeric value specifying the font size for text labels in the dendrogram plots (default: 0.2).
auto_cutree: Logical; if TRUE, the function automatically determines the optimal number of clusters based on silhouette width (default: FALSE).

Author

Shijia Li

Details

The function takes as input the list of predicted data and clustering results produced by Data.cluster and computes a correlation-based distance matrix for each group. If auto_cutree is TRUE, the function performs a repeated k-fold cross-validation: it iterates over a range of candidate cluster numbers, computes the average silhouette width for each, and selects the number of clusters with the highest average. The dendrogram is then cut accordingly, and the resulting clusters are used to annotate the dendrogram plot with a different color for each cluster.
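The silhouette-based selection described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: the toy matrix is made up, the distance is assumed to be 1 minus the Pearson correlation, the repeated cross-validation step is left out, and the recommended cluster package is assumed to be installed.

```r
# Hypothetical toy data: rows are features, columns are time points.
set.seed(1)
time <- seq(0, 1, length.out = 20)
toy <- rbind(
  t(replicate(5, sin(2 * pi * time) + rnorm(20, sd = 0.1))),
  t(replicate(5, cos(2 * pi * time) + rnorm(20, sd = 0.1))),
  t(replicate(5, time + rnorm(20, sd = 0.1)))
)
rownames(toy) <- paste0("Feature", seq_len(nrow(toy)))

# Assumed form of the correlation-based distance: 1 - Pearson correlation
# between feature profiles.
d <- as.dist(1 - cor(t(toy)))
hc <- hclust(d, method = "average")

# Average silhouette width for each candidate number of clusters.
candidate_k <- 2:(nrow(toy) - 1)
avg_sil <- vapply(candidate_k, function(k) {
  membership <- cutree(hc, k = k)
  mean(cluster::silhouette(membership, d)[, "sil_width"])
}, numeric(1))

# Keep the candidate with the largest average silhouette width.
best_k <- candidate_k[which.max(avg_sil)]
clusters <- cutree(hc, k = best_k)
table(clusters)
```

Inspecting avg_sil against candidate_k shows how sharply the chosen number of clusters stands out from its neighbours, which is a useful sanity check before trusting an automatic cut.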

If auto_cutree is FALSE, the function uses the provided cut_height to cut the dendrogram. It then assigns cluster memberships based on this cut-off and updates the dendrogram plot by adding a horizontal dashed line at the specified height and annotating the plot with the cut-off value. In both cases, the function prints the dendrogram plot for each group and returns a list containing the clustering results and the corresponding ggplot2 objects of the dendrograms.
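A height-based cut of this kind can be reproduced with base R alone. The sketch below is illustrative only: the toy data and the helper draw_cut() are hypothetical, its label_offset argument is merely an analogue of cut_height_dist, and base graphics stand in for the ggplot2 figures the function actually returns.

```r
# Hypothetical toy data: two groups of four features with similar profiles.
set.seed(2)
time <- seq(0, 1, length.out = 12)
toy <- rbind(
  t(replicate(4, sin(2 * pi * time) + rnorm(12, sd = 0.1))),
  t(replicate(4, cos(2 * pi * time) + rnorm(12, sd = 0.1)))
)
rownames(toy) <- paste0("Feature", seq_len(nrow(toy)))

# Assumed correlation-based distance and average-linkage clustering.
d <- as.dist(1 - cor(t(toy)))
hc <- hclust(d, method = "average")

# Cut the tree at a fixed height; every branch below the line is one cluster.
cut_height <- 0.3
membership <- cutree(hc, h = cut_height)
table(membership)

# Hypothetical helper: draw the dendrogram with a dashed cut-off line and a
# text label placed label_offset above that line.
draw_cut <- function(hc, cut_height, label_offset = 0.2) {
  plot(hc, cex = 0.6)
  abline(h = cut_height, lty = 2)
  text(x = 1, y = cut_height + label_offset,
       labels = paste("cut =", cut_height), adj = 0)
}
```

In an interactive session, draw_cut(hc, cut_height) shows where the line falls relative to the merge heights; raising cut_height merges clusters and lowering it splits them.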

Examples


# \donttest{
# Example metadata with grouping variables
metadata <- data.frame(
  TimePoint = c(1, 2, 3, 4),
  Sample = c('S1', 'S2', 'S3', 'S4'),
  GroupA = c('A', 'A', 'B', 'B'),
  GroupB = c('X', 'Y', 'X', 'Y')
)

# Example pre-processed data (e.g., transformed abundance data)
Pre_processed_Data <- data.frame(
  Feature1 = rnorm(4),
  Feature2 = rnorm(4)
)

# Create design matrix using grouping variables
design_data <- Design(metadata, Group_var = c('GroupA', 'GroupB'), Pre_processed_Data,
                      Sample_Time = 'TimePoint', Sample_ID = 'Sample')

reg <- Reg.SPLR(design_data,
                Pre_processed_Data,
                z_score = 2,
                unique_values = 5,
                Knots = NULL,
                max_Knots = 5)
predictions <- Pred.data(reg,
                         metadata,
                         Group = "GroupA",
                         time_step = 1,
                         Sample_Time = "TimePoint")
result <- Data.cluster(predicted_data = predictions,
                       clust_method = "average",
                       font_size = 0.2,
                       dend_title_size = 15)

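# Cut at a user-specified height; store the output under a new name so that
# `result` still holds the Data.cluster output
cut_result <- Data.cluster.cut(cluster_outputs = result,
                               cut_height = 0.3,
                               cut_height_dist = 0.2,
                               auto_cutree = FALSE)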

# To automatically determine the optimal number of clusters:
result_auto <- Data.cluster.cut(cluster_outputs = result, auto_cutree = TRUE)
# }
