Data.visual.MESR: Visualize Group-Level OTU Temporal Profiles from Clustered Predicted Data

Description

This function visualizes the temporal patterns of microbial features at the group level, specifically tailored for data derived from mixed-effects spline regression (MESR) analyses. It leverages clustering results to segregate features into clusters based on their temporal trends, and then generates smoothed time-series plots for each cluster.

Usage

Data.visual.MESR(
  cluster_results,
  cutree_by = "height",
  cluster_height = NA,
  cluster_branches = NA,
  predicted_data,
  Design_data,
  pre_processed_data,
  Taxa = NULL,
  plot_dots = TRUE,
  figure_x_scale = 10,
  plot_lm = TRUE,
  lm_R2 = 0.01,
  lm_abs_slope = 0.005,
  title_size = 10,
  axis_title_size = 8,
  axis_y_size = 5,
  axis_x_size = 5,
  lm_sig_size = 5,
  legend_title_size = 5,
  legend_text_size = 5,
  dots_size = 0.7
)

Value

An object of class MicrobTiSDA.MSERvisual which contains lists of ggplot2 objects, where each top-level element corresponds to a group and each sub-element corresponds to a cluster within that group. Each plot visualizes the temporal profiles of microbial features in that cluster.

Arguments

cluster_results: A list object output by the Data.cluster function.
cutree_by: A character string specifying the method to cut the dendrogram, either by "height" or by "branches".
cluster_height: A numeric vector specifying the cut-off height for each group when cutree_by = "height" (default: NA).
cluster_branches: A numeric vector specifying the number of clusters for each group when cutree_by = "branches" (default: NA).
predicted_data: The data frame output by the Pred.data.MESR function.
Design_data: The data output by the Design function.
pre_processed_data: The transformed data output by the Data.trans function: a pre-processed OTU data frame with sample IDs as row names and OTU IDs as column names.
Taxa: A data frame providing taxonomic annotations for microbial species (default: NULL).
plot_dots: Logical; if TRUE, raw data points are overlaid on the temporal curves (default: TRUE).
figure_x_scale: A numeric value specifying the interval for x-axis breaks in the figures (default: 10).
plot_lm: Logical; if TRUE, a linear model is fitted to the predicted data to detect trends, and the regression line is added (default: TRUE).
lm_R2: A numeric threshold for the minimum R-squared value required to annotate the linear model (default: 0.01).
lm_abs_slope: A numeric threshold for the minimum absolute slope required to annotate the linear model (default: 0.005).
title_size: A numeric value specifying the font size for the plot title (default: 10).
axis_title_size: A numeric value specifying the font size for the axis titles (default: 8).
axis_y_size: A numeric value specifying the font size for the y-axis text (default: 5).
axis_x_size: A numeric value specifying the font size for the x-axis text (default: 5).
lm_sig_size: A numeric value specifying the font size for linear model annotation text (default: 5).
legend_title_size: A numeric value specifying the font size for legend titles (default: 5).
legend_text_size: A numeric value specifying the font size for legend text (default: 5).
dots_size: A numeric value specifying the size of the overlaid raw data points (default: 0.7).
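
The two cutree_by modes can be illustrated with base R alone. The sketch below is not the package's internal code: it assumes the clustering objects behave like stats::hclust results, and the toy matrix profiles and the cut values are invented for illustration.

```r
# Toy data (hypothetical): 6 features (rows) observed at 5 time points (columns),
# built as two clearly separated groups of temporal profiles.
set.seed(1)
profiles <- rbind(
  matrix(rnorm(15, mean = 0, sd = 0.1), nrow = 3),
  matrix(rnorm(15, mean = 5, sd = 0.1), nrow = 3)
)
rownames(profiles) <- paste0("OTU", 1:6)

# Hierarchical clustering of the features (assumed stand-in for one group's
# entry in cluster_results).
hc <- hclust(dist(profiles), method = "average")

# Analogue of cutree_by = "height": cut the dendrogram at a given height.
by_height <- cutree(hc, h = 2)

# Analogue of cutree_by = "branches": request a fixed number of clusters.
by_branches <- cutree(hc, k = 2)

# Both return a named integer vector mapping each feature to a cluster.
print(by_height)
print(by_branches)
```

Cutting by height lets the number of clusters vary with the data, while cutting by branches fixes the number of clusters and lets the effective height vary; which is preferable depends on how comparable the dendrograms are across groups.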

Details

The function begins by selecting branches from hierarchical clustering objects (provided in cluster_results) using either a specified cut-off height or a predefined number of clusters, as determined by the cutree_by parameter. For each group, it extracts the corresponding raw data from Design_data and determines the y-axis limits based on both the pre-processed data and the predicted data.

Then, for each cluster within a group, the function subsets the predicted data to include only those features belonging to that cluster. If taxonomic annotation data (Taxa) is provided, feature names are augmented with species-level labels. The data is then reshaped into a long format and plotted using ggplot2, where smoothed curves (via stat_smooth) depict the predicted temporal profiles.

Optionally, raw data points can be overlaid (if plot_dots is TRUE), and a linear model is fitted to each cluster’s data to test for significant trends (if plot_lm is TRUE). When the linear model meets criteria based on p-value (< 0.05), R² (greater than lm_R2), and a minimum absolute slope (greater than lm_abs_slope), a dashed regression line is added with an annotation indicating the trend direction (upward or downward) along with the R² and slope values. Various parameters allow customization of plot appearance including axis scales, font sizes, and legend properties.
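
The trend criteria described above can be sketched with base R. This is an illustration under stated assumptions, not the function's actual implementation: the simulated time and predicted vectors are invented, and the use of the slope coefficient's p-value is an assumption (the text does not say which p-value is tested; for a single-predictor model it coincides with the overall F-test).

```r
# Hypothetical predicted values for one cluster with a mild upward trend.
set.seed(42)
time <- 0:20
predicted <- 0.5 + 0.05 * time + rnorm(length(time), sd = 0.05)

# Thresholds mirroring the documented defaults of lm_R2 and lm_abs_slope.
lm_R2 <- 0.01
lm_abs_slope <- 0.005

# Fit a simple linear model and extract the three quantities used as criteria.
fit <- lm(predicted ~ time)
fit_summary <- summary(fit)
slope <- unname(coef(fit)["time"])
r2 <- fit_summary$r.squared
p_value <- fit_summary$coefficients["time", "Pr(>|t|)"]

# Annotate only when all three criteria are met.
annotate_trend <- p_value < 0.05 && r2 > lm_R2 && abs(slope) > lm_abs_slope

trend <- if (!annotate_trend) {
  "none"
} else if (slope > 0) {
  "upward"
} else {
  "downward"
}
print(trend)
```

Raising lm_R2 or lm_abs_slope makes the annotation more conservative, which may help when many clusters show statistically significant but practically negligible drift.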