MicrobTiSDA (version 0.1.0)

Data.visual.MESR: Visualize Group-Level OTU Temporal Profiles from Clustered Predicted Data

Description

This function visualizes the temporal patterns of microbial features at the group level, specifically tailored for data derived from mixed-effects spline regression (MESR) analyses. It leverages clustering results to segregate features into clusters based on their temporal trends, and then generates smoothed time-series plots for each cluster.

Usage

Data.visual.MESR(
  cluster_results,
  cutree_by = "height",
  cluster_height = NA,
  cluster_branches = NA,
  predicted_data,
  Design_data,
  pre_processed_data,
  Taxa = NULL,
  plot_dots = TRUE,
  figure_x_scale = 10,
  plot_lm = TRUE,
  lm_R2 = 0.01,
  lm_abs_slope = 0.005,
  title_size = 10,
  axis_title_size = 8,
  axis_y_size = 5,
  axis_x_size = 5,
  lm_sig_size = 5,
  legend_title_size = 5,
  legend_text_size = 5,
  dots_size = 0.7
)

Value

An object of class MicrobTiSDA.MSERvisual which contains lists of ggplot2 objects, where each top-level element corresponds to a group and each sub-element corresponds to a cluster within that group. Each plot visualizes the temporal profiles of microbial features in that cluster.

Arguments

cluster_results

A list object output from the Data.cluster function.

cutree_by

A character string specifying the method to cut the dendrogram, either by "height" or by "branches".

cluster_height

A numeric vector specifying the cut-off height for each group when cutree_by = "height".

cluster_branches

A numeric vector specifying the number of clusters for each group when cutree_by = "branches".

predicted_data

The output data frame from the Pred.data.MESR function.

Design_data

The output data from the Design function.

pre_processed_data

The transformed data output from the Data.trans function: a pre-processed OTU data frame with sample IDs as row names and OTU IDs as column names.

Taxa

An optional data frame providing taxonomic annotations for microbial species (default: NULL).

plot_dots

Logical; if TRUE, raw data points are overlaid on the temporal curves (default: TRUE).

figure_x_scale

A numeric value specifying the interval for x-axis breaks in the figures (default: 10).

plot_lm

Logical; if TRUE, a linear model is fitted to the predicted data to detect trends, and the regression line is added (default: TRUE).

lm_R2

A numeric threshold for the minimum R-squared value required to annotate the linear model (default: 0.01).

lm_abs_slope

A numeric threshold for the minimum absolute slope required to annotate the linear model (default: 0.005).

title_size

A numeric value specifying the font size for the plot title (default: 10).

axis_title_size

A numeric value specifying the font size for the axis titles (default: 8).

axis_y_size

A numeric value specifying the font size for the y-axis text (default: 5).

axis_x_size

A numeric value specifying the font size for the x-axis text (default: 5).

lm_sig_size

A numeric value specifying the font size for linear model annotation text (default: 5).

legend_title_size

A numeric value specifying the font size for legend titles (default: 5).

legend_text_size

A numeric value specifying the font size for legend text (default: 5).

dots_size

A numeric value specifying the size of the overlaid raw data points (default: 0.7).
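
The two cutree_by modes described above resemble the h and k arguments of base R's cutree. The following is a minimal sketch of that idea on a toy dendrogram; the feature names, the distance measure and the linkage method are illustrative assumptions, and whether Data.visual.MESR calls cutree in exactly this way is not confirmed by this page.

```r
# Toy feature-by-time matrix (hypothetical values): two rising and two falling profiles
time <- 1:10
toy <- rbind(
  OTU_a = time,
  OTU_b = time + 0.5,
  OTU_c = 11 - time,
  OTU_d = 11.5 - time
)

# Hierarchical clustering of the features (Euclidean distance and average
# linkage are assumptions made for this sketch)
hc <- hclust(dist(toy), method = "average")

# Analogue of cutree_by = "height": cut the dendrogram at a given height
by_height <- cutree(hc, h = 5)

# Analogue of cutree_by = "branches": ask for a fixed number of clusters
by_branches <- cutree(hc, k = 2)

# Both are named integer vectors mapping each feature to a cluster id
print(by_height)
print(by_branches)
```

In Data.visual.MESR, cluster_height and cluster_branches are vectors with one entry per group, so each group's dendrogram can be cut at its own height or into its own number of clusters.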

Details

The function begins by selecting branches from hierarchical clustering objects (provided in cluster_results) using either a specified cut-off height or a predefined number of clusters, as determined by the cutree_by parameter. For each group, it extracts the corresponding raw data from Design_data and determines the y-axis limits based on both the pre-processed data and the predicted data.

Then, for each cluster within a group, the function subsets the predicted data to include only those features belonging to that cluster. If taxonomic annotation data (Taxa) is provided, feature names are augmented with species-level labels. The data is then reshaped into a long format and plotted using ggplot2, where smoothed curves (via stat_smooth) depict the predicted temporal profiles.

Optionally, raw data points can be overlaid (if plot_dots is TRUE), and a linear model can be fitted to each cluster’s data to test for significant trends (if plot_lm is TRUE). When the linear model meets criteria based on p-value (< 0.05), R² (greater than lm_R2), and a minimum absolute slope (greater than lm_abs_slope), a dashed regression line is added with an annotation indicating the trend direction (upward or downward) along with the R² and slope values.

Various parameters allow customization of plot appearance, including axis scales, font sizes, and legend properties.
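
The trend criteria described in the Details can be sketched in base R as follows. The helper trend_annotation is hypothetical and not part of MicrobTiSDA; it assumes that the p-value in question is the one for the slope coefficient of a simple linear regression of value on time, which this page does not state explicitly.

```r
# Hypothetical helper (not a MicrobTiSDA function): decide whether a
# cluster-level trend would be annotated under the documented thresholds
trend_annotation <- function(time, value, lm_R2 = 0.01, lm_abs_slope = 0.005) {
  fit <- lm(value ~ time)
  s <- summary(fit)
  slope <- unname(coef(fit)["time"])
  r2 <- s$r.squared
  # Assumption: use the p-value of the slope term
  p_value <- s$coefficients["time", "Pr(>|t|)"]

  # All three criteria must hold: p < 0.05, R2 > lm_R2, |slope| > lm_abs_slope
  if (p_value < 0.05 && r2 > lm_R2 && abs(slope) > lm_abs_slope) {
    list(
      direction = if (slope > 0) "upward" else "downward",
      r2 = r2,
      slope = slope
    )
  } else {
    NULL  # no regression line or annotation would be drawn
  }
}

# Toy inputs (made-up values): a clearly rising series and an alternating one
time <- 1:20
rising <- trend_annotation(time, 0.5 * time + sin(time))
flat <- trend_annotation(time, rep(c(1, -1), 10))
```

With these toy inputs, rising is a list describing an upward trend, while flat is NULL because the alternating series does not satisfy the criteria.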