MicrobTiSDA (version 0.1.0)

Data.visual: Visualize Temporal OTU Profiles from Clustered Predicted Data

Description

The Data.visual function generates visualizations of temporal profiles for OTUs by integrating clustering results, predicted time-series data, and design information. It produces ggplot2 figures for each group and for each cluster branch, displaying smoothed curves of predicted OTU abundances over time. Optionally, the function overlays raw data points and fits linear models to assess temporal trends, annotating the plots with model statistics when certain criteria are met.

Usage

Data.visual(
  cluster_results,
  cutree_by = "height",
  cluster_height = NA,
  cluster_branches = NA,
  predicted_data,
  Design_data,
  pre_processed_data,
  Taxa = NULL,
  plot_dots = TRUE,
  figure_x_scale = 5,
  plot_lm = FALSE,
  lm_R2 = 0.01,
  lm_abs_slope = 0.005,
  title_size = 10,
  axis_title_size = 8,
  axis_y_size = 5,
  axis_x_size = 5,
  lm_sig_size = 5,
  legend_title_size = 5,
  legend_text_size = 5,
  dots_size = 0.7
)

Value

An object of class MicrobTiSDA.visual containing a list of visualizations of the clustered microbial features.

Arguments

cluster_results

A list object output from the Data.cluster function.

cutree_by

A character string specifying the method to cut the dendrogram, either by "height" or by "branches".

cluster_height

A numeric vector specifying the cut-off height for each group when cutree_by = "height".

cluster_branches

A numeric vector specifying the number of clusters for each group when cutree_by = "branches".

predicted_data

The output data frame from the Pred.data function.

Design_data

The output data from the Design function.

pre_processed_data

The transformed data output from the Data.trans function. A pre-processed OTU data frame with sample IDs as row names and OTU IDs as column names.

Taxa

An optional data frame providing taxonomic annotations for microbial species (default: NULL).

plot_dots

Logical; if TRUE, raw data points are overlaid on the temporal curves (default: TRUE).

figure_x_scale

A numeric value specifying the interval for x-axis breaks in the figures (default: 5).

plot_lm

Logical; if TRUE, a linear model is fitted to the predicted data to detect trends, and the regression line is added (default: FALSE).

lm_R2

A numeric threshold for the minimum R-squared value required to annotate the linear model (default: 0.01).

lm_abs_slope

A numeric threshold for the minimum absolute slope required to annotate the linear model (default: 0.005).

title_size

A numeric value specifying the font size for the plot title (default: 10).

axis_title_size

A numeric value specifying the font size for the axis titles (default: 8).

axis_y_size

A numeric value specifying the font size for the y-axis text (default: 5).

axis_x_size

A numeric value specifying the font size for the x-axis text (default: 5).

lm_sig_size

A numeric value specifying the font size for linear model annotation text (default: 5).

legend_title_size

A numeric value specifying the font size for legend titles (default: 5).

legend_text_size

A numeric value specifying the font size for legend text (default: 5).

dots_size

A numeric value specifying the size of the overlaid raw data points (default: 0.7).
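
The cutree_by, cluster_height and cluster_branches arguments correspond to the two usual ways of cutting a dendrogram. The sketch below illustrates that distinction with base R only (stats::hclust and stats::cutree). The toy data, the correlation-based distance and the cut values are assumptions made for illustration; they are not taken from the internals of Data.visual or Data.cluster.

```r
# Toy predicted profiles: rows = time points, columns = OTUs (hypothetical data)
set.seed(42)
time <- 1:20
toy <- cbind(
  OTU1 = 0.5 * time + rnorm(20, sd = 0.2),
  OTU2 = 0.5 * time + rnorm(20, sd = 0.2),
  OTU3 = -0.5 * time + rnorm(20, sd = 0.2),
  OTU4 = sin(time) + rnorm(20, sd = 0.2)
)

# Correlation-based distance between OTU profiles (an assumed choice)
d <- as.dist(1 - cor(toy))
hc <- hclust(d, method = "average")

# Analogue of cutree_by = "height": cut the tree at a fixed height
by_height <- cutree(hc, h = 0.3)

# Analogue of cutree_by = "branches": ask for a fixed number of clusters
by_branches <- cutree(hc, k = 2)

print(by_height)
print(by_branches)
```

Cutting by height lets the data decide how many clusters appear, whereas cutting by branches fixes the cluster count and lets the effective height vary. In Data.visual, one value is supplied per group in either case.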

Details

This function takes the hierarchical clustering results (dendrograms) in cluster_results and cuts each tree either at a specified height or into a user-specified number of branches. For each group in cluster_results, the function extracts the corresponding predicted OTU data and raw design data. Temporal profiles are visualized by plotting smooth curves (using stat_smooth) for each cluster branch. When plot_dots is TRUE, raw data points are overlaid on the curves. If plot_lm is TRUE, a linear model is fitted to the predicted data; when the model exceeds the thresholds for R-squared (lm_R2) and absolute slope (lm_abs_slope), which default to 0.01 and 0.005 respectively, a dashed regression line is added along with an annotation of the R-squared and slope values. The resulting list of ggplot2 objects can be used to visually inspect the temporal dynamics of OTUs across different clusters and groups.
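
The linear-model criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the data frame pred, its column names and the variable annotate_lm are hypothetical, and only the two thresholds mirror the documented defaults. The plotting step is attempted only if ggplot2 is installed.

```r
# Hypothetical predicted abundances for one cluster branch
set.seed(1)
pred <- data.frame(Time = 1:30)
pred$Abundance <- 0.2 * pred$Time + rnorm(30, sd = 0.5)

# Fit a linear trend and extract R-squared and slope
fit <- lm(Abundance ~ Time, data = pred)
r2 <- summary(fit)$r.squared
slope <- unname(coef(fit)["Time"])

# Thresholds matching the documented defaults of lm_R2 and lm_abs_slope
lm_R2 <- 0.01
lm_abs_slope <- 0.005
annotate_lm <- r2 > lm_R2 && abs(slope) > lm_abs_slope

# Annotation text is only meaningful when both thresholds are met
label <- if (annotate_lm) sprintf("R2 = %.2f, slope = %.3f", r2, slope) else NA_character_

# Optional plot: smoothed curve, raw points, and a dashed line if criteria hold
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(pred, ggplot2::aes(x = Time, y = Abundance)) +
    ggplot2::geom_point(size = 0.7) +
    ggplot2::stat_smooth(method = "loess", formula = y ~ x, se = FALSE) +
    ggplot2::scale_x_continuous(breaks = seq(0, 30, by = 5))
  if (annotate_lm) {
    p <- p +
      ggplot2::geom_abline(intercept = coef(fit)[["(Intercept)"]],
                           slope = slope, linetype = "dashed") +
      ggplot2::annotate("text", x = min(pred$Time), y = max(pred$Abundance),
                        label = label, hjust = 0, size = 3)
  }
}
```

Because both conditions must hold, a nearly flat trend is left unannotated even when its fit is good, and a steep but poorly fitting trend is skipped as well.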

Examples

# \donttest{
metadata <- data.frame(
  TimePoint = c(1, 2, 3, 4),
  Sample = c('S1', 'S2', 'S3', 'S4'),
  GroupA = c('A', 'A', 'B', 'B'),
  GroupB = c('X', 'Y', 'X', 'Y')
)

# Example pre-processed data (e.g., transformed abundance data)
Pre_processed_Data <- data.frame(
  Feature1 = rnorm(4),
  Feature2 = rnorm(4)
)

# Create design matrix using grouping variables
design_data <- Design(metadata, Group_var = c('GroupA', 'GroupB'), Pre_processed_Data,
                      Sample_Time = 'TimePoint', Sample_ID = 'Sample')

reg <- Reg.SPLR(design_data,
                Pre_processed_Data,
                z_score = 2,
                unique_values = 5,
                Knots = NULL,
                max_Knots = 5)
predictions <- Pred.data(reg,
                         metadata,
                         Group = "GroupA",
                         time_step = 1,
                         Sample_Time = "TimePoint")
result <- Data.cluster(predicted_data = predictions,
                       clust_method = "average",
                       font_size = 0.2,
                       dend_title_size = 15)

result <- Data.cluster.cut(cluster_outputs = result,
                          cut_height = 0.3,
                          cut_height_dist = 0.2,
                          auto_cutree = FALSE)

curves <- Data.visual(cluster_results = result,
                      cutree_by = "height",
                      cluster_height = c(0.2, 0.2),
                      cluster_branches = NA,
                      predicted_data = predictions,
                      Design_data = design_data,
                      pre_processed_data = Pre_processed_Data,
                      Taxa = NULL,
                      plot_dots = TRUE)
# }