pathway_pca: Perform Principal Component Analysis (PCA) on functional pathway abundance data

Description

This function performs PCA on pathway abundance data and plots the first two principal components (PC1 vs. PC2) as a scatter plot, optionally with marginal density plots for each component. The plot shows how samples cluster and how they are distributed across groups.

Usage

pathway_pca(abundance, metadata, group, colors = NULL, show_marginal = TRUE)

Value

A ggplot object showing:

Center: PCA scatter plot with 95% confidence ellipses
Top: Density plot for PC1 (when show_marginal = TRUE)
Right: Density plot for PC2 (when show_marginal = TRUE)

Arguments

abundance: A numeric matrix or data frame containing pathway abundance data. Rows represent pathways, columns represent samples. Column names must match the sample names in metadata. Values must be numeric and cannot contain missing values (NA).
metadata: A data frame containing sample information. Must include a column for grouping samples (specified by the 'group' parameter). Sample identifiers are auto-detected from columns named sample_name, Sample_ID, SampleID, etc., or from rownames.
group: A character string specifying the column name in metadata that contains group information for samples (e.g., "treatment", "condition", "group").
colors: Optional. A character vector of colors for different groups. Length must match the number of unique groups. If NULL, default colors will be used.
show_marginal: Logical. Whether to show marginal density plots for PC1 and PC2. Default is TRUE. Set to FALSE to show only the PCA scatter plot.
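
The argument requirements above can be checked before calling the function. The following is a minimal sketch in base R, not part of the package: check_pca_inputs() and its sample_col parameter are hypothetical names introduced for illustration, and the toy data are made up. It assumes the sample identifiers live in a column called sample_name.

```r
# Toy inputs shaped like the arguments described above (values are made up)
set.seed(1)
abundance <- matrix(
  rexp(12), nrow = 3,
  dimnames = list(paste0("Pathway", 1:3), paste0("Sample", 1:4))
)
metadata <- data.frame(
  sample_name = paste0("Sample", 1:4),
  group = rep(c("Control", "Treatment"), each = 2)
)

# Hypothetical helper (not a package function): returns a character vector
# describing every requirement that the inputs violate
check_pca_inputs <- function(abundance, metadata, group, colors = NULL,
                             sample_col = "sample_name") {
  values <- as.matrix(abundance)
  problems <- character(0)
  if (!is.numeric(values)) {
    problems <- c(problems, "abundance is not numeric")
  }
  if (anyNA(values)) {
    problems <- c(problems, "abundance contains NA")
  }
  if (!group %in% names(metadata)) {
    problems <- c(problems, "group column not found in metadata")
  } else if (!is.null(colors) &&
             length(colors) != length(unique(metadata[[group]]))) {
    problems <- c(problems, "colors length differs from number of groups")
  }
  if (length(setdiff(colnames(values), metadata[[sample_col]])) > 0) {
    problems <- c(problems, "samples missing from metadata")
  }
  problems
}

ok <- check_pca_inputs(abundance, metadata, "group",
                       colors = c("steelblue", "firebrick"))
bad <- check_pca_inputs(abundance, metadata, "group",
                        colors = c("steelblue", "firebrick", "gold"))
```

Here ok is an empty character vector, while bad reports the color count mismatch. The function's own validation may differ in detail.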

Details

The function automatically aligns samples between abundance data and metadata, supporting various sample identifier formats. Samples and pathways with zero variance are filtered before PCA.
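
The alignment and filtering steps described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's actual implementation: it assumes sample identifiers are in a sample_name column, and the choice to center and scale in prcomp() is an assumption made for this example.

```r
# Toy data: 4 pathways x 10 samples, with one constant pathway
set.seed(42)
abundance <- matrix(
  rexp(40), nrow = 4,
  dimnames = list(paste0("Pathway", 1:4), paste0("Sample", 1:10))
)
abundance["Pathway4", ] <- 1  # zero variance across samples
metadata <- data.frame(
  sample_name = paste0("Sample", 10:1),  # deliberately in reverse order
  group = rep(c("Treatment", "Control"), each = 5)
)

# Align: keep shared samples and put metadata rows in abundance column order
shared <- intersect(colnames(abundance), metadata$sample_name)
abundance <- abundance[, shared, drop = FALSE]
metadata <- metadata[match(shared, metadata$sample_name), , drop = FALSE]

# Drop pathways (rows), then samples (columns), that have zero variance
abundance <- abundance[apply(abundance, 1, var) > 0, , drop = FALSE]
keep_samples <- apply(abundance, 2, var) > 0
abundance <- abundance[, keep_samples, drop = FALSE]
metadata <- metadata[keep_samples, , drop = FALSE]

# PCA on samples: prcomp() expects observations in rows, so transpose
# (centering and scaling are assumptions of this sketch)
pca <- prcomp(t(abundance), center = TRUE, scale. = TRUE)
scores <- data.frame(pca$x[, 1:2], group = metadata$group)

# Percentage of variance explained by each component
explained <- 100 * pca$sdev^2 / sum(pca$sdev^2)
```

The scores data frame holds PC1, PC2, and the group label for each sample, which is the information the scatter plot displays; explained gives the percentages that are typically shown on the axis labels.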

Examples

# Create example abundance data (3 pathways x 10 samples)
set.seed(123)  # Make the random example reproducible
abundance_data <- matrix(rnorm(30), nrow = 3, ncol = 10)
colnames(abundance_data) <- paste0("Sample", 1:10)
rownames(abundance_data) <- c("PathwayA", "PathwayB", "PathwayC")

# Create example metadata
metadata <- data.frame(
  sample_name = paste0("Sample", 1:10),
  group = factor(rep(c("Control", "Treatment"), each = 5))
)

# Basic PCA plot with default colors
pca_plot <- pathway_pca(abundance_data, metadata, "group")

# PCA plot with custom colors
pca_plot <- pathway_pca(
  abundance_data,
  metadata,
  "group",
  colors = c("blue", "red")  # One color per group
)

# PCA plot without marginal density plots
pca_plot <- pathway_pca(
  abundance_data,
  metadata,
  "group",
  show_marginal = FALSE
)

# \donttest{
# Example with real data
data("metacyc_abundance")  # Load example pathway abundance data
data("metadata")          # Load example metadata

# Prepare abundance data: use pathway IDs as row names, then drop that column
abundance_data <- as.data.frame(metacyc_abundance)
rownames(abundance_data) <- abundance_data$pathway
abundance_data <- abundance_data[, names(abundance_data) != "pathway", drop = FALSE]

# Create PCA plot
pathway_pca(
  abundance_data,
  metadata,
  "Environment",
  colors = c("green", "purple")
)
# }
