pca_analysis_enhanced: Enhanced PCA Analysis for MEA Data

Description

This function performs Principal Component Analysis (PCA) on microelectrode array (MEA) data. It accepts input from three sources, takes analysis parameters either as arguments or from a configuration object, and can optionally save the elbow plot to a file. It handles missing values, filters out low-variance variables, creates an elbow plot for choosing the number of components, and returns results structured for downstream analysis.

Usage

pca_analysis_enhanced(
  normalized_data = NULL,
  data_path = NULL,
  config = NULL,
  processing_result = NULL,
  min_var = NULL,
  impute = NULL,
  scale_data = NULL,
  n_components = NULL,
  variance_cutoff = NULL,
  grouping_variables = NULL,
  sample_id_components = NULL,
  value_column = "Normalized_Value",
  variable_column = "Variable",
  timepoint_column = "Timepoint",
  output_path = NULL,
  verbose = TRUE
)

Value

A list containing:
- pca_result: Complete prcomp() object with the PCA results
- plot_data: Data frame ready for plotting, with PC scores and metadata
- variance_explained: Vector of the variance explained by each component
- cumulative_variance: Vector of the cumulative variance explained
- elbow_plot: ggplot2 object showing the variance explained by each component
- elbow_data: Data frame underlying the elbow plot
- components_needed: Number of components needed to reach various variance thresholds
- count_summary: Summary of sample counts by group (if applicable)
- data_info: Information about the data processing steps
- config_used: Configuration parameters actually used
- processing_source: Source of the input data ("processing_result", "excel_file", or "direct_data")
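The variance-related elements can be derived from any prcomp() object. The base R sketch below shows one plausible way to compute variance_explained, cumulative_variance, components_needed, and an elbow_data frame. The helper name summarise_pca_variance and the threshold set (70, 80, 90, and 95 percent) are assumptions for illustration, not the function's documented internals.

```r
# Hypothetical helper: summarise the variance structure of a prcomp() result.
# ASSUMPTION: the thresholds below are illustrative, not documented defaults.
summarise_pca_variance <- function(pca, thresholds = c(70, 80, 90, 95)) {
  eig <- pca$sdev^2                                # variance of each component
  variance_explained <- 100 * eig / sum(eig)       # percent per component
  cumulative_variance <- cumsum(variance_explained)

  # Smallest number of components whose cumulative variance reaches each threshold
  components_needed <- vapply(
    thresholds,
    function(t) which(cumulative_variance >= t)[1],
    integer(1)
  )
  names(components_needed) <- paste0(thresholds, "%")

  # One row per component: the data behind an elbow (scree) plot
  elbow_data <- data.frame(
    PC = seq_along(variance_explained),
    Variance = variance_explained,
    Cumulative = cumulative_variance
  )

  list(
    variance_explained = variance_explained,
    cumulative_variance = cumulative_variance,
    components_needed = components_needed,
    elbow_data = elbow_data
  )
}

# Usage with random data (20 samples x 10 variables)
set.seed(1)
pca <- prcomp(matrix(rnorm(200), nrow = 20), scale. = TRUE)
variance_summary <- summarise_pca_variance(pca)
variance_summary$components_needed
```

With a variance_cutoff of 70, the "70%" entry of components_needed would give the number of components to retain under this rule.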

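Arguments

Several arguments default to NULL in the Usage signature. For these, the default listed below is the value used when no value is supplied.
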
normalized_data: Data.frame. Pre-loaded MEA data in long format (default: NULL)
data_path: Character. Path to Excel file containing MEA data (default: NULL)
config: List. Configuration object with analysis parameters (default: NULL)
processing_result: List. Output from the process_mea_flexible function (default: NULL)
min_var: Numeric. Minimum variance threshold for variable inclusion (default: 0.01)
impute: Logical. Whether to impute missing values (default: TRUE)
scale_data: Logical. Whether to scale variables before PCA (default: TRUE)
n_components: Integer. Number of principal components to extract (default: 2)
variance_cutoff: Numeric. Cumulative variance threshold in percent, so 70 means 70% (default: 70)
grouping_variables: Character vector. Variables for sample grouping (default: c("Treatment", "Genotype"))
sample_id_components: Character vector. Variables to create unique sample IDs (default: c("Well", "Timepoint", "Treatment", "Genotype"))
value_column: Character. Name of column containing values for PCA (default: "Normalized_Value")
variable_column: Character. Name of column containing variable names (default: "Variable")
timepoint_column: Character. Name of column containing timepoint information (default: "Timepoint")
output_path: Character. Optional file path for saving the elbow plot (default: NULL, in which case no file is saved)
verbose: Logical. Whether to print detailed progress messages (default: TRUE)
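The sample_id_components and grouping_variables arguments suggest a two-step pattern: concatenate the available ID columns into one key per sample, then count unique samples per group. A minimal base R sketch follows. The helper names make_sample_ids and count_by_groups, the underscore separator, and silently skipping absent columns are all assumptions.

```r
# Hypothetical helper: paste the available ID components into one key per row.
# ASSUMPTION: absent components are skipped and "_" is the separator.
make_sample_ids <- function(df,
                            components = c("Well", "Timepoint", "Treatment", "Genotype")) {
  available <- intersect(components, names(df))
  if (length(available) == 0) stop("None of the sample ID components are present")
  do.call(paste, c(unname(as.list(df[available])), sep = "_"))
}

# Hypothetical helper: count unique samples per combination of grouping variables
count_by_groups <- function(df, grouping = c("Treatment", "Genotype"),
                            id_col = "Sample_ID") {
  groups <- intersect(grouping, names(df))
  if (length(groups) == 0) {
    return(data.frame(n = length(unique(df[[id_col]]))))
  }
  samples <- unique(df[c(id_col, groups)])         # one row per sample
  counts <- aggregate(samples[id_col], by = samples[groups], FUN = length)
  names(counts)[ncol(counts)] <- "n"
  counts
}

# Usage with a toy long-format data set (no Genotype column)
mea <- data.frame(
  Well = c("A1", "A1", "A2", "A2", "B1", "B1"),
  Timepoint = "24h",
  Treatment = c("Vehicle", "Vehicle", "Drug", "Drug", "Drug", "Drug"),
  Variable = rep(c("MeanFiringRate", "BurstFrequency"), 3),
  Normalized_Value = c(1.0, 0.8, 1.4, 1.1, 1.3, 0.9)
)
mea$Sample_ID <- make_sample_ids(mea)   # e.g. "A1_24h_Vehicle"
count_by_groups(mea)                    # Drug: 2 samples, Vehicle: 1 sample
```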

Details

The function accepts data from three input sources:
1. **processing_result**: Direct output from the process_mea_flexible function
2. **data_path**: Path to an Excel file containing a normalized_data sheet
3. **normalized_data**: Pre-loaded data frame in long format
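This page does not state which source takes precedence when more than one is supplied. The sketch below assumes the order listed above (processing_result, then data_path, then normalized_data) and assumes that processing_result stores its data in a normalized_data element; both are assumptions, as is the helper name resolve_input. The Excel branch uses readxl::read_excel() and is only reached when data_path is given.

```r
# Hypothetical helper: choose one input source and record which one was used.
# ASSUMPTIONS: precedence order and the processing_result$normalized_data element.
resolve_input <- function(processing_result = NULL, data_path = NULL,
                          normalized_data = NULL) {
  if (!is.null(processing_result)) {
    list(data = processing_result$normalized_data, source = "processing_result")
  } else if (!is.null(data_path)) {
    if (!requireNamespace("readxl", quietly = TRUE)) {
      stop("Package 'readxl' is needed to read data_path")
    }
    # Read the sheet named in the documentation
    list(data = readxl::read_excel(data_path, sheet = "normalized_data"),
         source = "excel_file")
  } else if (!is.null(normalized_data)) {
    list(data = normalized_data, source = "direct_data")
  } else {
    stop("Supply one of processing_result, data_path, or normalized_data")
  }
}

# Usage
toy <- data.frame(Variable = "MeanFiringRate", Normalized_Value = 1)
resolve_input(normalized_data = toy)$source                            # "direct_data"
resolve_input(processing_result = list(normalized_data = toy))$source  # "processing_result"
```

The returned source labels mirror the processing_source values listed under Value.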

Data processing includes:
- Automatic detection of available columns
- Flexible sample ID creation from the specified components
- Missing value imputation (mean, median, or zero)
- Variance-based variable filtering
- Optional scaling
- Creation of an elbow plot for component selection
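The core steps can be sketched in base R: pivot the long data to a sample-by-variable matrix, impute gaps, drop low-variance variables, and run prcomp(). The helper name run_pca_sketch and its impute_method argument are hypothetical (the documented impute argument is only a logical). Averaging duplicate sample/variable entries and filtering on unscaled variances are further assumptions.

```r
# Hypothetical sketch of the documented processing steps.
# ASSUMPTIONS: duplicates are averaged, impute_method is not a documented
# argument, and variance filtering uses the unscaled values.
run_pca_sketch <- function(df, id_col = "Sample_ID", variable_column = "Variable",
                           value_column = "Normalized_Value", min_var = 0.01,
                           impute = TRUE, impute_method = c("mean", "median", "zero"),
                           scale_data = TRUE) {
  impute_method <- match.arg(impute_method)

  # Long -> wide: rows are samples, columns are variables; absent pairs become NA
  wide <- tapply(df[[value_column]],
                 list(df[[id_col]], df[[variable_column]]),
                 mean, na.rm = TRUE)

  if (impute) {
    for (j in seq_len(ncol(wide))) {
      miss <- is.na(wide[, j])
      if (any(miss)) {
        wide[miss, j] <- switch(impute_method,
                                mean = mean(wide[, j], na.rm = TRUE),
                                median = stats::median(wide[, j], na.rm = TRUE),
                                zero = 0)
      }
    }
  } else {
    wide <- wide[stats::complete.cases(wide), , drop = FALSE]  # drop incomplete samples
  }

  # Variance filter: keep variables whose variance reaches min_var
  vars <- apply(wide, 2, stats::var)
  wide <- wide[, !is.na(vars) & vars >= min_var, drop = FALSE]
  if (ncol(wide) < 2) stop("Fewer than two variables pass the variance filter")

  stats::prcomp(wide, center = TRUE, scale. = scale_data)
}

# Usage: 8 samples x 4 variables, one constant variable, one missing value
set.seed(42)
long <- expand.grid(Sample_ID = paste0("S", 1:8),
                    Variable = c("MFR", "BurstRate", "Sync", "Flat"),
                    stringsAsFactors = FALSE)
long$Normalized_Value <- ifelse(long$Variable == "Flat", 1, rnorm(nrow(long)))
long <- long[-3, ]                    # remove S3/MFR to create a missing value
pca <- run_pca_sketch(long)           # "Flat" has zero variance and is dropped
colnames(pca$x)
```

Filtering before scaling matters here: after scale_data standardises each column, every remaining variable would have variance 1, so a variance threshold applied afterwards would filter nothing.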

The function handles common MEA data challenges:
- Missing timepoint or treatment information
- Inconsistent column naming
- Mixed data types and missing values
- Varying numbers of experiments and conditions
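Inconsistent column names, missing metadata columns, and text-typed values can all be handled before reshaping. The sketch below matches expected column names while ignoring case, spaces, dots, and underscores, fills absent metadata columns with a placeholder, and coerces the value column to numeric. The helper name harmonise_columns, the matching rule, and the "Unknown" placeholder are assumptions, not the function's documented behaviour.

```r
# Hypothetical helper: normalise column names and types before analysis.
# ASSUMPTIONS: matching ignores case, spaces, dots and underscores;
# missing metadata columns are filled with "Unknown".
harmonise_columns <- function(df,
                              expected = c("Well", "Timepoint", "Treatment", "Genotype"),
                              value_column = "Normalized_Value") {
  key <- function(x) tolower(gsub("[ ._]", "", x))
  for (col in expected) {
    hit <- which(key(names(df)) == key(col))
    if (length(hit) > 0) {
      names(df)[hit[1]] <- col          # rename to the canonical spelling
    } else {
      df[[col]] <- "Unknown"            # placeholder for absent metadata
    }
  }
  if (value_column %in% names(df)) {
    # Factors or strings such as "1.2" become numeric; unparseable entries become NA
    df[[value_column]] <- suppressWarnings(as.numeric(as.character(df[[value_column]])))
  }
  df
}

# Usage: lower-case and spaced names, no Treatment/Genotype, text values
raw <- data.frame(
  well = c("A1", "A2"),
  `time point` = c("0h", "24h"),
  Normalized_Value = c("1.2", "n/a"),
  check.names = FALSE
)
fixed <- harmonise_columns(raw)
names(fixed)   # "Well" "Timepoint" "Normalized_Value" "Treatment" "Genotype"
```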

Examples

# Method 1: Use the output of the MEA processing function
mea_result <- process_mea_flexible("/path/to/data", baseline_timepoint = "baseline")
pca_analysis_enhanced(processing_result = mea_result)

# Method 2: Load from a saved Excel file
pca_analysis_enhanced(data_path = "/path/to/processed_data.xlsx")

# Method 3: Use pre-loaded data with custom parameters
pca_analysis_enhanced(
  normalized_data = my_data
  # add custom parameters here, e.g. min_var or n_components
)