NOVA (version 0.1.1)

pca_analysis_enhanced: Enhanced PCA Analysis for MEA Data

Description

This function performs Principal Component Analysis (PCA) on MEA data with extensive flexibility for data input sources, parameter configuration, and output options. It handles missing values, applies variance filtering, creates visualization plots, and provides comprehensive results suitable for downstream analysis.

Usage

pca_analysis_enhanced(
  normalized_data = NULL,
  data_path = NULL,
  config = NULL,
  processing_result = NULL,
  min_var = NULL,
  impute = NULL,
  scale_data = NULL,
  n_components = NULL,
  variance_cutoff = NULL,
  grouping_variables = NULL,
  sample_id_components = NULL,
  value_column = "Normalized_Value",
  variable_column = "Variable",
  timepoint_column = "Timepoint",
  output_path = NULL,
  verbose = TRUE
)

Value

A list containing:
- pca_result: Complete prcomp() object with PCA results
- plot_data: Data frame ready for plotting with PC scores and metadata
- variance_explained: Vector of variance explained by each component
- cumulative_variance: Vector of cumulative variance explained
- elbow_plot: ggplot2 object showing variance explained by components
- elbow_data: Data frame underlying the elbow plot
- components_needed: Number of components needed for various variance thresholds
- count_summary: Summary of sample counts by groups (if applicable)
- data_info: Information about data processing steps
- config_used: Configuration parameters actually used
- processing_source: Source of input data ("processing_result", "excel_file", or "direct_data")
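As an illustration of how quantities like variance_explained, cumulative_variance, and components_needed are commonly derived from a prcomp() object, here is a minimal base R sketch. It does not call NOVA; the built-in USArrests dataset stands in for MEA data, the threshold values are illustrative, and expressing variance as percentages is an assumption made to match the scale of variance_cutoff.

```r
# Sketch only: base R, with USArrests standing in for a samples-by-variables matrix.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Percentage of total variance carried by each component
# (percent scale is an assumption; the package may report proportions)
variance_explained <- 100 * pca$sdev^2 / sum(pca$sdev^2)
cumulative_variance <- cumsum(variance_explained)

# Smallest number of components whose cumulative variance reaches each threshold
thresholds <- c(70, 80, 90, 95)  # illustrative thresholds
components_needed <- vapply(
  thresholds,
  function(t) which(cumulative_variance >= t)[1],
  integer(1)
)
names(components_needed) <- paste0(thresholds, "%")

print(round(variance_explained, 1))
print(components_needed)
```

Plotting variance_explained against the component index gives the kind of curve an elbow plot shows.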

Arguments

normalized_data

Data.frame. Pre-loaded MEA data in long format (default: NULL)

data_path

Character. Path to Excel file containing MEA data (default: NULL)

config

List. Configuration object with analysis parameters (default: NULL)

processing_result

List. Output from process_mea_flexible function (default: NULL)

min_var

Numeric. Minimum variance threshold for variable inclusion (default: 0.01)

impute

Logical. Whether to impute missing values (default: TRUE)

scale_data

Logical. Whether to scale variables before PCA (default: TRUE)

n_components

Integer. Number of principal components to extract (default: 2)

variance_cutoff

Numeric. Cumulative variance percentage threshold (default: 70)

grouping_variables

Character vector. Variables for sample grouping (default: c("Treatment", "Genotype"))

sample_id_components

Character vector. Variables to create unique sample IDs (default: c("Well", "Timepoint", "Treatment", "Genotype"))
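
To show what building a sample ID from several columns can look like, the sketch below pastes the documented default components together in base R. The toy data, the "_" separator, and the Sample_ID column name are assumptions for illustration, not the package's confirmed behavior.

```r
# Toy long-format rows; column names follow the documented defaults.
long <- data.frame(
  Well = c("A1", "A1", "B1", "B1"),
  Timepoint = c("baseline", "1h", "baseline", "1h"),
  Treatment = c("Vehicle", "Vehicle", "Drug", "Drug"),
  Genotype = c("WT", "WT", "KO", "KO"),
  stringsAsFactors = FALSE
)

sample_id_components <- c("Well", "Timepoint", "Treatment", "Genotype")

# Use only the requested components that actually exist in the data
present <- intersect(sample_id_components, names(long))

# Join the chosen columns row by row ("_" separator is an assumption)
long$Sample_ID <- do.call(paste, c(long[present], sep = "_"))

print(long$Sample_ID)
```

Restricting to the columns that are present keeps the ID construction working when, for example, a dataset has no Genotype column.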

value_column

Character. Name of column containing values for PCA (default: "Normalized_Value")

variable_column

Character. Name of column containing variable names (default: "Variable")

timepoint_column

Character. Name of column containing timepoint information (default: "Timepoint")

output_path

Character. Optional path to save elbow plot (default: NULL, no file saved)

verbose

Logical. Whether to print detailed progress messages (default: TRUE)
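
The Usage signature sets most parameters to NULL, while the argument descriptions list concrete defaults. One common way to reconcile the two is to treat NULL as "not supplied" and fall back to a config entry, then to a built-in default. The sketch below illustrates that pattern; resolve_param is a hypothetical helper, and the precedence order is an assumption rather than documented NOVA behavior.

```r
# Hypothetical helper: explicit argument > config entry > built-in default.
# The precedence order is an assumption, not confirmed by the package docs.
resolve_param <- function(arg, config, name, default) {
  if (!is.null(arg)) return(arg)
  if (!is.null(config[[name]])) return(config[[name]])
  default
}

config <- list(variance_cutoff = 80)  # illustrative config object

min_var         <- resolve_param(NULL, config, "min_var", 0.01)
variance_cutoff <- resolve_param(NULL, config, "variance_cutoff", 70)
n_components    <- resolve_param(3L, config, "n_components", 2L)

print(c(min_var = min_var, variance_cutoff = variance_cutoff,
        n_components = n_components))
```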

Details

The function provides three flexible data input methods:
1. **processing_result**: Direct output from process_mea_flexible function
2. **data_path**: Path to Excel file with normalized_data sheet
3. **normalized_data**: Pre-loaded data frame in long format

Data processing includes:
- Automatic detection of available columns
- Flexible sample ID creation from specified components
- Missing value imputation (mean, median, or zero)
- Variance-based variable filtering
- Automatic scaling option
- Creation of elbow plot for component selection
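
The processing steps listed above can be sketched in base R as follows. This is not the package's implementation: the toy data, the Sample_ID column, and the choice of mean imputation are assumptions, and only the column names and the min_var and scale_data defaults come from the argument descriptions.

```r
# Toy long-format data; names follow the documented defaults, values are made up.
set.seed(1)
long <- expand.grid(
  Sample_ID = paste0("S", 1:6),
  Variable = c("Mean_Firing_Rate", "Burst_Count", "Constant_Metric"),
  stringsAsFactors = FALSE
)
long$Normalized_Value <- rnorm(nrow(long))
long$Normalized_Value[long$Variable == "Constant_Metric"] <- 1  # zero variance
long$Normalized_Value[2] <- NA                                  # one missing value

# 1. Pivot long -> wide: one row per sample, one column per variable
wide <- tapply(long$Normalized_Value,
               list(long$Sample_ID, long$Variable), mean)

# 2. Impute missing values with the column mean (median or zero are alternatives)
for (j in seq_len(ncol(wide))) {
  missing <- is.na(wide[, j])
  wide[missing, j] <- mean(wide[, j], na.rm = TRUE)
}

# 3. Drop variables whose variance falls below min_var
min_var <- 0.01
keep <- apply(wide, 2, var) >= min_var
filtered <- wide[, keep, drop = FALSE]

# 4. PCA with centering and optional scaling
scale_data <- TRUE
pca <- prcomp(filtered, center = TRUE, scale. = scale_data)

print(colnames(filtered))
print(dim(pca$x))
```

Filtering before scaling matters here: a zero-variance column such as Constant_Metric cannot be rescaled to unit variance, so prcomp() would stop with an error if it were kept.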

The function handles common MEA data challenges:
- Missing timepoint or treatment information
- Inconsistent column naming
- Mixed data types and missing values
- Variable numbers of experiments and conditions

Examples

# Method 1: Use output from MEA processing function
mea_result <- process_mea_flexible("/path/to/data", baseline_timepoint = "baseline")
pca_analysis_enhanced(processing_result = mea_result)

# Method 2: Load from saved Excel file
pca_analysis_enhanced(data_path = "/path/to/processed_data.xlsx")

# Method 3: Use pre-loaded data with custom parameters
pca_analysis_enhanced(normalized_data = my_data)