- abundance
A data frame containing gene/enzyme abundance data, with features as rows and samples as columns.
For KEGG analysis: features should be KO IDs (e.g., K00001).
For MetaCyc analysis: features should be EC numbers (e.g., EC:1.1.1.1 or 1.1.1.1), NOT pathway IDs.
For GO analysis: features should be KO IDs that will be mapped to GO terms.
NOTE: This function requires gene-level data, not pathway-level abundances.
For pathway abundance analysis, use pathway_daa instead.
- metadata
A data frame containing sample metadata.
- group
A character string specifying the column name in metadata that contains the grouping variable.
- pathway_type
A character string specifying the pathway type: "KEGG", "MetaCyc", or "GO".
- method
A character string specifying the GSEA method:
"camera": Competitive gene set test using limma's camera function (recommended).
Accounts for inter-gene correlations and provides more reliable p-values.
"fry": Fast approximation to rotation gene set testing using limma's fry function.
A self-contained test that is computationally efficient.
"fgsea": Fast preranked GSEA implementation. Note: preranked methods may produce
unreliable p-values because they do not account for inter-gene correlations (Wu et al., 2012).
"GSEA" or "clusterProfiler": clusterProfiler's GSEA implementation.
- covariates
A character vector specifying column names in metadata to use as covariates
for adjustment. Only used when method is "camera" or "fry". Default is NULL (no covariates).
Example: covariates = c("age", "sex", "BMI")
- contrast
For multi-group comparisons with "camera" or "fry" methods, specify the contrast
to test. Can be a character string naming a group level, or a numeric vector of contrast weights.
Default is NULL (automatic: compares the second group level with the first).
- inter.gene.cor
Numeric value specifying the inter-gene correlation for the camera method.
Default is 0.01. Use NA to estimate correlation from data for each gene set.
- rank_method
A character string specifying the ranking statistic for preranked methods
("fgsea", "GSEA", "clusterProfiler"): "signal2noise", "t_test", "log2_ratio", or "diff_abundance".
- nperm
An integer specifying the number of permutations (used only by the "GSEA"/"clusterProfiler" method).
The fgsea method uses adaptive multilevel splitting and does not require a fixed permutation count.
- min_size
An integer specifying the minimum gene set size; smaller gene sets are excluded from testing.
- max_size
An integer specifying the maximum gene set size; larger gene sets are excluded from testing.
- p_adjust_method
A character string specifying the multiple-testing p-value adjustment method.
- seed
An integer specifying the random seed for reproducibility.
- go_category
A character string specifying the GO category to use.
"all" (default) uses all categories present in the reference data.
Valid categories are determined by the reference data (currently MF and CC).
See table(ko_to_go_reference$category) for available categories.
- organism
A character string specifying the organism for KEGG analysis (default: "ko" for KEGG Orthology).
- p.adjust
Deprecated. Use p_adjust_method instead.
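Because abundance must hold gene-level features (KO IDs or EC numbers) rather than pathway IDs, it can help to check the identifier type before calling the function. The sketch below uses a hypothetical helper, guess_feature_type(), which is not part of the package; it assumes feature IDs are stored as the row names of abundance, and its KO and EC patterns are simplified.

```r
# Hypothetical helper (not part of the package): guess whether feature IDs
# look like KO IDs (e.g. "K00001") or EC numbers (e.g. "EC:1.1.1.1" or
# "1.1.1.1"). Anything else, such as pathway IDs, is reported as "unknown".
guess_feature_type <- function(ids) {
  is_ko <- grepl("^K[0-9]{5}$", ids)
  # Optional "EC:" prefix, then four dot-separated fields; "-" marks an
  # unassigned level (e.g. "2.7.1.-").
  is_ec <- grepl("^(EC:)?[0-9]+(\\.([0-9]+|-)){3}$", ids)
  if (all(is_ko)) return("KO")
  if (all(is_ec)) return("EC")
  "unknown"
}

# Assumption: feature IDs are the row names of the abundance table.
abundance <- data.frame(
  sample1 = c(120, 35, 0),
  sample2 = c(98, 41, 7),
  row.names = c("K00001", "K00002", "K00003")
)
guess_feature_type(rownames(abundance))         # "KO"
guess_feature_type(c("EC:1.1.1.1", "2.7.1.-"))  # "EC"
guess_feature_type(c("ko00010", "PWY-7219"))    # "unknown" (pathway IDs)
```

An "unknown" result is a hint that the table holds pathway-level abundances, in which case pathway_daa is the appropriate function.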
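For "camera" and "fry", covariates and contrast refer to a limma-style linear model. The base-R sketch below assumes a treatment-coded design of the form ~ group + covariates (the function's actual construction may differ) and shows which numeric contrast vectors correspond to the default comparison (second group level versus first) and to a comparison between two non-reference levels. The metadata values are made up for illustration; limma's camera() and fry() accept a numeric contrast with one weight per design column.

```r
# Illustrative metadata: three groups plus two covariates (made-up values).
metadata <- data.frame(
  group = factor(rep(c("ctrl", "A", "B"), each = 3), levels = c("ctrl", "A", "B")),
  age   = c(34, 51, 45, 40, 62, 38, 57, 29, 48),
  sex   = factor(c("F", "M", "F", "M", "F", "M", "F", "M", "F"))
)

# Treatment coding: the intercept is the reference level ("ctrl"); each group
# coefficient is that level minus "ctrl", adjusted for age and sex.
design <- model.matrix(~ group + age + sex, data = metadata)
colnames(design)
# "(Intercept)" "groupA" "groupB" "age" "sexM"

# One weight per design column.
contrast_default <- c(0, 1, 0, 0, 0)   # second level vs first: A - ctrl
contrast_B_vs_A  <- c(0, -1, 1, 0, 0)  # B - A, both adjusted for covariates
names(contrast_default) <- colnames(design)
names(contrast_B_vs_A)  <- colnames(design)
```

Covariate columns receive weight 0, so they are adjusted for but not tested.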
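To see why inter.gene.cor matters even at the small default of 0.01: camera accounts for correlation through a variance inflation factor, VIF = 1 + (m - 1) * rho, where m is the number of genes in the set and rho is the inter-gene correlation. A one-line sketch of that formula shows that the inflation grows with set size, so a fixed correlation of 0.01 already doubles the variance for a set of 101 features.

```r
# Variance inflation factor in camera's model: 1 + (m - 1) * rho,
# for a gene set of size m with average inter-gene correlation rho.
camera_vif <- function(m, rho) 1 + (m - 1) * rho

camera_vif(m = c(11, 101, 501), rho = 0.01)
# approximately 1.1 2.0 6.0
```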
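The four rank_method options correspond to familiar per-feature statistics. The sketch below uses common textbook definitions, which are assumptions: the package's exact formulas (for example, the pseudocount in log2_ratio, any floor on the standard deviation, or which group forms the numerator) may differ. rank_stat() is a hypothetical helper, not a package function.

```r
# Hypothetical helper: one ranking statistic for a single feature, comparing
# abundances in group 1 (x) with group 2 (y).
rank_stat <- function(x, y,
                      method = c("signal2noise", "t_test", "log2_ratio", "diff_abundance"),
                      pseudocount = 1) {
  method <- match.arg(method)
  switch(method,
    signal2noise   = (mean(x) - mean(y)) / (sd(x) + sd(y)),
    t_test         = unname(t.test(x, y)$statistic),  # Welch two-sample t
    log2_ratio     = log2((mean(x) + pseudocount) / (mean(y) + pseudocount)),
    diff_abundance = mean(x) - mean(y)
  )
}

# Rank all features of a small abundance matrix (rows = features).
abundance <- rbind(
  K00001 = c(10, 12, 14, 4, 6, 8),
  K00002 = c(5, 7, 6, 9, 11, 10),
  K00003 = c(3, 2, 4, 3, 4, 2)
)
group <- c("g1", "g1", "g1", "g2", "g2", "g2")

stats <- apply(abundance, 1, function(f)
  rank_stat(f[group == "g1"], f[group == "g2"], method = "signal2noise"))
ranked <- sort(stats, decreasing = TRUE)  # named vector, highest first
```

A named numeric vector sorted in decreasing order is the input shape that fgsea and clusterProfiler's GSEA expect for preranked analysis. Note that signal2noise is undefined when both groups have zero variance, which is one reason implementations often add a floor on the standard deviation.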
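If p_adjust_method is forwarded to R's stats::p.adjust(), as its name and the deprecated p.adjust argument suggest (an assumption, not confirmed here), the valid values are those listed in p.adjust.methods. A quick comparison of two of them:

```r
p <- c(0.01, 0.02, 0.03, 0.04)

p.adjust.methods                    # accepted method names
p.adjust(p, method = "BH")          # Benjamini-Hochberg: all four become 0.04
p.adjust(p, method = "bonferroni")  # p * 4, capped at 1: 0.04 0.08 0.12 0.16
```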