run_pathfindR: Wrapper Function for pathfindR Workflow

Description

run_pathfindR is the wrapper function for the pathfindR workflow.

Usage

run_pathfindR(input, p_val_threshold = 0.05,
  enrichment_threshold = 0.05, adj_method = "bonferroni",
  search_method = "GR", use_all_positives = FALSE, saTemp0 = 1,
  saTemp1 = 0.01, saIter = 10000, gaPop = 400, gaIter = 10000,
  gaThread = 5, gaMut = 0, grMaxDepth = 1, grSearchDepth = 1,
  grOverlap = 0.5, grSubNum = 1000, iterations = 10,
  n_processes = NULL, pin_name_path = "Biogrid", score_thr = 3,
  sig_gene_thr = 2, gene_sets = "KEGG", custom_genes = NULL,
  custom_pathways = NULL, bubble = TRUE,
  output_dir = "pathfindR_Results", list_active_snw_genes = FALSE,
  silent_option = TRUE)

Arguments

input

the input data that pathfindR uses. The input must be a data frame with three columns:

Gene Symbol (HGNC Gene Symbol)
Change value, e.g. log(fold change)
adjusted p value associated with test, e.g. differential expression/methylation
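
As a minimal sketch, an input of this shape could be assembled by hand as shown below. The column names and all values are made up for illustration; the description above only fixes the number, order and content of the columns.

```r
# Hypothetical toy input: column names and values are illustrative only.
example_input <- data.frame(
  Gene.symbol = c("TP53", "BRCA1", "EGFR", "MYC"),  # HGNC gene symbols
  logFC       = c(1.8, -2.1, 0.9, -0.4),            # change values
  adj.P.Val   = c(0.001, 0.0004, 0.03, 0.20),       # adjusted p values
  stringsAsFactors = FALSE
)

# Basic shape checks matching the three-column description
stopifnot(ncol(example_input) == 3)
stopifnot(is.character(example_input[[1]]))
stopifnot(is.numeric(example_input[[2]]), is.numeric(example_input[[3]]))
```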

p_val_threshold

the adjusted-p value threshold to use when filtering the input data frame. Must be a numeric value between 0 and 1.
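
The effect of this threshold can be sketched with a toy data frame. This is an illustration only: the column names are hypothetical, and treating the threshold as inclusive (`<=`) is an assumption, not something stated on this page.

```r
# Hypothetical toy input (names and values are made up)
toy <- data.frame(
  gene   = c("A", "B", "C"),
  change = c(1, -1, 0.5),
  adj_p  = c(0.01, 0.2, 0.05),
  stringsAsFactors = FALSE
)

p_val_threshold <- 0.05

# Keep rows whose adjusted p value is at or below the threshold
# (inclusive comparison is an assumption)
kept <- toy[toy$adj_p <= p_val_threshold, ]
```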

enrichment_threshold

threshold used when filtering individual pathway enrichment results

adj_method

correction method to be used for adjusting p-values of pathway enrichment results (Default: 'bonferroni')
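
For readers unfamiliar with the default, the sketch below shows what a Bonferroni adjustment does to a set of raw p values, using base R's `p.adjust`. That `adj_method` is handed to `p.adjust` internally is an assumption here; the raw p values are made up.

```r
# Made-up raw enrichment p values for four pathways
raw_p <- c(0.001, 0.01, 0.02, 0.04)

# Bonferroni multiplies each p value by the number of tests and caps at 1
adj_bonf <- p.adjust(raw_p, method = "bonferroni")

# Pathways that would pass an enrichment threshold of 0.05
enrichment_threshold <- 0.05
keep <- adj_bonf <= enrichment_threshold
```

Other method names accepted by `p.adjust` are listed in `p.adjust.methods`.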

search_method

algorithm to use when performing active subnetwork search. Options are greedy search (GR), simulated annealing (SA) or genetic algorithm (GA). Must be one of c("GR", "SA", "GA") (Default: "GR")

use_all_positives

if TRUE: in GA, adds an individual with all positive nodes. In SA, initializes candidate solution with all positive nodes. (Default = FALSE)

saTemp0

initial temperature for SA (Default: 1.0)

saTemp1

final temperature for SA (Default: 0.01)

saIter

iteration number for SA (Default: 10000)

gaPop

population size for GA (Default: 400)

gaIter

iteration number for GA (Default: 10000)

gaThread

number of threads to be used in GA (Default: 5)

gaMut

the mutation rate for GA (Default: 0)

grMaxDepth

sets max depth in greedy search. Set to 0 for no limit (Default: 1)

grSearchDepth

sets search depth in greedy search (Default: 1)

grOverlap

sets overlap threshold for results of greedy search (Default: 0.5)

grSubNum

sets number of subnetworks to be presented in the results (Default: 1000)

iterations

number of iterations for active subnetwork search and enrichment analyses (Default = 10. Gets set to 1 for Genetic Algorithm)

n_processes

optional argument for specifying the number of processes used by foreach. If not specified, the function determines this automatically (Default = NULL. Gets set to 1 for Genetic Algorithm)

pin_name_path

Name of the chosen PIN or path/to/PIN.sif. If PIN name, must be one of c("Biogrid", "GeneMania", "IntAct", "KEGG"). If path/to/PIN.sif, the file must comply with the PIN specifications. Defaults to "Biogrid".

score_thr

active subnetwork score threshold (Default = 3)

sig_gene_thr

threshold for minimum number of significant genes (Default = 2)

gene_sets

the gene sets to be used for enrichment analysis. Available gene sets are KEGG, Reactome, BioCarta, GO-BP, GO-CC, GO-MF or Custom. If "Custom", the arguments custom_genes and custom_pathways must be specified. (Default = "KEGG")

custom_genes

a list containing the genes involved in each custom pathway. Each element is a vector of gene symbols located in the given pathway. Names correspond to the ID of the pathway.

custom_pathways

A list containing the descriptions for each custom pathway. Names of the list correspond to the ID of the pathway.
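
A minimal sketch of how the two custom arguments fit together is given below. The pathway IDs, descriptions and gene memberships are hypothetical; the only point is that both lists are named by the same pathway IDs.

```r
# Hypothetical custom gene sets: IDs, descriptions and members are made up
custom_genes <- list(
  PW001 = c("TP53", "MDM2", "CDKN1A"),
  PW002 = c("EGFR", "KRAS", "BRAF")
)

custom_pathways <- list(
  PW001 = "Hypothetical p53 signalling set",
  PW002 = "Hypothetical EGFR signalling set"
)

# Both lists should be keyed by the same pathway IDs
stopifnot(identical(names(custom_genes), names(custom_pathways)))

# These would then be supplied together with gene_sets = "Custom", e.g.
# run_pathfindR(input, gene_sets = "Custom",
#               custom_genes = custom_genes,
#               custom_pathways = custom_pathways)
```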

bubble

boolean value. If TRUE, a bubble chart displaying the enrichment results is plotted. (default = TRUE)

output_dir

the directory to be created under the current working directory where the output and intermediate files are saved (default: "pathfindR_Results")

list_active_snw_genes

boolean value indicating whether or not to report the non-DEG active subnetwork genes for the active subnetwork which was enriched for the given pathway with the lowest p value (default = FALSE)

silent_option

boolean value indicating whether to print to the console (FALSE) or to a file (TRUE) during active subnetwork search (default = TRUE)

Value

Data frame of pathfindR enrichment results. Columns are:

ID: KEGG ID of the enriched pathway
Pathway: Description of the enriched pathway
Fold_Enrichment: Fold enrichment value for the enriched pathway
occurrence: the number of iterations in which the given pathway was found to be enriched, over all iterations
lowest_p: the lowest adjusted-p value of the given pathway over all iterations
highest_p: the highest adjusted-p value of the given pathway over all iterations
non_DEG_Active_Snw_Genes (OPTIONAL): the non-DEG active subnetwork genes, comma-separated
Up_regulated: the up-regulated genes in the input involved in the given pathway, comma-separated
Down_regulated: the down-regulated genes in the input involved in the given pathway, comma-separated
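
The sketch below shows one way such a result might be post-processed in base R. It works on a mock data frame that only mimics the documented column names; all IDs, values and gene names are invented, and the assumption that genes are separated by a comma with optional whitespace is mine.

```r
# Mock result with the documented column names; all values are made up
res <- data.frame(
  ID              = c("ID_1", "ID_2", "ID_3"),
  Pathway         = c("Mock pathway A", "Mock pathway B", "Mock pathway C"),
  Fold_Enrichment = c(2.5, 4.0, 1.7),
  occurrence      = c(10, 7, 3),
  lowest_p        = c(1e-4, 1e-6, 0.01),
  highest_p       = c(0.002, 1e-4, 0.04),
  Up_regulated    = c("G1, G2", "G3", ""),
  Down_regulated  = c("G4", "G5, G6, G7", "G8"),
  stringsAsFactors = FALSE
)

# Rank pathways by their lowest adjusted p value
ranked <- res[order(res$lowest_p), ]

# Split the comma-separated gene strings into character vectors
# (separator of comma plus optional whitespace is an assumption)
up_genes <- strsplit(ranked$Up_regulated, ",\\s*")
n_up <- lengths(up_genes)
```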

The function also creates an HTML report with the pathfindR enrichment results linked to the visualizations of the pathways in addition to the table of converted gene symbols. This report can be found in "`output_dir`/results.html" under the current working directory.

Optionally, a bubble chart of enrichment results is plotted. The x-axis corresponds to fold enrichment values while the y-axis indicates the enriched pathways. Size of the bubble indicates the number of DEGs in the given pathway. Color indicates the -log10(lowest-p) value; the redder the bubble, the more significant the pathway is.

Warning

Depending on the protein interaction network of your choice, the active subnetwork finding component of pathfindR may take a very long time to finish.

Details

This function takes in a data frame consisting of Gene Symbol, log-fold-change and adjusted-p values. After input testing, any gene symbols that are not in the PIN are converted to alias symbols if the alias is in the PIN. Next, active subnetwork search is performed. Pathway enrichment analysis is performed using the genes in each of the active subnetworks. Pathways with adjusted-p values higher than enrichment_threshold are discarded. The lowest adjusted-p value (over all subnetworks) for each pathway is kept. This process of active subnetwork search and enrichment is repeated for a selected number of iterations, which is done in parallel. Over all iterations, the lowest and the highest adjusted-p values, as well as number of occurrences are reported for each enriched pathway.
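
The final summarising step described above can be sketched in base R as follows. This is not the package's internal code: the per-iteration results are mock data frames with hypothetical column names, each holding one row per enriched pathway for that iteration.

```r
# Mock per-iteration enrichment results (IDs and p values are made up)
iter_results <- list(
  data.frame(ID = c("ID_1", "ID_2"), adj_p = c(0.010, 0.030),
             stringsAsFactors = FALSE),
  data.frame(ID = "ID_1", adj_p = 0.002,
             stringsAsFactors = FALSE),
  data.frame(ID = c("ID_1", "ID_3"), adj_p = c(0.020, 0.040),
             stringsAsFactors = FALSE)
)

# Stack all iterations into one data frame
all_res <- do.call(rbind, iter_results)

# One summary row per pathway
summary_df <- data.frame(ID = sort(unique(all_res$ID)),
                         stringsAsFactors = FALSE)

# Number of iterations in which each pathway was enriched
summary_df$occurrence <- as.integer(table(all_res$ID)[summary_df$ID])

# Lowest and highest adjusted p value over all iterations
summary_df$lowest_p  <- as.numeric(tapply(all_res$adj_p, all_res$ID, min)[summary_df$ID])
summary_df$highest_p <- as.numeric(tapply(all_res$adj_p, all_res$ID, max)[summary_df$ID])
```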

Examples

# NOT RUN {
run_pathfindR(RA_input)
# }
