gost: Gene list functional enrichment.

Description

Interface to the g:Profiler tool g:GOSt for functional enrichment analysis of gene lists. If the input 'query' is a list of gene vectors, results for all queries are returned in the same data frame, with the column 'query' indicating the corresponding query name. If 'multi_query' is selected, the result is a data frame for comparing multiple input lists, just as in the web tool.

Usage

gost(query, organism = "hsapiens", ordered_query = FALSE,
  multi_query = FALSE, significant = TRUE, exclude_iea = TRUE,
  measure_underrepresentation = FALSE, evcodes = FALSE,
  user_threshold = 0.05, correction_method = c("g_SCS", "bonferroni",
  "fdr", "false_discovery_rate", "gSCS", "analytical"),
  domain_scope = c("annotated", "known", "custom"), custom_bg = NULL,
  numeric_ns = "", sources = NULL)

Arguments

query

a vector, or a (named) list of vectors for multiple queries. A query can consist of mixed types of gene IDs (proteins, transcripts, microarray IDs, etc.), SNP IDs, chromosomal intervals or term IDs.

organism

organism name. Organism names are constructed by concatenating the first letter of the genus name and the species name. Example: human - 'hsapiens', mouse - 'mmusculus'.
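
The naming rule can be sketched in a few lines of R. The helper `make_organism_id` below is hypothetical and not part of the package; it assumes a plain two-word binomial name, and the resulting ID should still be checked against the organism list of the g:Profiler web tool.

```r
# Hypothetical helper (not part of gprofiler2): build an organism ID
# from a binomial species name, e.g. "Homo sapiens" -> "hsapiens".
make_organism_id <- function(binomial) {
  # Lower-case, trim, and split on whitespace into genus and species.
  parts <- strsplit(tolower(trimws(binomial)), "[[:space:]]+")[[1]]
  stopifnot(length(parts) >= 2)
  # First letter of the genus followed by the full species name.
  paste0(substr(parts[1], 1, 1), parts[2])
}

make_organism_id("Homo sapiens")
make_organism_id("Mus musculus")
```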

ordered_query

if the input gene lists are ranked, this option may be used to get GSEA-style p-values.

multi_query

in case of multiple gene lists, returns a comparison table of these lists. If enabled, the result data frame has columns named 'p_values', 'query_sizes' and 'intersection_sizes', each holding vectors of values in the order of the input queries. To get the results in a long format, set 'multi_query' to FALSE and pass a list of multiple gene vectors as 'query'.
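
As a rough sketch of the two layouts, the code below defines a named list of queries and a wrapper that requests both the long and the comparison format. The wrapper `compare_layouts` and the query names `first` and `second` are illustrative assumptions; the gene identifiers are taken from the example on this page. The wrapper contacts the g:Profiler service when called, so the call itself is left commented out.

```r
# Two named queries; the names end up in the 'query' column of the long format.
queries <- list(
  first  = c("X:1000:1000000", "rs17396340"),
  second = c("GO:0005005", "ENSG00000156103", "NLRP1")
)

# Hypothetical wrapper: run the same queries in both layouts.
# Requires the gprofiler2 package and network access when called.
compare_layouts <- function(queries) {
  # Long format: one row per query and term.
  long <- gprofiler2::gost(query = queries, multi_query = FALSE)
  # Comparison format: one row per term, with per-query vectors in
  # 'p_values', 'query_sizes' and 'intersection_sizes'.
  wide <- gprofiler2::gost(query = queries, multi_query = TRUE)
  list(long = long$result, wide = wide$result)
}

# layouts <- compare_layouts(queries)
```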

significant

whether only statistically significant results (TRUE) or all results (FALSE) should be returned.

exclude_iea

exclude GO electronic annotations (IEA).

measure_underrepresentation

measure underrepresentation.

evcodes

include evidence codes in the results. Note that this can decrease performance and make the query slower. In addition, a column 'intersection' is created that contains the gene IDs that intersect between the query and the term. This parameter does not work if 'multi_query' is set to TRUE.

user_threshold

custom p-value threshold, results with a larger p-value are excluded.

correction_method

the algorithm used for multiple testing correction, one of "gSCS" (synonyms: "analytical", "g_SCS"), "fdr" (synonyms: "false_discovery_rate"), "bonferroni".
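
The synonym groups can be written out as a small lookup table. This is only an illustration of the groups listed above, not the package's internal implementation; `normalize_correction_method` is a hypothetical name.

```r
# Hypothetical helper: map each accepted spelling to its canonical name.
normalize_correction_method <- function(method) {
  synonyms <- c(
    gSCS = "gSCS", g_SCS = "gSCS", analytical = "gSCS",
    fdr = "fdr", false_discovery_rate = "fdr",
    bonferroni = "bonferroni"
  )
  # Reject anything that is not a single known spelling.
  if (!is.character(method) || length(method) != 1 ||
      !(method %in% names(synonyms))) {
    stop("unknown correction method: ", method)
  }
  unname(synonyms[method])
}

normalize_correction_method("analytical")
normalize_correction_method("false_discovery_rate")
```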

domain_scope

how to define the statistical domain, one of "annotated", "known" or "custom".

custom_bg

vector of gene names to use as a statistical background. If given, the domain_scope is set to 'custom'.

numeric_ns

namespace to use for fully numeric IDs.

sources

a vector of data sources to use. Currently, these include GO (GO:BP, GO:MF, GO:CC to select a particular GO branch), KEGG, REAC, TF, MIRNA, CORUM, HP, HPA, WP. Please see the g:GOSt web tool for the comprehensive list and details on incorporated data sources.

Value

A named list where 'result' contains a data.frame with the enrichment analysis results and 'meta' contains the metadata needed for the Manhattan plot. If the input consisted of several lists, the corresponding list is indicated by the variable 'query'. The columns of the resulting data frame differ depending on whether 'multi_query' is TRUE or FALSE. If 'evcodes' is set, the return value includes the columns 'evidence_codes' and 'intersection'. The latter lists the genes that intersect between the corresponding query and term.
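
A minimal sketch of reading these pieces from the return value follows. The wrapper `gost_with_evidence` is hypothetical; it relies only on the 'result' element and the columns named on this page, and the NULL check is a defensive assumption for the case where nothing is returned. Calling it requires the gprofiler2 package and network access.

```r
# Hypothetical wrapper: request evidence codes and keep only the
# query name, the evidence codes and the intersecting genes.
gost_with_evidence <- function(query, organism = "hsapiens") {
  res <- gprofiler2::gost(query = query, organism = organism,
                          evcodes = TRUE, multi_query = FALSE)
  # Defensive guard, assuming an empty answer may come back as NULL.
  if (is.null(res)) {
    return(NULL)
  }
  # 'result' holds the enrichment table; 'meta' (unused here) holds
  # the metadata needed for the Manhattan plot.
  res$result[, c("query", "evidence_codes", "intersection")]
}

# evidence <- gost_with_evidence(c("ENSG00000156103", "NLRP1"))
```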

Examples

# NOT RUN {
gostres <- gost(c("X:1000:1000000", "rs17396340", "GO:0005005", "ENSG00000156103", "NLRP1"))

# }
