gsnORAtest: gsnORAtest

Description

Perform an ORA test using an experimentally-derived gene set to query a gene set collection.

Usage

gsnORAtest(l, bg, geneSetCollection, Alpha = 0.05, full = FALSE)

Value

Returns a data.frame with an ORA (overrepresentation analysis) results set containing the following columns:

ID: the gene set identifiers.
Title: The "Title" field from tmod class gene set collection objects, corresponding to the reformatted STANDARD_NAME field in an MSigDB XML file, with underscores replaced by spaces and only the initial letter in uppercase. NOTE: If the search is done using a list of gene sets rather than a tmod object, this column will contain NA.
a: the number of genes observed in the background but not in l or the queried gene set. (present only if full == TRUE)
b: the number of observed genes in l but not the queried gene set. (present only if full == TRUE)
c: the number of observed genes in the queried gene set but not l. (present only if full == TRUE)
d: the number of observed genes in both l and the queried gene set, i.e. the overlap. (present only if full == TRUE)
N: the number of observed genes in the queried gene set.
Enrichment: The fold overrepresentation of genes in the overlap set d calculated as: $$E = (d / (c+d)) / ((b+d)/(a+b+c+d))$$
P_2S: 2-sided Fisher p-value. (NOT log-transformed, present only if full == TRUE)
adj.P.2S: 2-sided Fisher p-value corrected using the method of Benjamini & Hochberg(1) and implemented in the stats package. (present only if full == TRUE)
P_1S: 1-sided Fisher p-value. (NOT log-transformed.)
adj.P.1S: 1-sided Fisher p-value corrected using the method of Benjamini & Hochberg(1) and implemented in the stats package. (present only if full == TRUE)
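
The columns described above can be illustrated with a minimal base R sketch. This is not the GSNA implementation: the helper name ora_sketch, the toy gene identifiers and the orientation of the 2x2 table are assumptions made for illustration. It uses only fisher.test() and p.adjust() from the stats package, and treats the 1-sided test as a test for overrepresentation (alternative = "greater").

```r
# Hypothetical helper: one ORA row for a single gene set (illustration only).
ora_sketch <- function(gene_set, l, bg) {
  # Restrict both sets to genes observed in the background.
  l_obs  <- intersect(l, bg)
  gs_obs <- intersect(gene_set, bg)

  # Contingency table cells, named as in the column descriptions.
  d   <- length(intersect(l_obs, gs_obs))           # in both l and the gene set
  b   <- length(setdiff(l_obs, gs_obs))             # in l only
  c_n <- length(setdiff(gs_obs, l_obs))             # in the gene set only
  a   <- length(setdiff(bg, union(l_obs, gs_obs)))  # in neither

  # 2x2 table: rows = in l (no, yes); columns = in gene set (no, yes).
  tab <- matrix(c(a, b, c_n, d), nrow = 2)

  data.frame(
    a = a, b = b, c = c_n, d = d,
    N = c_n + d,
    Enrichment = (d / (c_n + d)) / ((b + d) / (a + b + c_n + d)),
    P_2S = fisher.test(tab, alternative = "two.sided")$p.value,
    P_1S = fisher.test(tab, alternative = "greater")$p.value
  )
}

# Toy data (hypothetical identifiers, not from the GSNA sample data).
bg  <- paste0("G", 1:20)                         # observable genes
l   <- paste0("G", 1:5)                          # experimental gene set
gsc <- list(SET_A = c(paste0("G", 3:8), "G99"),  # G99 is not in the background
            SET_B = paste0("G", 10:15))

res <- do.call(rbind, lapply(gsc, ora_sketch, l = l, bg = bg))

# Benjamini & Hochberg correction across the gene sets.
res$adj.P.2S <- p.adjust(res$P_2S, method = "BH")
res$adj.P.1S <- p.adjust(res$P_1S, method = "BH")
print(res)
```

For SET_A in this toy data the cells are a = 12, b = 2, c = 3 and d = 3, so the enrichment is (3/6) / (5/20) = 2.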

Arguments

l: A vector containing an experimentally-derived set of genes. These may be significantly differentially expressed genes, genes with differential chromatin accessibility or positives from a screen.
bg: A vector containing a background of observable genes.
geneSetCollection: A gene set collection to query, either a tmod object or a list of character vectors containing gene sets for which the list element names are the gene set IDs.
Alpha: The significance cutoff (alpha) applied to the adjusted p-values. Defaults to 0.05.
full: A logical. If TRUE, additional data are included in the results set, specifically the contingency table values. Defaults to FALSE.

Details

This function is provided to allow rapid and easy overrepresentation analysis using an unordered experimental gene set to query a gene set collection that may be either an arbitrary list of gene sets or a tmod class gene set collection. The statistical tests provided include both the standard two-sided Fisher test and a 1-sided Fisher test, similar to what is provided by the DAVID pathways analysis web application(2).

If a list of gene sets is provided as the geneSetCollection argument, it must be structured as a list of character vectors containing gene symbols (or whatever identifiers are used for the supplied experimental gene set), with the list element names serving as the gene set IDs.
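
As a sketch of the list structure described above, a collection might be built as follows. The set names SET_A and SET_B and their members are hypothetical and chosen only for illustration. The call to gsnORAtest is left commented out because it requires GSNA to be loaded and l and bg to be defined.

```r
# A gene set collection as a named list of character vectors.
# List element names act as the gene set IDs (hypothetical names here).
gsc_list <- list(
  SET_A = c("ALB", "APOA1", "TTR"),
  SET_B = c("COL1A1", "COL1A2", "FN1")
)

# Basic structural checks before passing the list to a search.
stopifnot(
  is.list(gsc_list),
  !is.null(names(gsc_list)),
  all(nzchar(names(gsc_list))),
  all(vapply(gsc_list, is.character, logical(1)))
)

# With GSNA loaded, the list could then be used as the collection:
# res <- gsnORAtest(l = l, bg = bg, geneSetCollection = gsc_list)
```

Because the collection is a plain list rather than a tmod object, the Title column of the result would be expected to contain NA, as noted under Value.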

References

1. Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B, 57, 289–300. <http://www.jstor.org/stable/2346101>.
2. Dennis G Jr, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA. (2003). DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol., 4(5):P3. Epub 2003 Apr 3.

Examples

library(GSNA)

# From a differential expression data set, we can generate a
# subset of genes with significant differential expression,
# up or down. Here we will extract genes with significant
# negative differential expression with
# avg_log2FC < 0 and p_val_adj < 0.05 from Seurat data:

sig_DN.genes <-
   toupper( rownames(subset( Bai_CiHep_v_Fib2.de,
                       avg_log2FC < 0  & p_val_adj < 0.05 )))

# Using all the genes in the differential expression data set,
# we can obtain a suitable background:
bg <- toupper(rownames( Bai_CiHep_v_Fib2.de ))

# Now, we can do an overrepresentation analysis search on this
# data using the Bai_gsc.tmod gene set collection included in
# the sample data:
sig_DN.gsnora <- gsnORAtest( l = sig_DN.genes,
                             bg = bg,
                             geneSetCollection = Bai_gsc.tmod )
