tmodUtest: Perform a statistical test of module expression

Description

Perform a statistical test of module expression

Usage

tmodUtest(l, modules = NULL, qval = 0.05, order.by = "pval",
  filter = FALSE, mset = "LI", cols = "Title", useR = FALSE)
tmodCERNOtest(l, modules = NULL, qval = 0.05, order.by = "pval",
  filter = FALSE, mset = "LI", cols = "Title", useR = FALSE)
tmodHGtest(fg, bg, modules = NULL, qval = 0.05, order.by = "pval",
  filter = FALSE, mset = "LI", cols = "Title")

Arguments

l

sorted list of HGNC gene identifiers

modules

optional list of modules for which to make the test

qval

Threshold FDR value to report

order.by

Order by P value ("pval") or none ("none")

filter

Remove gene names which have no module assignments

mset

Which module set to use. Either a character vector ("LI", "DC" or "all", default: LI) or a list (see "Custom module definitions" below)

cols

Which columns from the MODULES data frame should be included in the results

useR

use the R wilcox.test function; slow, but with exact p-values for small samples

fg

foreground gene set for the HG test

bg

background gene set for the HG test

Value

A data frame with module names, an additional statistic (e.g. enrichment or AUC, depending on the test), the P value and the FDR q-value (the P value corrected for multiple testing using the p.adjust function with the Benjamini-Hochberg correction).
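The q-value column can be illustrated with base R alone. The sketch below applies p.adjust with the Benjamini-Hochberg method to a made-up vector of P values and then applies a threshold equivalent to the default qval. The module names and numbers are hypothetical and are not output of any tmod function.

```r
# Hypothetical P values for four modules (illustrative only)
pvals <- c(M1 = 0.01, M2 = 0.02, M3 = 0.03, M4 = 0.5)

# Benjamini-Hochberg correction, as named in the Value section
qvals <- p.adjust(pvals, method = "BH")

# Keep only modules below the FDR threshold (0.05 is the documented default of qval)
significant <- names(qvals)[qvals < 0.05]
```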

Custom module definitions

Custom and arbitrary module, gene set or pathway definitions can also be provided through the mset option, if the parameter is a list rather than a character vector. The list passed to mset must contain the following members: "MODULES", "MODULES2GENES" and "GENES".

"MODULES" and "GENES" are data frames. It is required that MODULES contains the following columns: "ID", specifying a unique identifier of a module, and "Title", containing the description of the module. The data frame "GENES" must contain the column "ID".

The list MODULES2GENES is a mapping between modules and genes. The names of the list must correspond to the ID column of the MODULES data frame. The members of the list are character vectors, and the values of these vectors must correspond to the ID column of the GENES data frame.
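As a minimal sketch of the structure described above, the following base R code builds such a list and checks it against the stated requirements. All module and gene identifiers (MY.M1, GENE1 and so on) and the name my_mset are made up for illustration.

```r
# All identifiers below are hypothetical
MODULES <- data.frame(
  ID    = c("MY.M1", "MY.M2"),
  Title = c("Hypothetical module one", "Hypothetical module two"),
  stringsAsFactors = FALSE
)

GENES <- data.frame(
  ID = c("GENE1", "GENE2", "GENE3", "GENE4"),
  stringsAsFactors = FALSE
)

# Names must match MODULES$ID; values must come from GENES$ID
MODULES2GENES <- list(
  MY.M1 = c("GENE1", "GENE2"),
  MY.M2 = c("GENE2", "GENE3", "GENE4")
)

my_mset <- list(MODULES = MODULES, MODULES2GENES = MODULES2GENES, GENES = GENES)

# Consistency checks mirroring the requirements stated in this section
stopifnot(
  all(c("ID", "Title") %in% colnames(my_mset$MODULES)),
  "ID" %in% colnames(my_mset$GENES),
  setequal(names(my_mset$MODULES2GENES), my_mset$MODULES$ID),
  all(unlist(my_mset$MODULES2GENES) %in% my_mset$GENES$ID)
)
```

Following the Usage section, a list of this shape would be passed through the mset argument, for example tmodHGtest(fg, bg, mset = my_mset).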

Details

Performs a test either on an ordered list of genes (tmodUtest, tmodCERNOtest) or on two groups of genes (tmodHGtest). tmodUtest is a U test on ranks of genes that are contained in a module.
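The idea of a U test on ranks can be sketched for a single module with base R. This is an illustration only and not the package's own implementation; the gene list, the module and the AUC formula used here are assumptions made for the example.

```r
# Hypothetical ordered gene list (most interesting gene first) and one module
l <- paste0("G", 1:20)
module <- c("G1", "G2", "G3", "G5")

ranks <- seq_along(l)       # rank 1 = top of the list
in_module <- l %in% module  # which positions belong to the module

# One-sided U test: do module genes have smaller ranks than the other genes?
u <- wilcox.test(ranks[in_module], ranks[!in_module], alternative = "less")

# W counts the (module, non-module) pairs in which the module gene has the
# larger rank number; the AUC is the fraction of the remaining pairs
n1 <- sum(in_module)
n2 <- sum(!in_module)
auc <- 1 - unname(u$statistic) / (n1 * n2)
```

An AUC close to 1 means that nearly every module gene sits above nearly every non-module gene in the list.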

tmodCERNOtest is also a nonparametric test working on gene ranks, but it originates from Fisher's combined probability test. This test gives more weight to genes with lower ranks, so the resulting p-values correspond better to the observed effect size. In effect, modules with a small effect but many genes get higher p-values than with the U-test.
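A Fisher-style statistic on ranks can be sketched as follows. The code assumes that each rank divided by the list length is treated like a P value and that the combined statistic is compared against a chi-squared distribution with twice as many degrees of freedom as there are module genes. This is the textbook form of Fisher's method applied to ranks, not code taken from the package, and the gene list and module are hypothetical.

```r
# Hypothetical ordered gene list and one module
l <- paste0("G", 1:20)
module <- c("G1", "G2", "G3", "G5")

N <- length(l)
r <- match(module, l)  # ranks of the module genes in the list
r <- r[!is.na(r)]      # drop module genes that are absent from the list

# Fisher-style combination: small ranks contribute large terms
stat <- -2 * sum(log(r / N))
pval <- pchisq(stat, df = 2 * length(r), lower.tail = FALSE)
```

Because of the logarithm, a gene at rank 1 contributes far more to the statistic than a gene in the middle of the list, which is the weighting towards low ranks described above.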

tmodHGtest is simply a hypergeometric test.
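For one module, a hypergeometric enrichment test on a foreground and a background set can be sketched with phyper from base R. The gene names, the module and the enrichment definition (foreground proportion divided by background proportion) are assumptions made for this example and may differ from what tmodHGtest reports.

```r
# Hypothetical background, foreground and module
bg <- paste0("G", 1:100)
fg <- paste0("G", c(1:5, 50:54))
module <- paste0("G", 1:10)

N <- length(bg)           # background size
n <- length(fg)           # foreground size
B <- sum(bg %in% module)  # module genes in the background
b <- sum(fg %in% module)  # module genes in the foreground

# Probability of seeing b or more module genes in a random draw of n genes
pval <- phyper(b - 1, B, N - B, n, lower.tail = FALSE)

# Enrichment: module proportion in the foreground relative to the background
enrichment <- (b / n) / (B / N)
```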

In tmod, two module sets can be used: "LI" (from Li et al. 2013) or "DC" (from Chaussabel et al. 2008). The module set is selected with the parameter "mset"; if mset is "all", both sets are used.

Examples

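library(tmod)
data(tmod)
## Hypergeometric test: genes of module LI.M127 as foreground,
## all genes known to tmod as background
fg <- tmod$MODULES2GENES[["LI.M127"]]
bg <- tmod$GENES$ID
result <- tmodHGtest(fg, bg)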

## A more sophisticated example
## Gene set enrichment in TB patients compared to
## healthy controls (Egambia data set)

library(limma)
data(Egambia)
design <- cbind(Intercept=rep(1, 30), TB=rep(c(0,1), each= 15))
fit <- eBayes( lmFit(Egambia[,-c(1:3)], design))
tt <- topTable(fit, coef=2, number=Inf, genelist=Egambia[,1:3] )
tmodUtest(tt$GENE_SYMBOL)
