EGSEA (version 1.0.3)

egsea: Ensemble of Gene Set Enrichment Analyses Function

Description

This is the main function for carrying out gene set enrichment analysis using the EGSEA algorithm. It is designed to extend the limma-voom pipeline of RNA-seq analysis.

Usage

egsea(voom.results, contrasts, logFC = NULL, gs.annots, symbolsMap = NULL, baseGSEAs = egsea.base(), minSize = 2, display.top = 20, combineMethod = "fisher", combineWeights = NULL, sort.by = "p.adj", egsea.dir = "./", kegg.dir = NULL, logFC.cutoff = 0, sum.plot.axis = "p.adj", sum.plot.cutoff = NULL, vote.bin.width = 5, num.threads = 4, report = TRUE, print.base = FALSE, verbose = FALSE)

Arguments

voom.results
list, an EList object generated using the voom function. Entrez Gene IDs should be used as row names.
contrasts
double, an N x L matrix that specifies the contrasts of the linear model coefficients to be tested. N is the number of experimental conditions and L is the number of contrasts.
logFC
double, a K x L matrix that gives the log2 fold change of each gene for each contrast. K is the number of genes included in the analysis. If logFC=NULL, the logFC values are estimated using ebayes for each contrast.
gs.annots
list, indexed collections of gene sets. It is generated using one of these functions: buildIdxEZID, buildMSigDBIdxEZID, buildKEGGIdxEZID, buildGeneSetDBIdxEZID, and buildCustomIdxEZID.
symbolsMap
data frame, a K x 2 table that stores the gene symbol of each Entrez Gene ID. It is used for the heatmap visualization. The order of rows should match that of voom.results. Default symbolsMap=NULL.
baseGSEAs
character, a vector of the gene set tests that should be included in the ensemble. Type egsea.base() to see the supported GSE methods. By default, all supported methods are used.
minSize
integer, the minimum size of a gene set to be included in the analysis. Default minSize=2.
display.top
integer, the number of top gene sets to be displayed in the EGSEA report. You can always access the list of all tested gene sets using the returned gsa list. Default is 20.
combineMethod
character, determines how to combine p-values from the different GSE methods. Type egsea.combine() to see the supported methods. Default combineMethod="fisher".
combineWeights
double, a vector that determines how the different GSE methods will be weighted. Its values should range between 0 and 1. This option is not currently supported.
sort.by
character, determines how to order the analysis results in the stats table. Type egsea.sort() to see all available options.
egsea.dir
character, directory into which the analysis results are written out.
kegg.dir
character, the directory of KEGG pathway data file (.xml) and image file (.png). Default kegg.dir=paste0(egsea.dir, "/kegg-dir/").
logFC.cutoff
numeric, the cut-off threshold of logFC, used in the Significance Score and Regulation Direction calculations. Default logFC.cutoff=0.
sum.plot.axis
character, the x-axis of the summary plot. All the values accepted by the sort.by parameter can be used. Default sum.plot.axis="p.adj".
sum.plot.cutoff
numeric, cut-off threshold to filter the gene sets of the summary plots based on the values of the sum.plot.axis. Default sum.plot.cutoff=NULL.
vote.bin.width
numeric, the bin width of the vote ranking. Default vote.bin.width=5.
num.threads
numeric, the number of CPU threads to be used. Default num.threads=4.
report
logical, whether to generate the EGSEA interactive report. Generating the report increases the run time. Default is TRUE.
print.base
logical, whether to write out the results of the individual GSE methods. Default FALSE.
verbose
logical, whether to print out progress messages and warnings. Default FALSE.
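
As a rough illustration of the N x L shape expected by the contrasts argument, the following base R sketch builds such a matrix by hand. The condition and contrast names are hypothetical and do not come from any EGSEA dataset; in practice a matrix like this is usually produced with limma's makeContrasts.

```r
# Hypothetical experimental conditions (N = 3) and contrasts (L = 2)
conditions <- c("Control", "TreatmentA", "TreatmentB")
contrast.names <- c("AvsControl", "BvsControl")

# rows are conditions, columns are contrasts
contrasts <- matrix(0, nrow = length(conditions), ncol = length(contrast.names),
                    dimnames = list(conditions, contrast.names))
contrasts["TreatmentA", "AvsControl"] <- 1
contrasts["Control", "AvsControl"] <- -1
contrasts["TreatmentB", "BvsControl"] <- 1
contrasts["Control", "BvsControl"] <- -1

# each contrast compares two conditions, so its coefficients sum to zero
stopifnot(all(colSums(contrasts) == 0))
print(contrasts)
```

The row order is assumed to follow the order of the conditions in the design matrix that was passed to voom.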

Value

A list of elements, each containing two or three elements that store the top gene sets, the detailed analysis results for each contrast, and the comparative analysis results.

Details

EGSEA, an acronym for Ensemble of Gene Set Enrichment Analyses, utilizes the analysis results of eleven prominent GSE algorithms from the literature to calculate collective significance scores for gene sets. These methods include: ora, globaltest, plage, safe, zscore, gage, ssgsea, roast, padog, camera and gsva. The ora, gage, camera and gsva methods depend on a competitive null hypothesis while the remaining seven methods are based on a self-contained hypothesis. Conveniently, the algorithm proposed here is not limited to these eleven GSE methods and new GSE tests can be easily integrated into the framework. This function takes the voom object and the contrast matrix as parameters.
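
To make the idea of a collective score more concrete, here is a minimal base R sketch of two ways per-method results could be summarized for each gene set. It assumes that combineMethod="fisher" refers to Fisher's combined probability test and that "avg.rank" is the mean of a gene set's ranks across methods. The p-values, gene set names and the helper fisher.combine are hypothetical, and this is not EGSEA's internal code. Fisher's method also assumes independent p-values, which tests run on the same data do not strictly satisfy, so the numbers are illustrative only.

```r
# Hypothetical p-values: rows are gene sets, columns are base GSE methods
p.values <- matrix(c(0.001, 0.004, 0.002,
                     0.200, 0.150, 0.300,
                     0.030, 0.060, 0.010),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("setA", "setB", "setC"),
                                   c("camera", "roast", "gage")))

# Fisher's method: -2 * sum(log(p)) follows a chi-squared distribution
# with 2k degrees of freedom when the k p-values are independent
fisher.combine <- function(p) {
  statistic <- -2 * sum(log(p))
  pchisq(statistic, df = 2 * length(p), lower.tail = FALSE)
}

# one combined p-value per gene set
combined.p <- apply(p.values, 1, fisher.combine)

# average rank of each gene set across methods (rank 1 = smallest p-value)
avg.rank <- rowMeans(apply(p.values, 2, rank))

print(data.frame(combined.p, avg.rank))
```

A gene set that every method ranks highly, such as the hypothetical setA here, ends up with both the smallest combined p-value and the lowest average rank.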

References

Monther Alhamdoosh, Milica Ng, Nicholas J. Wilson, Julie M. Sheridan, Huy Huynh, Michael J. Wilson and Matthew E. Ritchie. Combining multiple tools outperforms individual methods in gene set enrichment analyses.

See Also

egsea.base, egsea.sort, buildIdxEZID, buildMSigDBIdxEZID, buildKEGGIdxEZID, buildGeneSetDBIdxEZID, and buildCustomIdxEZID

Examples

library(EGSEA)      # provides egsea(), egsea.base() and buildIdxEZID()
library(EGSEAdata)  # provides the il13.data example dataset
data(il13.data)
v = il13.data$voom
contrasts = il13.data$contra
# build the gene set index; msigdb.gsets="none" skips the MSigDB collections
gs.annots = buildIdxEZID(entrezIDs=rownames(v$E), species="human",
                         msigdb.gsets="none",
                         kegg.updated=FALSE, kegg.exclude=c("Metabolism"))
# set report = TRUE to generate the EGSEA interactive report
gsa = egsea(voom.results=v, contrasts=contrasts, gs.annots=gs.annots,
            symbolsMap=v$genes,
            baseGSEAs=egsea.base()[-c(2,5,6,9)], display.top=5,
            sort.by="avg.rank", egsea.dir="./il13-egsea-report",
            num.threads=2, report=FALSE)
 
