egsea.cnt: Ensemble of Gene Set Enrichment Analyses Function

Description

This is the main function for carrying out gene set enrichment analysis with the EGSEA algorithm. It takes a raw count matrix as input and performs the EGSEA analysis on it.

Usage

egsea.cnt(counts, group, design = NULL, contrasts, logFC = NULL, gs.annots, symbolsMap = NULL, baseGSEAs = egsea.base(), minSize = 2, display.top = 20, combineMethod = "fisher", combineWeights = NULL, sort.by = "p.adj", egsea.dir = "./", kegg.dir = NULL, logFC.cutoff = 0, sum.plot.axis = "p.adj", sum.plot.cutoff = NULL, vote.bin.width = 5, num.threads = 4, report = TRUE, print.base = FALSE, verbose = FALSE)

Arguments

counts

double, numeric matrix of read counts where genes are the rows and samples are the columns.

group

character, vector or factor giving the experimental group/condition for each sample/library.

design

double, numeric matrix giving the design matrix of the linear model fitting.

contrasts

double, an N x L matrix specifying the contrasts of the linear model coefficients to be tested. N is the number of experimental conditions and L is the number of contrasts.
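The shapes expected for design and contrasts can be illustrated with a minimal base R sketch. The two-condition experiment, the sample labels and the contrast name trt_vs_ctrl are all hypothetical and only serve to show how N and L map onto rows and columns.

```r
# Hypothetical experiment: two conditions with three samples each.
group <- factor(c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt"))

# Design matrix without an intercept: one column per condition (N = 2).
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Contrast matrix with N = 2 rows and L = 1 column: treatment minus control.
contrasts <- matrix(c(-1, 1), ncol = 1,
                    dimnames = list(levels(group), "trt_vs_ctrl"))
```

The rows of contrasts line up with the columns of design, so each contrast column is a weighted combination of the fitted condition coefficients.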

logFC

double, a K x L matrix giving the log2 fold change of each gene for each contrast. K is the number of genes included in the analysis. If logFC=NULL, the logFC values are estimated using eBayes for each contrast.

gs.annots

list, indexed collections of gene sets. It is generated using one of these functions: buildIdxEZID, buildMSigDBIdxEZID, buildKEGGIdxEZID, buildGeneSetDBIdxEZID, and buildCustomIdxEZID.

symbolsMap

data frame, a K x 2 table that stores the gene symbol of each Entrez Gene ID. It is used for the heatmap visualization. The order of rows should match that of counts. Default symbolsMap=NULL.

baseGSEAs

character, a vector of the gene set tests that should be included in the ensemble. Type egsea.base() to see the supported GSE methods. By default, all supported methods are used.

minSize

integer, the minimum size of a gene set to be included in the analysis. Default minSize=2.

display.top

integer, the number of top gene sets to be displayed in the EGSEA report. You can always access the list of all tested gene sets using the returned gsa list. Default is 20.

combineMethod

character, determines how to combine p-values from the different GSE methods. Type egsea.combine() to see the supported methods. Default combineMethod="fisher".
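For intuition about the default "fisher" option, the textbook form of Fisher's method can be sketched in base R. This is an illustration of the general technique only, not EGSEA's internal code; the helper name fisher_combine and the example p-values are hypothetical.

```r
# Fisher's method: under the null hypothesis, -2 * sum(log(p)) follows a
# chi-squared distribution with 2k degrees of freedom for k independent p-values.
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Hypothetical p-values reported for one gene set by three base methods.
p_values <- c(0.01, 0.04, 0.20)
combined <- fisher_combine(p_values)
```

Fisher's method assumes the combined p-values are independent, which is worth keeping in mind when the base methods are all run on the same data.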

combineWeights

double, a vector that determines how the different GSE methods are weighted. Its values should range between 0 and 1. This option is not currently supported.

sort.by

character, determines how to order the analysis results in the stats table. Type egsea.sort() to see all available options.

egsea.dir

character, directory into which the analysis results are written out.

kegg.dir

character, the directory of KEGG pathway data file (.xml) and image file (.png). Default kegg.dir=paste0(egsea.dir, "/kegg-dir/").

logFC.cutoff

numeric, cut-off threshold of logFC, used for the Significance Score and Regulation Direction calculations. Default logFC.cutoff=0.

sum.plot.axis

character, the x-axis of the summary plot. All the values accepted by the sort.by parameter can be used. Default sum.plot.axis="p.adj".

sum.plot.cutoff

numeric, cut-off threshold to filter the gene sets of the summary plots based on the values of the sum.plot.axis. Default sum.plot.cutoff=NULL.

vote.bin.width

numeric, the bin width of the vote ranking. Default vote.bin.width=5.

num.threads

numeric, number of CPU threads to be used. Default num.threads=4.

report

logical, whether to generate the EGSEA interactive report. Generating the report increases the run time. Default report=TRUE.

print.base

logical, whether to write out the results of the individual GSE methods. Default FALSE.

verbose

logical, whether to print out progress messages and warnings.

Value

A list in which each element holds two or three components that store the top gene sets and the detailed analysis results for each contrast, together with the comparative analysis results.

Details

EGSEA, an acronym for Ensemble of Gene Set Enrichment Analyses, uses the results of eleven prominent GSE algorithms from the literature to calculate collective significance scores for gene sets. These methods are: ora, globaltest, plage, safe, zscore, gage, ssgsea, roast, padog, camera and gsva. The ora, gage, camera and gsva methods test a competitive null hypothesis, while the remaining seven methods test a self-contained hypothesis. The algorithm is not limited to these eleven GSE methods, and new GSE tests can easily be integrated into the framework.

This function takes the raw count matrix, the experimental group of each sample, the design matrix and the contrast matrix as parameters. It performs TMM normalization and then applies voom to calculate the logCPM values and the weighting factors.
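The preprocessing described above (TMM normalization followed by voom) can be sketched with the edgeR and limma packages. This is a minimal sketch on simulated counts, assuming both packages are installed; it is not a copy of what egsea.cnt runs internally, and the gene names, sample names and simulation parameters are hypothetical.

```r
library(edgeR)  # DGEList(), calcNormFactors()
library(limma)  # voom()

# Simulated raw counts: 100 genes (rows) by 6 samples (columns).
set.seed(1)
counts <- matrix(rnbinom(600, mu = 50, size = 5), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))
group <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ 0 + group)

# TMM normalization: estimate per-sample scaling factors.
dge <- DGEList(counts = counts, group = group)
dge <- calcNormFactors(dge, method = "TMM")

# voom: logCPM values in v$E and observation-level weights in v$weights.
v <- voom(dge, design, plot = FALSE)
```

The resulting object carries one logCPM value and one weight per gene and sample, which is the form of input that weighted linear modelling in limma works with.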

References

Monther Alhamdoosh, Milica Ng, Nicholas J. Wilson, Julie M. Sheridan, Huy Huynh, Michael J. Wilson and Matthew E. Ritchie. Combining multiple tools outperforms individual methods in gene set enrichment analyses.

Examples

library(EGSEA)      # provides buildIdxEZID(), egsea.base() and egsea.cnt()
library(EGSEAdata)  # provides the il13.data.cnt example dataset
data(il13.data.cnt)

# extract the inputs expected by egsea.cnt from the example dataset
cnt <- il13.data.cnt$counts
group <- il13.data.cnt$group
design <- il13.data.cnt$design
contrasts <- il13.data.cnt$contra
genes <- il13.data.cnt$genes

# build the gene set index: KEGG pathways only, without the Metabolism category
gs.annots <- buildIdxEZID(entrezIDs = rownames(cnt), species = "human",
                          msigdb.gsets = "none",
                          kegg.updated = FALSE, kegg.exclude = c("Metabolism"))

# set report = TRUE to generate the EGSEA interactive report
gsa <- egsea.cnt(counts = cnt, group = group, design = design,
                 contrasts = contrasts, gs.annots = gs.annots,
                 symbolsMap = genes, baseGSEAs = egsea.base()[-c(2, 5, 6, 9)],
                 display.top = 5, sort.by = "avg.rank",
                 egsea.dir = "./il13-egsea-cnt-report",
                 num.threads = 2, report = FALSE)