
EGSEA (version 1.0.3)

egsea.cnt: Ensemble of Gene Set Enrichment Analyses Function

Description

This is the main function for carrying out gene set enrichment analysis with the EGSEA algorithm. It takes a raw count matrix as input to the EGSEA analysis.

Usage

egsea.cnt(counts, group, design = NULL, contrasts, logFC = NULL, gs.annots, symbolsMap = NULL, baseGSEAs = egsea.base(), minSize = 2, display.top = 20, combineMethod = "fisher", combineWeights = NULL, sort.by = "p.adj", egsea.dir = "./", kegg.dir = NULL, logFC.cutoff = 0, sum.plot.axis = "p.adj", sum.plot.cutoff = NULL, vote.bin.width = 5, num.threads = 4, report = TRUE, print.base = FALSE, verbose = FALSE)

Arguments

counts
double, numeric matrix of read counts where genes are the rows and samples are the columns.
group
character, vector or factor giving the experimental group/condition for each sample/library.
design
double, numeric matrix giving the design matrix of the linear model fitting.
contrasts
double, an N x L matrix giving the contrasts of the linear model coefficients for which the test is required. N is the number of experimental conditions and L is the number of contrasts.
logFC
double, a K x L matrix giving the log2 fold change of each gene for each contrast. K is the number of genes included in the analysis. If logFC=NULL, the logFC values are estimated using eBayes for each contrast.
gs.annots
list, indexed collections of gene sets. It is generated using one of these functions: buildIdxEZID, buildMSigDBIdxEZID, buildKEGGIdxEZID, buildGeneSetDBIdxEZID, and buildCustomIdxEZID.
symbolsMap
dataframe, a K x 2 matrix that stores the gene symbol of each Entrez Gene ID. It is used for the heatmap visualization. The order of rows should match that of counts. Default symbolsMap=NULL.
baseGSEAs
character, a vector of the gene set tests that should be included in the ensemble. Type egsea.base() to see the supported GSE methods. By default, all supported methods are used.
minSize
integer, the minimum size of a gene set to be included in the analysis. Default minSize=2.
display.top
integer, the number of top gene sets to be displayed in the EGSEA report. You can always access the list of all tested gene sets using the returned gsa list. Default is 20.
combineMethod
character, determines how to combine p-values from the different GSEA methods. Type egsea.combine() to see supported methods.
combineWeights
double, a vector that determines how the different GSEA methods are weighted. Its values should range between 0 and 1. This option is currently not supported.
sort.by
character, determines how to order the analysis results in the stats table. Type egsea.sort() to see all available options.
egsea.dir
character, directory into which the analysis results are written out.
kegg.dir
character, the directory of KEGG pathway data file (.xml) and image file (.png). Default kegg.dir=paste0(egsea.dir, "/kegg-dir/").
logFC.cutoff
numeric, cut-off threshold of logFC, used for the Significance Score and Regulation Direction calculations. Default logFC.cutoff=0.
sum.plot.axis
character, the x-axis of the summary plot. All the values accepted by the sort.by parameter can be used. Default sum.plot.axis="p.adj".
sum.plot.cutoff
numeric, cut-off threshold to filter the gene sets of the summary plots based on the values of the sum.plot.axis. Default sum.plot.cutoff=NULL.
vote.bin.width
numeric, the bin width of the vote ranking. Default vote.bin.width=5.
num.threads
numeric, number of CPU threads to be used. Default num.threads=4.
report
logical, whether to generate the EGSEA interactive report. Generating the report increases the run time. Default is TRUE.
print.base
logical, whether to write out the results of the individual GSE methods. Default FALSE.
verbose
logical, whether to print out progress messages and warnings.
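
The default combineMethod = "fisher" refers to Fisher's method for combining p-values. As a rough illustration of that technique only, the base R sketch below combines the p-values that several base methods might report for one gene set. The helper fisher_combine and the p-values are hypothetical, and EGSEA's internal implementation may differ.

```r
# Fisher's method: combine k independent p-values into a single p-value.
# fisher_combine is a hypothetical helper for illustration; it is not part of EGSEA.
fisher_combine <- function(p) {
  stopifnot(is.numeric(p), all(p > 0 & p <= 1))
  # Under the null, -2 * sum(log(p)) follows a chi-squared
  # distribution with 2k degrees of freedom.
  statistic <- -2 * sum(log(p))
  pchisq(statistic, df = 2 * length(p), lower.tail = FALSE)
}

# Made-up p-values for one gene set from three base methods
p_values <- c(camera = 0.01, roast = 0.04, gage = 0.20)
combined <- fisher_combine(p_values)
print(combined)
```

With a single p-value the combined value equals the input, which is a quick sanity check on the formula.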

Value

A list of elements, each with two or three elements, that stores the top gene sets and the detailed analysis results for each contrast, as well as the comparative analysis results.

Details

EGSEA, an acronym for Ensemble of Gene Set Enrichment Analyses, utilizes the analysis results of eleven prominent GSE algorithms from the literature to calculate collective significance scores for gene sets. These methods include: ora, globaltest, plage, safe, zscore, gage, ssgsea, roast, padog, camera and gsva. The ora, gage, camera and gsva methods depend on a competitive null hypothesis while the remaining seven methods are based on a self-contained hypothesis. Conveniently, the algorithm proposed here is not limited to these eleven GSE methods and new GSE tests can be easily integrated into the framework.

This function takes the raw count matrix, the experimental group of each sample, the design matrix and the contrast matrix as parameters. It performs TMM normalization and then applies voom to calculate the logCPM and weighting factors.
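
To make the expected shapes of the design and contrasts arguments concrete, here is a minimal base R sketch for a made-up experiment with three conditions and two replicates each. The group labels, contrast names and the variable contrast_matrix are assumptions for illustration; a real analysis would build these from its own sample annotation.

```r
# Hypothetical experiment: 3 conditions, 2 replicates each
group <- factor(c("ctrl", "ctrl", "trtA", "trtA", "trtB", "trtB"),
                levels = c("ctrl", "trtA", "trtB"))

# Design matrix with one coefficient per condition (no intercept)
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# N x L contrast matrix: N = 3 conditions (rows), L = 2 contrasts (columns)
contrast_matrix <- cbind(
  "trtA-ctrl" = c(-1, 1, 0),
  "trtB-ctrl" = c(-1, 0, 1)
)
rownames(contrast_matrix) <- colnames(design)

print(dim(design))
print(contrast_matrix)
```

Each column of the contrast matrix sums to zero, so every contrast compares one treatment against the control condition.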

References

Monther Alhamdoosh, Milica Ng, Nicholas J. Wilson, Julie M. Sheridan, Huy Huynh, Michael J. Wilson and Matthew E. Ritchie. Combining multiple tools outperforms individual methods in gene set enrichment analyses.

See Also

egsea.base, egsea.sort, buildIdxEZID, buildMSigDBIdxEZID, buildKEGGIdxEZID, buildGeneSetDBIdxEZID, and buildCustomIdxEZID

Examples

library(EGSEA)
library(EGSEAdata)
# load the IL-13 example count data
data(il13.data.cnt)
cnt = il13.data.cnt$counts
group = il13.data.cnt$group
design = il13.data.cnt$design
contrasts = il13.data.cnt$contra
genes = il13.data.cnt$genes
# build the gene set index: skip MSigDB, exclude KEGG Metabolism pathways
gs.annots = buildIdxEZID(entrezIDs=rownames(cnt), species="human",
                         msigdb.gsets="none",
                         kegg.updated=FALSE, kegg.exclude=c("Metabolism"))
# set report = TRUE to generate the EGSEA interactive report
# egsea.base()[-c(2,5,6,9)] drops four base methods by position
gsa = egsea.cnt(counts=cnt, group=group, design=design, contrasts=contrasts,
                gs.annots=gs.annots,
                symbolsMap=genes, baseGSEAs=egsea.base()[-c(2,5,6,9)],
                display.top=5,
                sort.by="avg.rank",
                egsea.dir="./il13-egsea-cnt-report",
                num.threads=2, report=FALSE)
 
