The WebGestaltR function performs two popular enrichment analyses: ORA (Over-Representation Analysis) and GSEA (Gene Set Enrichment Analysis). Based on the user-uploaded gene list, or gene list with scores (for the GSEA method), the function first maps the gene list to Entrez gene IDs and then summarizes the gene list based on GO (Gene Ontology) Slim. After performing the enrichment analysis, the function also returns a user-friendly HTML report containing the ID mapping table, the GO Slim summary and the enrichment analysis result. If the functional categories have a DAG (directed acyclic graph) structure, the structure of the enriched categories can also be visualized in the report.
WebGestaltR(enrichMethod="ORA", organism="hsapiens",
enrichDatabase="geneontology_Biological_Process",enrichDatabaseFile=NULL,
enrichDatabaseType=NULL,enrichDatabaseDescriptionFile=NULL,interestGeneFile=NULL,
interestGene=NULL,interestGeneType=NULL,collapseMethod="mean",referenceGeneFile=NULL,
referenceGene=NULL,referenceGeneType=NULL,referenceSet=NULL, minNum=10, maxNum=500,
fdrMethod="BH",sigMethod="fdr",fdrThr=0.05,topThr=10,dNum=20,perNum=1000,
lNum=20,is.output=TRUE,outputDirectory=getwd(),projectName=NULL,keepGSEAFolder=FALSE,
methodType="R",dagColor="binary",hostName="http://www.webgestalt.org/")
enrichMethod: The enrichment analysis method, either "ORA" (Over-Representation Analysis) or "GSEA" (Gene Set Enrichment Analysis).
organism: Currently, WebGestaltR supports 12 organisms. Use the function listOrganism to check the available organisms. Users can also set organism to "others" to perform the enrichment analysis for organisms not supported by WebGestaltR. In that case, users need to upload the enrichment categories, the interesting gene list and the reference list (for the ORA method). Because WebGestaltR does not perform ID mapping for other organisms, all uploaded data should use the same ID type.
enrichDatabase: The functional categories for the enrichment analysis. Use the function listGeneset to check the available functional databases for the selected organism. Users can also set this to "others" to upload a functional database not supported by WebGestaltR for the selected organism.
enrichDatabaseFile: If organism or enrichDatabase is set to "others", users need to upload a GMT file as the functional categories for the enrichment analysis. The file extension should be gmt. The first column of the file is the category ID and the second is the external link for the category; genes annotated to the category start from the third column. All columns are separated by tabs.
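
As a rough illustration of the GMT layout described above, the following base-R sketch writes a small file with one category per line. The category IDs, links and gene symbols are made up for illustration, and the file name is an arbitrary choice.

```r
# Hypothetical gene sets: category ID -> external link and annotated genes.
geneSets <- list(
  SET_A = list(link = "http://example.org/SET_A",
               genes = c("TP53", "BRCA1", "EGFR")),
  SET_B = list(link = "http://example.org/SET_B",
               genes = c("MYC", "KRAS"))
)

# One line per category: ID, link, then the genes, all tab-separated.
gmtLines <- vapply(names(geneSets), function(id) {
  paste(c(id, geneSets[[id]]$link, geneSets[[id]]$genes), collapse = "\t")
}, character(1))

# Write to a temporary location with the required "gmt" extension.
gmtFile <- file.path(tempdir(), "custom_sets.gmt")
writeLines(gmtLines, gmtFile)
```

The resulting path could then be passed as enrichDatabaseFile.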
enrichDatabaseType: If enrichDatabase is set to "others", WebGestaltR will also perform ID mapping for the uploaded GMT file, so users need to set the ID type of the genes in enrichDatabaseFile. If organism is set to "others", this ID type is not needed because WebGestaltR does not perform ID mapping for other organisms. The ID types supported by WebGestaltR for the selected organism can be found with the function listIDType.
enrichDatabaseDescriptionFile: Users can also upload a description file for the uploaded enrichDatabaseFile. The extension of the description file should be des. The file contains two tab-separated columns: the first is the category ID, which must exactly match the category ID in the uploaded enrichDatabaseFile, and the second is the description of the category.
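
A minimal sketch of such a description file, written with base R. The category IDs and descriptions here are hypothetical; in practice the IDs would have to match those in the uploaded GMT file.

```r
# Hypothetical category IDs and free-text descriptions.
descriptions <- data.frame(
  id = c("SET_A", "SET_B"),
  description = c("Example category A", "Example category B"),
  stringsAsFactors = FALSE
)

# Two tab-separated columns, no header, no quoting, "des" extension.
desFile <- file.path(tempdir(), "custom_sets.des")
write.table(descriptions, desFile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```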
interestGeneFile: If enrichMethod is "ORA", the extension of the interestGeneFile should be txt and the file should contain a single column: the interesting gene list. If enrichMethod is "GSEA", the extension of the interestGeneFile should be rnk and the file should contain two tab-separated columns: the gene list and the corresponding scores.
interestGene: Users can also supply an R object as the input. If enrichMethod is "ORA", interestGene should be an R vector containing the interesting gene list. If enrichMethod is "GSEA", interestGene should be an R data.frame containing two columns: the gene list and the corresponding scores.
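
The two input shapes, and their file-based equivalents, might look like the following sketch. The gene symbols, scores, column names and file names are illustrative assumptions; only the overall shapes (a vector for ORA, a two-column data.frame for GSEA) come from the text above.

```r
# ORA input: a character vector of gene identifiers (hypothetical symbols).
oraGenes <- c("TP53", "BRCA1", "EGFR")

# GSEA input: a two-column data.frame of genes and their scores.
gseaGenes <- data.frame(
  gene = c("TP53", "BRCA1", "EGFR"),
  score = c(2.1, -0.7, 1.3),
  stringsAsFactors = FALSE
)

# File-based equivalents: a one-column txt file for ORA ...
oraFile <- file.path(tempdir(), "interesting_genes.txt")
writeLines(oraGenes, oraFile)

# ... and a two-column, tab-separated rnk file for GSEA.
rnkFile <- file.path(tempdir(), "ranked_genes.rnk")
write.table(gseaGenes, rnkFile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```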
interestGeneType: The ID type of the interesting gene list. The ID types supported by WebGestaltR for the selected organism can be found with the function listIDType. If organism is "others", this parameter does not need to be set.
collapseMethod: The method used to collapse duplicate IDs for the GSEA method. "mean", "median", "min" and "max" take the mean, median, minimum and maximum, respectively, of the scores for the duplicate IDs.
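
To make the effect of collapsing concrete, here is a small base-R sketch that averages the scores of duplicated IDs. It only illustrates the idea and is not taken from the package's internal implementation; the genes and scores are made up.

```r
# Hypothetical ranked list in which TP53 and MYC appear more than once.
scores <- data.frame(
  gene = c("TP53", "TP53", "EGFR", "MYC", "MYC", "MYC"),
  score = c(1.0, 3.0, -2.0, 0.5, 1.5, 4.0),
  stringsAsFactors = FALSE
)

# Collapse duplicates to one row per gene using the mean score.
collapsedMean <- aggregate(score ~ gene, data = scores, FUN = mean)

# The other options correspond to swapping the summary function.
collapsedMedian <- aggregate(score ~ gene, data = scores, FUN = median)
```

With these numbers, TP53 collapses to 2 under both the mean and the median, while MYC collapses to 2 under the mean but 1.5 under the median.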
referenceGeneFile: For the ORA method, users need to upload the reference gene list. The extension of the referenceGeneFile should be txt and the file should contain a single column: the reference gene list.
referenceGene: For the ORA method, users can also supply an R object as the reference gene list. referenceGene should be an R vector containing the reference gene list.
referenceGeneType: The ID type of the reference gene list. The ID types supported by WebGestaltR for the selected organism can be found with the function listIDType. If organism is "others", this parameter does not need to be set.
referenceSet: Users can select the reference set directly from the platforms available in WebGestaltR instead of uploading one. All supported platforms can be found with the function listReferenceSet. If referenceGeneFile and referenceGene are both NULL, WebGestaltR will use referenceSet as the reference gene set. Otherwise, WebGestaltR will use the user-uploaded reference set for the enrichment analysis.
minNum: WebGestaltR will exclude categories with fewer than minNum annotated genes from the enrichment analysis. The default is 10.
maxNum: WebGestaltR will exclude categories with more than maxNum annotated genes from the enrichment analysis. The default is 500.
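
The size filter can be pictured with the base-R sketch below. The gene sets are synthetic, and exactly how the package counts annotated genes (for example, whether it first intersects each category with the reference list) is not stated here, so the plain set size is used as an assumption.

```r
# Synthetic categories of different sizes.
geneSets <- list(
  small  = paste0("g", 1:3),
  medium = paste0("g", 1:50),
  large  = paste0("g", 1:800)
)

# Thresholds matching the documented defaults.
minNum <- 10
maxNum <- 500

# Keep only categories whose size lies within [minNum, maxNum].
setSizes <- lengths(geneSets)
keptSets <- geneSets[setSizes >= minNum & setSizes <= maxNum]
```

Here only the category named "medium" remains after filtering.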
fdrMethod: For the ORA method, WebGestaltR supports six multiple-testing adjustment methods: "holm", "hochberg", "hommel", "bonferroni", "BH" and "BY". The default is "BH".
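
The six method names coincide with methods accepted by the base R function p.adjust in the stats package. Whether WebGestaltR calls p.adjust internally is an assumption; the sketch below only shows how the methods differ on a made-up vector of p-values.

```r
# Made-up raw p-values from five hypothetical categories.
pvals <- c(0.001, 0.008, 0.02, 0.04, 0.3)

# The adjustment methods listed above.
methods <- c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY")

# One column of adjusted p-values per method.
adjusted <- sapply(methods, function(m) p.adjust(pvals, method = m))

# Inspect the Benjamini-Hochberg column, the documented default.
adjusted[, "BH"]
```

Bonferroni is the most conservative of these, while BH controls the false discovery rate and typically retains more categories.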
sigMethod: Two significance methods are available in WebGestaltR: "fdr" and "top". With "fdr", the enriched categories are identified based on an FDR threshold. With "top", all categories are ranked by FDR and the top-ranked categories are selected as the enriched categories. The default is "fdr".
fdrThr: The significance level for the "fdr" method. The default is 0.05.
topThr: The number of top-ranked categories to select for the "top" method. The default is 10.
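
The difference between the two selection rules can be sketched in base R on a made-up result table. The column names and the use of a strict inequality for the FDR cutoff are assumptions made for illustration.

```r
# Made-up enrichment results: one FDR value per category.
results <- data.frame(
  geneSet = paste0("set", 1:6),
  FDR = c(0.20, 0.001, 0.04, 0.60, 0.03, 0.07),
  stringsAsFactors = FALSE
)

fdrThr <- 0.05  # cutoff used by the "fdr" rule
topThr <- 3     # number of categories kept by the "top" rule

# "fdr": keep every category whose FDR is below the threshold.
byFdr <- results[results$FDR < fdrThr, ]

# "top": rank all categories by FDR and keep the first topThr.
byTop <- head(results[order(results$FDR), ], topThr)
```

In this toy table both rules happen to select set2, set3 and set5, but the "top" rule would still return three categories even if none passed the FDR cutoff.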
dNum: The number of enriched categories visualized in the final report. The default is 20 and the maximum is 100. A larger dNum will increase the running time.
perNum: The number of permutations for the GSEA method. The default is 1000.
lNum: The number of categories for which leading edge genes are output for the GSEA method. The default is 20. Note: GSEA first ranks the categories by NES (normalized enrichment score) rather than FDR and then outputs the leading edge genes for the top lNum categories. Because NES does not necessarily decrease as FDR increases, using sigMethod to identify the significant categories may leave some categories with reported leading edge genes out of the final result, even if the number of significant categories is larger than lNum.
is.output: If is.output is TRUE, WebGestaltR will create a folder named after projectName and save the mapping results, the GO Slim summary, the enrichment results and a user-friendly HTML report in that folder. Otherwise, WebGestaltR will only return an R data.frame containing the enrichment results. If hundreds of gene lists need to be analyzed at once, it is better to set is.output to FALSE.
outputDirectory: The output directory for the results.
projectName: The name of the project. If projectName is NULL, WebGestaltR will use a time stamp as the project name.
keepGSEAFolder: If keepGSEAFolder is TRUE, WebGestaltR will keep all folders generated by the GSEA tool, which contain all figures and tables related to the GSEA analysis.
methodType: For a large ID mapping table (e.g. the dbSNP mapping table), users can read it with either an R or a Python function. The Python code is sometimes faster than the R code. To use the Python code, users should first install Python and the pandas module on their computer.
dagColor: If dagColor is "binary", the significant terms in the DAG structure will be colored red for the ORA method, or red (positively related) and blue (negatively related) for the GSEA method. If dagColor is "continuous", the significant terms in the DAG structure will be colored with a red gradient for the ORA method, or red (positively related) and blue (negatively related) gradients for the GSEA method, based on the corresponding FDR.
hostName: The server URL for accessing the data. Users can call the listArchiveURL function to get the URLs of all archived versions.
The WebGestaltR function not only outputs the user-friendly HTML report containing the ID mapping table, GO Slim summary result and the enrichment analysis result but also outputs an R object containing the enrichment analysis result.
# NOT RUN {
#######ORA example#########
geneFile<-system.file("extdata","interestingGenes.txt",package="WebGestaltR")
refFile<-system.file("extdata","referenceGenes.txt",package="WebGestaltR")
outputDirectory<-getwd()
#enrichResult<-WebGestaltR(enrichMethod="ORA",organism="hsapiens",
#enrichDatabase="pathway_KEGG",interestGeneFile=geneFile,
#interestGeneType="genesymbol",referenceGeneFile=refFile,
#referenceGeneType="genesymbol",is.output=TRUE,
#outputDirectory=outputDirectory,projectName=NULL)
#######GSEA example#########
#rankFile<-system.file("extdata","GeneRankList.rnk",package="WebGestaltR")
#outputDirectory<-getwd()
#enrichResult<-WebGestaltR(enrichMethod="GSEA",organism="hsapiens",
#enrichDatabase="pathway_KEGG",interestGeneFile=rankFile,
#interestGeneType="genesymbol", collapseMethod="mean",
#is.output=TRUE,outputDirectory=outputDirectory)
# }