Main function for enrichment analysis
WebGestaltR(
  enrichMethod = "ORA",
  organism = "hsapiens",
  enrichDatabase = NULL,
  enrichDatabaseFile = NULL,
  enrichDatabaseType = NULL,
  enrichDatabaseDescriptionFile = NULL,
  interestGeneFile = NULL,
  interestGene = NULL,
  interestGeneType = NULL,
  collapseMethod = "mean",
  referenceGeneFile = NULL,
  referenceGene = NULL,
  referenceGeneType = NULL,
  referenceSet = NULL,
  minNum = 10,
  maxNum = 500,
  sigMethod = "fdr",
  fdrMethod = "BH",
  fdrThr = 0.05,
  topThr = 10,
  reportNum = 20,
  perNum = 1000,
  gseaP = 1,
  isOutput = TRUE,
  outputDirectory = getwd(),
  projectName = NULL,
  dagColor = "continuous",
  saveRawGseaResult = FALSE,
  gseaPlotFormat = c("png", "svg"),
  setCoverNum = 10,
  networkConstructionMethod = NULL,
  neighborNum = 10,
  highlightType = "Seeds",
  highlightSeedNum = 10,
  nThreads = 1,
  cache = NULL,
  hostName = "https://www.webgestalt.org/",
  ...
)

WebGestaltRBatch(
  interestGeneFolder = NULL,
  enrichMethod = "ORA",
  isParallel = FALSE,
  nThreads = 3,
  ...
)
The WebGestaltR function returns a data frame containing the enrichment analysis result and also outputs a user-friendly HTML report if isOutput is TRUE.
The columns in the data frame depend on the enrichMethod and are the following:
ID of the gene set.
Description of the gene set if available.
Link to the data source.
The number of genes in the set after filtering by minNum and maxNum.
The number of mapped input genes that are annotated in the gene set.
Expected number of input genes that are annotated in the gene set.
Enrichment ratio, overlap / expect.
Enrichment score, the maximum running sum of scores for the ranked list.
Normalized enrichment score, normalized against the average enrichment score of all permutations.
Number of genes/phosphosites in the leading edge.
P-value from hypergeometric test for ORA. For GSEA, please refer to its original publication or online at https://software.broadinstitute.org/gsea/doc/GSEAUserGuideTEXT.htm.
Corrected P-value for multiple testing with fdrMethod for ORA.
The gene/phosphosite IDs of overlap for ORA (Entrez gene IDs or phosphosite sequences).
Genes/phosphosites in the leading edge in entrez gene ID or phosphosite sequence.
The gene/phosphosite IDs of overlap for ORA, or of leadingEdgeId for GSEA, in the user's input ID type.
Path of the GSEA enrichment plot.
Name of the source database if multiple enrichment databases are given.
In NTA, like geneSet, the enriched GO terms of genes in the returned subnetwork.
In NTA, the gene IDs in the subnetwork, with 0/1 annotations indicating whether each gene is from the user input.
The WebGestaltRBatch function returns a list of enrichment results.
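For ORA, the expected number, the enrichment ratio (overlap / expect) and the hypergeometric p-value described above can be reproduced by hand. The sketch below does this in base R for a made-up gene universe; it assumes the conventional definition expect = (mapped input genes) × (set size) / (reference size), which this page does not spell out, so treat it as an illustration of the statistics rather than WebGestaltR's internal code.

```r
# Toy ORA statistics for a single gene set (illustrative data only).
reference <- sprintf("G%03d", 1:200)      # reference universe, N = 200
geneSet   <- reference[1:20]              # one category with 20 annotated genes
interest  <- reference[c(1:8, 101:112)]   # 20 input genes, 8 of them in the set

N       <- length(reference)
size    <- length(intersect(geneSet, reference))
n       <- length(intersect(interest, reference))
overlap <- length(intersect(interest, geneSet))

# Expected overlap under random sampling without replacement (assumed definition)
expect <- n * size / N
enrichmentRatio <- overlap / expect

# One-sided hypergeometric test: P(X >= overlap)
pValue <- phyper(overlap - 1, size, N - size, n, lower.tail = FALSE)

c(overlap = overlap, expect = expect, ratio = enrichmentRatio, p = pValue)
```

Here 8 genes overlap where 2 are expected, so the enrichment ratio is 4.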
enrichMethod: Enrichment method, one of ORA, GSEA or NTA.
organism: Currently, WebGestaltR supports 12 organisms. Users can use the function listOrganism to check the available organisms. Users can also input others to perform the enrichment analysis for organisms not supported by WebGestaltR. For other organisms, users need to provide the functional categories, the interesting gene list and the reference list (for the ORA method). Because WebGestaltR does not perform ID mapping for other organisms, all of these inputs must use the same ID type.
enrichDatabase: The functional categories for the enrichment analysis. Users can use the function listGeneSet to check the available functional databases for the selected organism. Multiple databases in a vector are supported for ORA and GSEA.
enrichDatabaseFile: Users can provide one or more GMT files as the functional categories for enrichment analysis. The extension of the file should be gmt. The first column of the file is the category ID, the second is the external link for the category, and the genes annotated to the category start from the third column. All columns are separated by tabs. The GMT files will be combined with enrichDatabase.
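The GMT layout described above (ID, link, then one gene per remaining tab-separated field) can be written and read back with base R. The category IDs, links and gene symbols below are placeholders, and readGmtSketch is a hypothetical helper, not WebGestaltR's own reader.

```r
# Write a two-category GMT file: ID <tab> link <tab> gene1 <tab> gene2 ...
gmtPath <- file.path(tempdir(), "custom_sets.gmt")
writeLines(c(
  paste("SET_A", "https://example.org/SET_A", "TP53", "MDM2", "CDKN1A", sep = "\t"),
  paste("SET_B", "https://example.org/SET_B", "EGFR", "KRAS", sep = "\t")
), gmtPath)

# Hypothetical minimal reader: one row per (category, gene) pair.
# Assumes every category lists at least one gene.
readGmtSketch <- function(path) {
  fields <- strsplit(readLines(path), "\t", fixed = TRUE)
  do.call(rbind, lapply(fields, function(f) {
    data.frame(id = f[1], link = f[2], gene = f[-(1:2)],
               stringsAsFactors = FALSE)
  }))
}

gmt <- readGmtSketch(gmtPath)
table(gmt$id)   # number of genes per category
```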
enrichDatabaseType: The ID type of the genes in the enrichDatabaseFile. If users set organism as others, they do not need to set this ID type because WebGestaltR does not perform ID mapping for other organisms. The supported ID types of WebGestaltR for the selected organism can be found with the function listIdType.
enrichDatabaseDescriptionFile: Users can also provide description files for the custom enrichDatabaseFile. The extension of the description file should be des. The description file contains two columns: the first is the category ID, which should be exactly the same as the category ID in the custom enrichDatabaseFile, and the second is the description of the category. All columns are separated by tabs.
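A matching des file is just two tab-separated columns. The sketch below writes one with placeholder IDs and descriptions, reads it back with base R, and checks that every (hypothetical) GMT category ID has a description, since the IDs must match exactly.

```r
# A .des file: category ID <tab> description, no header
desPath <- file.path(tempdir(), "custom_sets.des")
writeLines(c("SET_A\tp53 signalling targets",
             "SET_B\tEGFR/RAS module"), desPath)

des <- read.delim(desPath, header = FALSE,
                  col.names = c("id", "description"),
                  stringsAsFactors = FALSE, quote = "")

# Category IDs assumed to come from the companion GMT file
gmtIds <- c("SET_A", "SET_B")
unmatched <- setdiff(gmtIds, des$id)   # IDs with no description
des
```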
interestGeneFile: If enrichMethod is ORA or NTA, the extension of the interestGeneFile should be txt and the file can only contain one column: the interesting gene list. If enrichMethod is GSEA, the extension of the interestGeneFile should be rnk and the file should contain two columns separated by a tab: the gene list and the corresponding scores.
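Both file formats can be produced from R. This sketch assumes neither file has a header line; the gene symbols and scores are placeholders.

```r
# ORA / NTA input: one gene ID per line (.txt)
txtPath <- file.path(tempdir(), "interest.txt")
writeLines(c("TP53", "MDM2", "CDKN1A"), txtPath)

# GSEA input: gene ID <tab> score (.rnk)
rnkPath <- file.path(tempdir(), "interest.rnk")
ranked <- data.frame(gene = c("TP53", "MDM2", "CDKN1A"),
                     score = c(2.5, -1.2, 0.7),
                     stringsAsFactors = FALSE)
write.table(ranked, rnkPath, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

readLines(rnkPath)   # "TP53\t2.5" "MDM2\t-1.2" "CDKN1A\t0.7"
```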
interestGene: Users can also use an R object as the input. If enrichMethod is ORA or NTA, interestGene should be an R vector object containing the interesting gene list. If enrichMethod is GSEA, interestGene should be an R data.frame object containing two columns: the gene list and the corresponding scores.
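The two in-memory forms look like this. The column names gene and score are illustrative assumptions (this page only says the data.frame has two columns), and the WebGestaltR calls are wrapped in if (FALSE), as in the examples on this page, because they need the package and network access.

```r
# ORA / NTA: a character vector of gene IDs
interestGene <- c("TP53", "MDM2", "CDKN1A")

# GSEA: a two-column data.frame, gene IDs then scores
interestGeneRanked <- data.frame(
  gene  = c("TP53", "MDM2", "CDKN1A"),
  score = c(2.5, -1.2, 0.7),
  stringsAsFactors = FALSE
)

if (FALSE) {
  library(WebGestaltR)
  WebGestaltR(enrichMethod = "ORA", organism = "hsapiens",
              enrichDatabase = "pathway_KEGG", interestGene = interestGene,
              interestGeneType = "genesymbol", referenceSet = NULL)
  WebGestaltR(enrichMethod = "GSEA", organism = "hsapiens",
              enrichDatabase = "pathway_KEGG", interestGene = interestGeneRanked,
              interestGeneType = "genesymbol")
}
```

The ORA call above leaves referenceSet as NULL only to keep the sketch short; a reference list is still required for ORA.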
interestGeneType: The ID type of the interesting gene list. The supported ID types of WebGestaltR for the selected organism can be found with the function listIdType. If the organism is others, users do not need to set this parameter.
collapseMethod: The method used to collapse duplicate IDs with scores. mean, median, min and max represent the mean, median, minimum and maximum of the scores for the duplicate IDs.
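The effect of each collapseMethod value can be previewed with tapply(). collapseScores is a hypothetical helper written for this illustration, not a WebGestaltR function.

```r
# Duplicate IDs with different scores, e.g. several probes per gene
scores <- data.frame(gene  = c("TP53", "TP53", "MDM2", "EGFR", "EGFR", "EGFR"),
                     score = c(1.0, 3.0, -0.5, 2.0, 4.0, 9.0),
                     stringsAsFactors = FALSE)

# Hypothetical helper: one score per unique ID (IDs come back sorted)
collapseScores <- function(df, method = c("mean", "median", "min", "max")) {
  method <- match.arg(method)
  collapsed <- tapply(df$score, df$gene, match.fun(method))
  data.frame(gene = names(collapsed), score = as.vector(collapsed),
             stringsAsFactors = FALSE)
}

collapseScores(scores, "mean")   # EGFR 5, MDM2 -0.5, TP53 2
collapseScores(scores, "max")    # EGFR 9, MDM2 -0.5, TP53 3
```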
referenceGeneFile: For the ORA method, users need to upload the reference gene list. The extension of the referenceGeneFile should be txt and the file can only contain one column: the reference gene list.
referenceGene: For the ORA method, users can also use an R object as the reference gene list. referenceGene should be an R vector object containing the reference gene list.
referenceGeneType: The ID type of the reference gene list. The supported ID types of WebGestaltR for the selected organism can be found with the function listIdType. If the organism is others, users do not need to set this parameter.
referenceSet: Users can directly select the reference set from existing platforms in WebGestaltR instead of providing it through referenceGeneFile. All existing platforms supported in WebGestaltR can be found with the function listReferenceSet. If referenceGeneFile and referenceGene are NULL, WebGestaltR will use the referenceSet as the reference gene set. Otherwise, WebGestaltR will use the user-supplied reference set for enrichment analysis.
minNum: WebGestaltR will exclude categories with fewer than minNum annotated genes from the enrichment analysis. The default is 10.
maxNum: WebGestaltR will exclude categories with more than maxNum annotated genes from the enrichment analysis. The default is 500.
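The minNum/maxNum rule keeps a category only when its size lies in the closed interval [minNum, maxNum]. The sketch below applies that rule to hypothetical categories; for simplicity it counts every listed gene, whereas WebGestaltR may count only genes that survive ID mapping or reference filtering.

```r
# Hypothetical categories with different numbers of annotated genes
geneSets <- list(
  tiny   = paste0("G", 1:5),
  medium = paste0("G", 1:50),
  huge   = paste0("G", 1:800)
)

# Keep categories with minNum <= size <= maxNum
filterBySize <- function(sets, minNum = 10, maxNum = 500) {
  sizes <- lengths(sets)
  sets[sizes >= minNum & sizes <= maxNum]
}

names(filterBySize(geneSets))   # "medium"
```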
sigMethod: Two methods of significance are available in WebGestaltR: fdr and top. fdr means the enriched categories are identified based on the FDR; top means all categories are ranked by FDR and the top categories are selected as the enriched categories. The default is fdr.
fdrMethod: For the ORA method, WebGestaltR supports six FDR methods: holm, hochberg, hommel, bonferroni, BH and BY. The default is BH.
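The six method names listed for fdrMethod are exactly those accepted by base R's p.adjust(), so their behaviour can be compared on a vector of p-values. Whether WebGestaltR delegates to p.adjust() internally is an assumption here; the p-values are made up.

```r
pValues <- c(0.001, 0.008, 0.039, 0.041, 0.20)

# One column of adjusted p-values per correction method
methods  <- c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY")
adjusted <- sapply(methods, function(m) p.adjust(pValues, method = m))
round(adjusted, 4)
```

Bonferroni simply multiplies by the number of tests (capped at 1), while BY is always at least as conservative as BH.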
fdrThr: The significance threshold for the fdr method. The default is 0.05.
topThr: The threshold for the top method. The default is 10.
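The two sigMethod modes differ only in how rows are kept after ranking by FDR. The sketch below uses hypothetical results and a hypothetical selectSignificant helper; treating fdrThr as a strict "<" cutoff is an assumption, since this page does not say whether the threshold is inclusive.

```r
# Hypothetical per-category results
res <- data.frame(id  = paste0("SET_", 1:6),
                  fdr = c(0.001, 0.02, 0.04, 0.06, 0.30, 0.90),
                  stringsAsFactors = FALSE)

selectSignificant <- function(res, sigMethod = c("fdr", "top"),
                              fdrThr = 0.05, topThr = 10) {
  sigMethod <- match.arg(sigMethod)
  ranked <- res[order(res$fdr), , drop = FALSE]      # rank all categories by FDR
  if (sigMethod == "fdr") {
    ranked[ranked$fdr < fdrThr, , drop = FALSE]      # keep those under the cutoff
  } else {
    head(ranked, topThr)                             # keep the topThr best-ranked
  }
}

selectSignificant(res, "fdr")$id               # "SET_1" "SET_2" "SET_3"
selectSignificant(res, "top", topThr = 2)$id   # "SET_1" "SET_2"
```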
reportNum: The number of enriched categories visualized in the final report. The default is 20. A larger reportNum may make the report slow to render.
perNum: The number of permutations for the GSEA method. The default is 1000.
gseaP: The exponential scaling factor of the phenotype score. The default is 1. When p = 0, the ES reduces to the standard Kolmogorov-Smirnov statistic (see the original paper for more details).
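The role of gseaP is easiest to see in the running-sum enrichment score from the original GSEA paper: each hit in the ranked list adds |score|^p (normalized over the set) and each miss subtracts a constant. The sketch below implements that textbook definition in base R; enrichmentScoreSketch is a hypothetical helper, and it is not claimed to match WebGestaltR's implementation detail for detail.

```r
# Running-sum enrichment score with weight exponent p (textbook GSEA definition)
enrichmentScoreSketch <- function(scores, geneSet, p = 1) {
  scores <- sort(scores, decreasing = TRUE)   # named numeric vector, high to low
  inSet  <- names(scores) %in% geneSet
  nHit   <- sum(inSet)
  nMiss  <- length(scores) - nHit
  stopifnot(nHit > 0, nMiss > 0)

  weights  <- abs(scores)^p                   # p = 0 gives every hit weight 1
  hitStep  <- ifelse(inSet, weights / sum(weights[inSet]), 0)
  missStep <- ifelse(inSet, 0, 1 / nMiss)
  running  <- cumsum(hitStep - missStep)

  unname(running[which.max(abs(running))])    # signed maximum deviation from 0
}

ranked <- c(A = 3, B = 2, C = 1, D = -1, E = -2, F = -3)
enrichmentScoreSketch(ranked, c("A", "D"), p = 1)   # 0.75
enrichmentScoreSketch(ranked, c("A", "D"), p = 0)   # 0.5
enrichmentScoreSketch(ranked, c("E", "F"), p = 1)   # -1
```

With p = 1 the strongly scored gene A pulls the running sum higher than with p = 0, where every hit counts equally, as in a Kolmogorov-Smirnov statistic.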
isOutput: If isOutput is TRUE, WebGestaltR will create a folder named after the projectName and save the results in that folder. Otherwise, WebGestaltR will only return an R data.frame object containing the enrichment results. If hundreds of gene lists need to be analyzed simultaneously, it is better to set isOutput to FALSE. The default is TRUE.
outputDirectory: The output directory for the results.
projectName: The name of the project. If projectName is NULL, WebGestaltR will use a time stamp as the project name.
dagColor: If dagColor is binary, the significant terms in the DAG structure will be colored steel blue for the ORA method, or steel blue (positively related) and dark orange (negatively related) for the GSEA method. If dagColor is continuous, the significant terms in the DAG structure will be colored by a color gradient based on the corresponding FDRs.
saveRawGseaResult: Whether the raw result from GSEA is saved as an RDS file, which can be used for plotting. Defaults to FALSE. The saved list includes a data frame of GSEA results with statistics, a matrix of the running sum of scores for each gene set, and a list with the ranks of genes for each gene set.
gseaPlotFormat: The graphic format of GSEA enrichment plots. Either svg, png, or c("png", "svg") (default).
setCoverNum: The number of expected gene sets after set cover to reduce redundancy. Fewer sets may be returned if the coverage reaches 100%. The default is 10.
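One common way to pick a small, non-redundant subset of gene sets is greedy set cover: repeatedly take the set that covers the most genes not yet covered. The sketch below is a plain, unweighted greedy version under that assumption; WebGestaltR's own set cover may weight sets differently (for example by significance), so the sketch only illustrates the two stopping rules: at most setCoverNum sets, or earlier once every gene is covered. greedySetCover is a hypothetical name.

```r
greedySetCover <- function(sets, setCoverNum = 10) {
  universe <- unique(unlist(sets))
  covered  <- character(0)
  chosen   <- character(0)
  # Stop at setCoverNum sets or at 100% coverage, whichever comes first
  while (length(chosen) < setCoverNum && length(covered) < length(universe)) {
    candidates <- setdiff(names(sets), chosen)
    gain <- vapply(candidates,
                   function(s) length(setdiff(sets[[s]], covered)),
                   integer(1))
    if (max(gain) == 0) break
    best    <- candidates[which.max(gain)]   # set adding the most new genes
    chosen  <- c(chosen, best)
    covered <- union(covered, sets[[best]])
  }
  list(sets = chosen, coverage = length(covered) / length(universe))
}

sets <- list(S1 = c("a", "b", "c", "d"), S2 = c("a", "b"),
             S3 = c("e", "f"), S4 = c("c", "e"))
greedySetCover(sets)                    # S1, S3 with coverage 1
greedySetCover(sets, setCoverNum = 1)   # S1 with coverage 4/6
```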
networkConstructionMethod: Network construction method for NTA. Either Network_Retrieval_Prioritization or Network_Expansion. Network Retrieval & Prioritization first uses random walk analysis to calculate random walk probabilities for the input seeds, then identifies the relationships among the seeds in the selected network and returns a retrieval sub-network. The seeds with the top random walk probabilities are highlighted in the sub-network. Network Expansion first uses random walk analysis to rank all genes in the selected network based on their network proximity to the input seeds and then returns an expanded sub-network in which nodes are the input seeds and their top-ranking neighbors, and edges represent their relationships.
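Both NTA methods start from a random walk over the network. The sketch below uses random walk with restart on a five-gene toy network and then takes the top-ranked non-seed genes, which mirrors the Network Expansion step controlled by neighborNum. The restart probability of 0.5, the toy network and the helper randomWalkWithRestart are all assumptions for illustration, not WebGestaltR's actual settings.

```r
# Toy undirected network (symmetric adjacency matrix)
genes <- c("A", "B", "C", "D", "E")
adj <- matrix(0, 5, 5, dimnames = list(genes, genes))
edges <- rbind(c("A", "B"), c("B", "C"), c("C", "D"), c("D", "E"), c("A", "C"))
for (k in seq_len(nrow(edges))) {
  adj[edges[k, 1], edges[k, 2]] <- 1
  adj[edges[k, 2], edges[k, 1]] <- 1
}

# Iterate p <- (1 - r) W p + r p0 until convergence
randomWalkWithRestart <- function(adj, seeds, restart = 0.5,
                                  tol = 1e-10, maxIter = 1000) {
  W  <- sweep(adj, 2, colSums(adj), "/")      # column-normalized transitions
  p0 <- as.numeric(rownames(adj) %in% seeds)
  p0 <- p0 / sum(p0)                          # restart distribution on seeds
  p  <- p0
  for (i in seq_len(maxIter)) {
    pNext <- (1 - restart) * as.vector(W %*% p) + restart * p0
    if (sum(abs(pNext - p)) < tol) break
    p <- pNext
  }
  setNames(pNext, rownames(adj))
}

prob <- randomWalkWithRestart(adj, seeds = "A")

# Expansion-style step: keep the top-ranked non-seed genes
neighborNum <- 2
neighbors <- head(names(sort(prob[names(prob) != "A"], decreasing = TRUE)),
                  neighborNum)
neighbors   # "C" "B"
```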
neighborNum: The number of neighbors to include in the NTA Network Expansion method.
highlightType: The type of nodes to highlight in the NTA Network Expansion method, either Seeds or Neighbors.
highlightSeedNum: The number of top input seeds to highlight in the NTA Network Retrieval & Prioritization method.
nThreads: The number of cores to use for GSEA and set cover, and in the batch function.
cache: A directory to save data cache for reuse. Defaults to NULL (disabled).
hostName: The server URL for accessing data. Mostly for development purposes.
...: In the batch function, passes parameters to the WebGestaltR function. Also handles backward compatibility for some parameters from old versions.
interestGeneFolder: Run WebGestaltR for the gene list files in the folder.
isParallel: Whether the jobs in the batch are run in parallel.
The WebGestaltR function can perform three enrichment analyses: ORA (Over-Representation Analysis), GSEA (Gene Set Enrichment Analysis) and NTA (Network Topology Analysis). Based on the user-uploaded gene list or gene list with scores, the WebGestaltR function first maps the gene list to Entrez Gene IDs and then summarizes the gene list based on GO (Gene Ontology) Slim. After performing the enrichment analysis, the WebGestaltR function also returns a user-friendly HTML report containing the GO Slim summary and the enrichment analysis result. If the functional categories have a DAG (directed acyclic graph) structure or the genes in the functional categories have a network structure, those relationships can also be visualized in the report.
if (FALSE) {
  library(WebGestaltR)

  ####### ORA example #########
  geneFile <- system.file("extdata", "interestingGenes.txt", package = "WebGestaltR")
  refFile <- system.file("extdata", "referenceGenes.txt", package = "WebGestaltR")
  outputDirectory <- getwd()
  enrichResult <- WebGestaltR(enrichMethod = "ORA", organism = "hsapiens",
    enrichDatabase = "pathway_KEGG", interestGeneFile = geneFile,
    interestGeneType = "genesymbol", referenceGeneFile = refFile,
    referenceGeneType = "genesymbol", isOutput = TRUE,
    outputDirectory = outputDirectory, projectName = NULL)

  ####### GSEA example #########
  rankFile <- system.file("extdata", "GeneRankList.rnk", package = "WebGestaltR")
  outputDirectory <- getwd()
  enrichResult <- WebGestaltR(enrichMethod = "GSEA", organism = "hsapiens",
    enrichDatabase = "pathway_KEGG", interestGeneFile = rankFile,
    interestGeneType = "genesymbol", sigMethod = "top", topThr = 10, minNum = 5,
    outputDirectory = outputDirectory)

  ####### NTA example #########
  # Reuses geneFile from the ORA example above
  enrichResult <- WebGestaltR(enrichMethod = "NTA", organism = "hsapiens",
    enrichDatabase = "network_PPI_BIOGRID", interestGeneFile = geneFile,
    interestGeneType = "genesymbol", sigMethod = "top", topThr = 10,
    outputDirectory = getwd(), highlightSeedNum = 10,
    networkConstructionMethod = "Network_Retrieval_Prioritization")
}