WebGestaltR_batch(interestGeneFolder=NULL,interestGeneType=NULL,enrichMethod="ORA",
organism="hsapiens",enrichDatabase="geneontology_Biological_Process",
enrichDatabaseFile=NULL,enrichDatabaseType=NULL,enrichDatabaseDescriptionFile=NULL,
collapseMethod="mean",referenceGeneFile=NULL,referenceGene=NULL,referenceGeneType=NULL,
referenceSet=NULL, minNum=10, maxNum=500,fdrMethod="BH",sigMethod="fdr",fdrThr=0.05,
topThr=10,dNum=20,perNum=1000,lNum=20,is.output=TRUE,outputDirectory=getwd(),
keepGSEAFolder=FALSE,methodType="R",hostName="http://www.webgestalt.org/",
is_parallel=FALSE,nThreads=3)
If enrichMethod is
ORA, the extension of all files should be txt and each file can only
contain one column: the interesting gene list. If enrichMethod is GSEA,
the extension of each file should be rnk and the file should contain
two columns separated by tab: the gene list and the corresponding scores.
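The two input layouts described above can be sketched in base R as follows. The folder names, gene symbols and scores are placeholders invented for illustration; only the file extensions and column layouts come from the description.

```r
# Hypothetical folders: one for ORA inputs, one for GSEA inputs
ora_dir <- file.path(tempdir(), "ora_lists")
gsea_dir <- file.path(tempdir(), "gsea_lists")
dir.create(ora_dir, showWarnings = FALSE)
dir.create(gsea_dir, showWarnings = FALSE)

# ORA input: .txt extension, a single column with one gene per line
ora_genes <- c("TP53", "BRCA1", "EGFR")
writeLines(ora_genes, file.path(ora_dir, "list1.txt"))

# GSEA input: .rnk extension, gene and score separated by a tab, no header
ranked <- data.frame(gene = c("TP53", "BRCA1", "EGFR"),
                     score = c(2.1, -0.7, 1.3),
                     stringsAsFactors = FALSE)
write.table(ranked, file.path(gsea_dir, "list1.rnk"), sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read the ranked file back to confirm the two-column layout
rnk_check <- read.delim(file.path(gsea_dir, "list1.rnk"), header = FALSE,
                        stringsAsFactors = FALSE)
```

Either folder could then be passed as interestGeneFolder, with enrichMethod set to match the file type it contains.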
listIDType. If the
organism is others, users do not need to set this parameter.
NOTE: the ID type in all files should be the same.
listOrganism to check the available organisms.
Users can also input others to perform the enrichment analysis based on other
organisms not supported by WebGestaltR. For the other organisms, users need to upload
the enrichment categories, interesting list and reference list (for ORA method).
Because WebGestaltR does not perform the ID mapping for the other organisms,
the above uploaded data should have the same ID type.
listGeneset to check the available functional
databases for the selected organism. Users can also input others to
upload the functional database not supported by WebGestaltR for the selected organism.
organism as others or set enrichDatabase
as others, users need to upload a GMT file as the functional categories
for the enrichment analysis. The extension of the file should be gmt and
the first column of the file is the category ID, the second one is the external link
for the category. Genes annotated to the category are from the third column.
All columns are separated by tab.
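As a rough illustration of the GMT layout described above, the following base R sketch writes a small file and reads it back. The category IDs, links and gene symbols are made up for the example and do not come from any real database.

```r
# Hypothetical gene sets and external links (placeholders only)
gene_sets <- list(
  SET_A = c("TP53", "BRCA1", "EGFR"),
  SET_B = c("MYC", "KRAS")
)
links <- c(SET_A = "http://example.org/SET_A",
           SET_B = "http://example.org/SET_B")

# One line per category: ID, link, then the annotated genes, tab-separated
gmt_lines <- vapply(names(gene_sets), function(id) {
  paste(c(id, links[[id]], gene_sets[[id]]), collapse = "\t")
}, character(1))

gmt_file <- file.path(tempdir(), "custom.gmt")
writeLines(gmt_lines, gmt_file)

# Parse the file back: fields 1 and 2 are ID and link, the rest are genes
fields <- strsplit(readLines(gmt_file), "\t", fixed = TRUE)
parsed_sets <- lapply(fields, function(f) f[-(1:2)])
names(parsed_sets) <- vapply(fields, function(f) f[1], character(1))
```

A file written this way would be supplied through enrichDatabaseFile.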
enrichDatabase as others, WebGestaltR will also perform
ID mapping for the uploaded GMT file. Thus, users need to set the ID type of the
genes in the enrichDatabaseFile. If users set organism as others,
users do not need to set this ID type because WebGestaltR will not perform ID mapping
for other organisms. The ID types supported by WebGestaltR for the selected organism
can be found with the function listIDType.
enrichDatabaseFile.
The extension of the description file should be des. The description file
contains two columns: the first column is the category ID that should be exactly
the same as the category ID in the uploaded enrichDatabaseFile and the second
column is the description of the category. All columns are separated by tab.
mean,
median, min and max represent the mean, median, minimum
and maximum of scores for the duplicate ids.
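The effect of the four collapse options can be illustrated with a small base R sketch. This is not WebGestaltR's internal code; the helper collapse_scores and the data are hypothetical and only show what collapsing duplicate ids to a single score means.

```r
# Made-up ranked list in which TP53 and BRCA1 appear twice
scores <- data.frame(
  gene = c("TP53", "TP53", "EGFR", "BRCA1", "BRCA1"),
  score = c(2.0, 4.0, 1.5, -1.0, -3.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: reduce duplicate ids to one score per gene
collapse_scores <- function(df, method = c("mean", "median", "min", "max")) {
  method <- match.arg(method)
  aggregate(score ~ gene, data = df, FUN = match.fun(method))
}

collapsed_mean <- collapse_scores(scores, "mean")
collapsed_max <- collapse_scores(scores, "max")
```

With these numbers, the mean option gives TP53 a score of 3 while the max option gives it 4, so the choice can change the gene ranking used by GSEA.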
referenceGeneFile should be txt and the file can only
contain one column: the reference gene list.
referenceGene should be an R vector object containing the
reference gene list.
listIDType.
If the organism is others, users do not need to set this parameter.
listReferenceSet.
If referenceGeneFile and referenceGene are NULL, WebGestaltR
will use the referenceSet as the reference gene set. Otherwise,
WebGestaltR will use the user uploaded reference set for the enrichment analysis.
minNum for the enrichment analysis. The default is 10.
maxNum for the enrichment analysis. The default is 500.
holm,
hochberg, hommel, bonferroni, BH and BY.
The default is BH.
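The six method names listed above are also accepted by stats::p.adjust in base R, so their behaviour can be compared on a toy vector. The p-values below are invented, and the sketch makes no claim about how WebGestaltR applies the correction internally.

```r
# Made-up raw p-values for five categories
p_values <- c(0.001, 0.008, 0.02, 0.04, 0.30)
methods <- c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY")

# One column of adjusted p-values per method
adjusted <- sapply(methods, function(m) p.adjust(p_values, method = m))
rownames(adjusted) <- paste0("category", seq_along(p_values))

print(round(adjusted, 4))
```

BH is never more conservative than bonferroni, which is one reason it is a common default when many categories are tested.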
fdr and
top. fdr means the enriched categories are identified based
on the FDR, and top means all categories are ranked by FDR and
the top categories are then selected as the enriched categories. The default is fdr.
fdr method. The default is 0.05.
top method. The default is 10.
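The difference between the two selection rules can be sketched on a toy result table. The helper select_enriched and the FDR values are hypothetical, and whether WebGestaltR compares with a strict or non-strict inequality is an assumption here.

```r
# Made-up enrichment results
results <- data.frame(
  geneSet = paste0("set", 1:6),
  FDR = c(0.20, 0.01, 0.04, 0.50, 0.03, 0.08),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the sigMethod, fdrThr and topThr arguments
select_enriched <- function(res, sigMethod = c("fdr", "top"),
                            fdrThr = 0.05, topThr = 10) {
  sigMethod <- match.arg(sigMethod)
  res <- res[order(res$FDR), , drop = FALSE]
  if (sigMethod == "fdr") {
    # Assumption: categories strictly below the threshold are kept
    res[res$FDR < fdrThr, , drop = FALSE]
  } else {
    # Keep the best-ranked categories regardless of their FDR
    head(res, topThr)
  }
}

by_fdr <- select_enriched(results, "fdr", fdrThr = 0.05)
by_top <- select_enriched(results, "top", topThr = 2)
```

Note that the top rule always returns categories, even when none of them passes a conventional FDR cutoff.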
20 and the maximum is 100. A larger dNum
will increase the running time.
1000.
20.
Note: GSEA first ranks the categories based on NES (normalized
enrichment score) instead of FDR and then outputs the leading edge genes for
the top lNum categories. Because NES does not necessarily decrease as the
FDR increases, using sigMethod defined in WebGestaltR to identify
the significant categories may cause some categories with outputted leading edge
genes to be excluded from the final result, even if the number of significant categories
is larger than lNum.
is.output is TRUE, WebGestaltR will create a folder named by the
projectName and save the mapping results, GO slim summary, enrichment
results and a user-friendly HTML report in the folder. Otherwise, WebGestaltR will
only return an R data.frame object containing the enrichment results.
If hundreds of gene lists need to be analyzed simultaneously, it is better to set
is.output as FALSE.
keepGSEAFolder is TRUE, WebGestaltR will keep all folders generated
by the GSEA tool that contain all figures and tables related to the GSEA analysis.
R or Python function to read it.
Sometimes the Python code is faster than the R code.
If users use the Python code to read the mapping
table, they should first install Python and the pandas module
on the computer.
listArchiveURL function
to get all archive version URLs.
is_parallel is TRUE, WebGestaltR_batch will use parallel computing to
simultaneously analyze the lists in all files.
is.output is TRUE, each enriched result will be saved in a folder with
the name containing the input file name under the outputDirectory. Otherwise,
the WebGestaltR_batch function will return a list object containing all results.
If there are errors during the calculation, the error messages can also be found in the returned list object.
###interestGeneFolder contains multiple .txt files for ORA analysis
refFile<-system.file("extdata","referenceGenes.txt",package="WebGestaltR")
outputDirectory<-getwd()
#enrichResult<-WebGestaltR_batch(interestGeneFolder=interestGeneFolder,
#interestGeneType="genesymbol",enrichMethod="ORA",organism="hsapiens",
#enrichDatabase="pathway_KEGG",referenceGeneFile=refFile,
#referenceGeneType="genesymbol",is.output=TRUE,
#outputDirectory=outputDirectory,is_parallel=FALSE)