
WebGestaltR (version 0.1.0)

WebGestaltR:

Description

The WebGestaltR function performs two popular enrichment analyses: ORA (Over-Representation Analysis) and GSEA (Gene Set Enrichment Analysis). Given a user-uploaded gene list, or a gene list with scores for the GSEA method, WebGestaltR first maps the genes to Entrez gene IDs and then summarizes the gene list based on GO (Gene Ontology) Slim. After performing the enrichment analysis, WebGestaltR also returns a user-friendly HTML report containing the ID mapping table, the GO Slim summary and the enrichment analysis result. If the functional categories have a DAG (directed acyclic graph) structure, the structure of the enriched categories can also be visualized in the report.
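As a rough illustration of the idea behind ORA, the sketch below runs a one-sided hypergeometric test on made-up counts. The source does not state which test WebGestaltR applies internally, so treat this as a generic over-representation test under that assumption, not as the package's implementation. All counts and variable names are hypothetical.

```r
# Hypothetical counts for a single functional category.
nReference  <- 1000  # genes in the reference list
nCategory   <- 50    # reference genes annotated to the category
nInterest   <- 100   # genes in the interesting list
nOverlap    <- 12    # interesting genes annotated to the category

# Number of overlapping genes expected by chance.
expected <- nInterest * nCategory / nReference

# P(X >= nOverlap) under the hypergeometric distribution.
pValue <- phyper(nOverlap - 1, nCategory, nReference - nCategory,
                 nInterest, lower.tail = FALSE)

# The same test expressed as a one-sided Fisher exact test on a 2x2 table.
tab <- matrix(c(nOverlap, nCategory - nOverlap,
                nInterest - nOverlap,
                nReference - nCategory - (nInterest - nOverlap)),
              nrow = 2)
pFisher <- fisher.test(tab, alternative = "greater")$p.value
```

A category is over-represented when the observed overlap is clearly larger than the expected overlap and the p-value is small.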

Usage

WebGestaltR(enrichMethod="ORA", organism="hsapiens", 
enrichDatabase="geneontology_Biological_Process",enrichDatabaseFile=NULL, 
enrichDatabaseType=NULL,enrichDatabaseDescriptionFile=NULL,interestGeneFile=NULL, 
interestGene=NULL,interestGeneType=NULL,collapseMethod="mean",referenceGeneFile=NULL,
referenceGene=NULL,referenceGeneType=NULL,referenceSet=NULL, minNum=10, maxNum=500,
fdrMethod="BH",sigMethod="fdr",fdrThr=0.05,topThr=10,dNum=20,perNum=1000,
lNum=20,is.output=TRUE,outputDirectory=getwd(),projectName=NULL,keepGSEAFolder=FALSE,
methodType="R",dagColor="binary",hostName="http://www.webgestalt.org/")

Arguments

enrichMethod
WebGestaltR supports two enrichment analysis methods: ORA (Over-Representation Analysis) and GSEA (Gene Set Enrichment Analysis).
organism
Currently, WebGestaltR supports 12 organisms. Users can call the function listOrganism to check the available organisms. Users can also set organism to others to run the enrichment analysis for an organism not supported by WebGestaltR. In that case, users need to upload the enrichment categories, the interesting gene list and, for the ORA method, the reference list. Because WebGestaltR does not perform ID mapping for other organisms, all uploaded data must use the same ID type.
enrichDatabase
The functional categories for the enrichment analysis. Users can call the function listGeneset to check the available functional databases for the selected organism. Users can also set enrichDatabase to others to upload a functional database that WebGestaltR does not provide for the selected organism.
enrichDatabaseFile
If users set organism as others or set enrichDatabase as others, users need to upload a GMT file as the functional categories for the enrichment analysis. The extension of the file should be gmt and the first column of the file is the category ID, the second one is the external link for the category. Genes annotated to the category are from the third column. All columns are separated by tab.
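The GMT layout described above can be sketched in base R as follows. The category IDs, links and gene symbols are invented for illustration, and the file is written to a temporary directory.

```r
# Hypothetical gene sets; IDs, links and gene symbols are illustrative only.
geneSets <- list(
  SET_A = list(link = "http://example.org/SET_A",
               genes = c("TP53", "BRCA1", "EGFR")),
  SET_B = list(link = "http://example.org/SET_B",
               genes = c("MYC", "KRAS"))
)

# One line per category: ID, external link, then the annotated genes,
# all separated by tabs.
gmtLines <- vapply(names(geneSets), function(id) {
  paste(c(id, geneSets[[id]]$link, geneSets[[id]]$genes), collapse = "\t")
}, character(1))

gmtFile <- file.path(tempdir(), "customSets.gmt")
writeLines(gmtLines, gmtFile)

# Read the file back to check the layout: genes start at the third column.
fields <- strsplit(readLines(gmtFile), "\t", fixed = TRUE)
parsed <- lapply(fields, function(x) x[-(1:2)])
names(parsed) <- vapply(fields, `[`, character(1), 1)
```

The resulting path could then be passed as enrichDatabaseFile.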
enrichDatabaseType
If users set enrichDatabase as others, WebGestaltR will also perform ID mapping for the uploaded GMT file. Thus, users need to set the ID type of the genes in the enrichDatabaseFile. If users set organism as others, users do not need to set this ID type because WebGestaltR will not perform ID mapping for other organisms. The ID types that WebGestaltR supports for the selected organism can be listed with the function listIDType.
enrichDatabaseDescriptionFile
Users can also upload a description file for the uploaded enrichDatabaseFile. The extension of the description file should be des. The description file contains two columns: the first column is the category ID that should be exactly the same as the category ID in the uploaded enrichDatabaseFile and the second column is the description of the category. All columns are separated by tab.
interestGeneFile
If enrichMethod is ORA, the extension of the interestGeneFile should be txt and the file can only contain one column: the interesting gene list. If enrichMethod is GSEA, the extension of the interestGeneFile should be rnk and the file should contain two columns separated by tab: the gene list and the corresponding scores.
interestGene
Users can also use the R object as the input. If enrichMethod is ORA, interestGene should be an R vector object containing the interesting gene list. If enrichMethod is GSEA, interestGene should be an R data.frame object containing two columns: the gene list and the corresponding scores.
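A minimal sketch of the two R object inputs, together with an equivalent rnk file for GSEA. The gene symbols and scores are made up, and writing the rnk file without a header row is an assumption, since the description above only specifies two tab-separated columns.

```r
# ORA input: a vector of gene identifiers (illustrative symbols).
oraGenes <- c("TP53", "BRCA1", "EGFR", "MYC")

# GSEA input: a two-column data.frame of genes and scores (made-up scores).
gseaGenes <- data.frame(
  gene  = c("TP53", "BRCA1", "EGFR", "MYC"),
  score = c(2.1, -1.4, 0.7, -0.2),
  stringsAsFactors = FALSE
)

# Equivalent file input for GSEA: two tab-separated columns, extension rnk.
# Omitting the header row is an assumption.
rnkFile <- file.path(tempdir(), "example.rnk")
write.table(gseaGenes, rnkFile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# The objects would then be passed through interestGene, for example
# (left commented out because the call needs the package and network access):
#enrichResult<-WebGestaltR(enrichMethod="GSEA",organism="hsapiens",
#enrichDatabase="pathway_KEGG",interestGene=gseaGenes,
#interestGeneType="genesymbol",is.output=FALSE)
```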
interestGeneType
The ID type of the interesting gene list. The ID types that WebGestaltR supports for the selected organism can be listed with the function listIDType. If the organism is others, users do not need to set this parameter.
collapseMethod
The method used to collapse duplicate IDs for the GSEA method. mean, median, min and max take the mean, median, minimum and maximum of the scores of the duplicate IDs, respectively.
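The effect of collapsing can be sketched with base R as below. The helper collapseScores and the example scores are hypothetical and only mimic the behaviour described above; they are not taken from the package.

```r
# Made-up ranked list in which genes A and C appear more than once.
scores <- data.frame(
  gene  = c("A", "A", "B", "C", "C", "C"),
  score = c(1, 3, 2, -1, 0, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per gene, duplicates summarized by `method`.
collapseScores <- function(df, method = c("mean", "median", "min", "max")) {
  method <- match.arg(method)
  aggregate(score ~ gene, data = df, FUN = match.fun(method))
}

collapsedMean   <- collapseScores(scores, "mean")
collapsedMedian <- collapseScores(scores, "median")
collapsedMax    <- collapseScores(scores, "max")
```

With these scores, gene C collapses to 1 under mean, 0 under median and 4 under max, so the choice can change a gene's position in the ranked list.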
referenceGeneFile
For ORA method, the users need to upload the reference gene list. The extension of the referenceGeneFile should be txt and the file can only contain one column: the reference gene list.
referenceGene
For ORA method, users can also use the R object as the reference gene list. referenceGene should be an R vector object containing the reference gene list.
referenceGeneType
The ID type of the reference gene list. The ID types that WebGestaltR supports for the selected organism can be listed with the function listIDType. If the organism is others, users do not need to set this parameter.
referenceSet
Users can directly select a reference set from the platforms built into WebGestaltR instead of uploading one. All supported platforms can be listed with the function listReferenceSet. If referenceGeneFile and referenceGene are both NULL, WebGestaltR will use referenceSet as the reference gene set. Otherwise, WebGestaltR will use the user-uploaded reference set for the enrichment analysis.
minNum
WebGestaltR will exclude the categories with the number of annotated genes less than minNum for the enrichment analysis. The default is 10.
maxNum
WebGestaltR will exclude the categories with the number of annotated genes larger than maxNum for the enrichment analysis. The default is 500.
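The combined effect of minNum and maxNum can be sketched as a simple size filter. The category names and sizes are made up. Whether WebGestaltR counts annotated genes before or after restricting to the reference list is not stated above, so the sizes here are simply taken as given.

```r
# Hypothetical number of annotated genes per category.
setSizes <- c(SET_A = 5, SET_B = 10, SET_C = 120, SET_D = 500, SET_E = 501)

minNum <- 10
maxNum <- 500

# Categories with fewer than minNum or more than maxNum genes are excluded,
# so sizes equal to either bound are kept.
kept <- names(setSizes)[setSizes >= minNum & setSizes <= maxNum]
```

Here SET_A (too small) and SET_E (too large) are dropped and the other three categories are kept.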
fdrMethod
For the ORA method, WebGestaltR supports six multiple-testing adjustment methods: holm, hochberg, hommel, bonferroni, BH and BY. The default is BH.
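The six method names match the options of base R's p.adjust. Assuming they behave as in p.adjust (the source does not confirm how WebGestaltR applies them internally), their effect on a made-up vector of p-values can be compared as follows.

```r
# Made-up raw p-values for five categories.
pValues <- c(0.001, 0.008, 0.02, 0.04, 0.3)

methods <- c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY")

# One column of adjusted p-values per method.
adjusted <- sapply(methods, function(m) p.adjust(pValues, method = m))

# Number of categories passing a 0.05 threshold under each method.
nSignificant <- colSums(adjusted <= 0.05)
```

BH and BY control the false discovery rate, while holm, hochberg, hommel and bonferroni control the family-wise error rate and are therefore more conservative.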
sigMethod
Two significance methods are available in WebGestaltR: fdr and top. With fdr, the enriched categories are those whose FDR passes the fdrThr threshold. With top, all categories are ranked by FDR and the top topThr categories are selected as the enriched categories. The default is fdr.
fdrThr
The significance level for the fdr method. The default is 0.05.
topThr
The number of top-ranked categories selected by the top method. The default is 10.
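The difference between the two selection rules can be sketched on a made-up result table. The helper selectEnriched is hypothetical, and using a strict "less than" comparison against fdrThr is an assumption.

```r
# Made-up enrichment results.
results <- data.frame(
  category = paste0("SET_", 1:6),
  FDR      = c(0.20, 0.001, 0.04, 0.6, 0.03, 0.049),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking the fdr and top selection rules.
selectEnriched <- function(res, sigMethod = c("fdr", "top"),
                           fdrThr = 0.05, topThr = 10) {
  sigMethod <- match.arg(sigMethod)
  res <- res[order(res$FDR), ]          # rank categories by FDR
  if (sigMethod == "fdr") {
    res[res$FDR < fdrThr, ]             # keep categories below the threshold
  } else {
    head(res, topThr)                   # keep the topThr best-ranked categories
  }
}

byFdr <- selectEnriched(results, "fdr", fdrThr = 0.05)
byTop <- selectEnriched(results, "top", topThr = 2)
```

Note that top always returns categories, even when none of them would pass an FDR threshold.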
dNum
The number of enriched categories visualized in the final report. The default is 20 and the maximum is 100. A larger dNum will increase the running time.
perNum
The number of permutations for the GSEA method. The default is 1000.
lNum
The number of categories for which the leading edge genes are output for the GSEA method. The default is 20. Note: GSEA first ranks the categories by NES (normalized enrichment score) instead of FDR and then outputs the leading edge genes for the top lNum categories. Because NES does not necessarily decrease as FDR increases, selecting significant categories with the sigMethod defined in WebGestaltR may leave some categories with output leading edge genes out of the final result, even if the number of significant categories is larger than lNum.
is.output
If is.output is TRUE, WebGestaltR will create a folder named after the projectName and save the mapping results, GO Slim summary, enrichment results and a user-friendly HTML report in that folder. Otherwise, WebGestaltR will only return an R data.frame object containing the enrichment results. If hundreds of gene lists need to be analyzed at once, it is better to set is.output to FALSE.
outputDirectory
The output directory for the results.
projectName
The name of the project. If projectName is NULL, WebGestaltR will use a timestamp as the project name.
keepGSEAFolder
If keepGSEAFolder is TRUE, WebGestaltR will keep all folders generated by the GSEA tool, which contain all figures and tables related to the GSEA analysis.
methodType
For a large ID mapping table (e.g. the dbSNP mapping table), users can read it with either an R or a Python function. Python code is sometimes faster than R code. To read the mapping table with Python, users should first install Python and the pandas module on their computer.
dagColor
If dagColor is binary, the significant terms in the DAG structure will be colored red for the ORA method, or red (positively related) and blue (negatively related) for the GSEA method. If dagColor is continuous, the significant terms in the DAG structure will be colored with a red gradient for the ORA method, or with red (positively related) and blue (negatively related) gradients for the GSEA method, based on the corresponding FDR.
hostName
The server URL for accessing the data. Users can call the listArchiveURL function to get the URLs of all archived versions.

Value

The WebGestaltR function outputs a user-friendly HTML report containing the ID mapping table, the GO Slim summary and the enrichment analysis result, and also returns an R object containing the enrichment analysis result.

Examples


#######ORA example#########
geneFile<-system.file("extdata","interestingGenes.txt",package="WebGestaltR")
refFile<-system.file("extdata","referenceGenes.txt",package="WebGestaltR")
outputDirectory<-getwd()
#enrichResult<-WebGestaltR(enrichMethod="ORA",organism="hsapiens",
#enrichDatabase="pathway_KEGG",interestGeneFile=geneFile, 
#interestGeneType="genesymbol",referenceGeneFile=refFile, 
#referenceGeneType="genesymbol",is.output=TRUE,
#outputDirectory=outputDirectory,projectName=NULL)
	
#######GSEA example#########
#rankFile<-system.file("extdata","GeneRankList.rnk",package="WebGestaltR")
#outputDirectory<-getwd()
#enrichResult<-WebGestaltR(enrichMethod="GSEA",organism="hsapiens",
#enrichDatabase="pathway_KEGG",interestGeneFile=rankFile,
#interestGeneType="genesymbol", collapseMethod="mean",
#is.output=TRUE,outputDirectory=outputDirectory)
