Usage
runSeqGSEA(data.dir, case.pattern, ctrl.pattern, geneset.file, output.prefix,
           topGS=10, geneID.type=c("gene.symbol", "ensembl"), nCores=1,
           perm.times=1000, seed=NULL, minExonReadCount=5,
           integrationMethod=c("linear", "quadratic", "rank"),
           DEweight=c(0.5), DEonly=FALSE, minGSsize=5, maxGSsize=1000,
           GSEA.WeightedType=1)
Arguments
data.dir
a character vector, the path to your count data directory.
case.pattern
a character vector, the unique pattern in the file names of case samples.
E.g., if the file names start with "SC", the pattern is "^SC".
ctrl.pattern
a character vector, the unique pattern in the file names of control samples.
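Both patterns can be checked against the file names before a run. The sketch below uses base R only; the file names and the "^SN" control pattern are hypothetical, and in practice the names would come from list.files(data.dir).

```r
# Hypothetical count file names; in practice use list.files(data.dir).
count.files <- c("SC1.txt", "SC2.txt", "SN1.txt", "SN2.txt")

case.pattern <- "^SC"   # file names starting with "SC"
ctrl.pattern <- "^SN"   # assumed naming scheme for control samples

case.files <- grep(case.pattern, count.files, value = TRUE)
ctrl.files <- grep(ctrl.pattern, count.files, value = TRUE)

# The two patterns should select disjoint, non-empty sets of files.
stopifnot(length(case.files) > 0, length(ctrl.files) > 0,
          length(intersect(case.files, ctrl.files)) == 0)
```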
geneset.file
a character vector, the path to your gene set file. The gene set file must be in
GMT format. Please refer to the following link for details.
http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29
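As a rough illustration of the GMT layout, each line holds one gene set as tab-separated fields: the set name, a description, and then the member genes. The sketch below writes a small file and reads it back with base R; the set names, descriptions, and gene lists are made up for the example.

```r
# Two hypothetical gene sets; names, descriptions, and genes are made up.
gmt.lines <- c(
  paste(c("SET_A", "example set A", "TP53", "BRCA1", "EGFR"), collapse = "\t"),
  paste(c("SET_B", "example set B", "MYC", "KRAS"), collapse = "\t")
)
geneset.file <- tempfile(fileext = ".gmt")
writeLines(gmt.lines, geneset.file)

# Read it back: field 1 is the set name, field 2 a description,
# and the remaining fields are the member genes.
fields <- strsplit(readLines(geneset.file), "\t", fixed = TRUE)
gene.sets <- lapply(fields, function(x) x[-(1:2)])
names(gene.sets) <- vapply(fields, `[`, character(1), 1)
```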
output.prefix
a character vector, the path with prefix for output files.
topGS
an integer. Details will be output for this number of top-ranked gene sets; if geneset.file contains fewer gene sets than this number, details will be output for all of them. Default: 10.
geneID.type
the gene ID type in geneset.file. Currently only "gene.symbol" and "ensembl" are supported. Default: gene.symbol.
nCores
an integer. The number of cores for running SeqGSEA. Default: 1
perm.times
an integer. The number of permutations, which are used for normalizing DE and DS scores and for GSEA significance analysis.
Recommended values are greater than 1000. Default: 1000.
seed
an integer or NULL, used for setting the seeds to generate random numbers. The same seed will guarantee the same analysis results given by SeqGSEA. Default: NULL.
minExonReadCount
an integer. An exon whose total read count across all samples is less than this number will be marked as untestable and excluded from the SeqGSEA analysis. Default: 5.
integrationMethod
one of the three integration methods for DE and DS score integration: linear, quadratic, or rank. Default: linear.
DEweight
a real number between 0 and 1, or a vector of such numbers. Each number is the weight given to DE when integrating DE and DS scores. If a vector is given, SeqGSEA will run once with each weight. Default: 0.5.
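To show how a DE weight might act on the scores, here is a minimal sketch that assumes linear integration is a weighted sum, w * DE + (1 - w) * DS. That form is an assumption and is not stated on this page; the scores and the helper integrate.linear are made up for illustration.

```r
# Made-up normalized scores for three genes.
DEscore <- c(geneA = 1.2, geneB = 0.4, geneC = 2.0)
DSscore <- c(geneA = 0.3, geneB = 1.6, geneC = 1.0)

# Assumed form of linear integration: w * DE + (1 - w) * DS.
# integrate.linear is a hypothetical helper, not a SeqGSEA function.
integrate.linear <- function(de, ds, w) w * de + (1 - w) * ds

# A vector of weights gives one column of gene scores per weight.
DEweight <- c(0.2, 0.5, 0.8)
gene.scores <- sapply(DEweight, function(w) integrate.linear(DEscore, DSscore, w))
colnames(gene.scores) <- paste0("w", DEweight)
```

Under this assumption, a larger weight moves each gene's score toward its DE score and away from its DS score.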
DEonly
logical, whether to run SeqGSEA only considering DE. Default: FALSE
minGSsize
an integer. The minimum gene set size: gene sets with fewer genes than this number will be skipped. Default: 5.
maxGSsize
an integer. The maximum gene set size: gene sets with more genes than this number will be skipped. Default: 1000.
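The effect of the two size limits can be sketched in base R as follows. The gene sets are hypothetical, and this page does not say whether SeqGSEA counts set size before or after matching genes to the data, so the sketch simply uses the raw set lengths.

```r
# Hypothetical gene sets of different sizes.
gene.sets <- list(
  tiny  = paste0("g", 1:3),
  mid   = paste0("g", 1:50),
  large = paste0("g", 1:1500)
)
minGSsize <- 5
maxGSsize <- 1000

# Keep only the sets whose size lies within [minGSsize, maxGSsize].
sizes <- lengths(gene.sets)
kept  <- gene.sets[sizes >= minGSsize & sizes <= maxGSsize]
names(kept)
```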
GSEA.WeightedType
the weight type of the main GSEA algorithm: 0 (unweighted, i.e., Kolmogorov-Smirnov), 1 (weighted), or 2 (over-weighted). Default: 1. Changing it is not recommended.
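Putting the arguments together, a call might look like the sketch below. All paths, patterns, and the seed value are hypothetical placeholders; the call is guarded so that it runs only when the SeqGSEA package is installed and the data directory exists.

```r
# All paths, patterns, and the seed below are hypothetical placeholders.
args <- list(
  data.dir          = "path/to/count_data",
  case.pattern      = "^SC",
  ctrl.pattern      = "^SN",
  geneset.file      = "path/to/genesets.gmt",
  output.prefix     = "path/to/output/run1",
  geneID.type       = "gene.symbol",
  nCores            = 1,
  perm.times        = 1000,
  seed              = 2024,
  integrationMethod = "linear",
  DEweight          = c(0.2, 0.5, 0.8)
)

# Run only when the package and the input data are actually available.
if (requireNamespace("SeqGSEA", quietly = TRUE) && dir.exists(args$data.dir)) {
  do.call(SeqGSEA::runSeqGSEA, args)
}
```

Arguments left out of the list (topGS, minExonReadCount, DEonly, minGSsize, maxGSsize, GSEA.WeightedType) keep the defaults shown under Usage.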