Usage
runseq2pathway(inputfile, search_radius=150000, promoter_radius=200, promoter_radius2=100, genome=c("hg38","hg19","mm10","mm9"), adjacent=FALSE, SNP=FALSE, PromoterStop=FALSE, NearestTwoDirection=TRUE, UTR3=FALSE, DataBase=c("GOterm"), FAIMETest=FALSE, FisherTest=TRUE, collapsemethod=c("MaxMean","function","ME","maxRowVariance","MinMean","absMinMean","absMaxMean","Average"), alpha=5, logCheck=FALSE, B=100, na.rm=FALSE, min_Intersect_Count=5)
Arguments
inputfile
An R object that records genomic region information (coordinates).
It can be a data frame with the following columns:
- column 1
the unique IDs of the genomic regions of interest (peaks, mutations, or SNPs)
- column 2
the chromosome IDs (e.g. chr5 or 5)
- column 3
the start positions of the genomic regions
- column 4
the end positions of the genomic regions (for SNPs and point mutations, end and start differ by 1 bp)
- column 5...
Other custom-defined information (optional)
Alternatively, it can be a GRanges object (from the R package GenomicRanges) with metadata (value) columns:
- column 1: space (seqnames)
the chromosome IDs (e.g. chr5 or 5)
- column 2: ranges
the ranges of the genomic regions
- column 3: name
the unique IDs of the genomic regions of interest (peaks, mutations, or SNPs)
- more columns:
Other custom-defined information (optional)
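A minimal sketch of a data-frame input in the column layout above, written in base R. The column names (`id`, `chrom`, `start`, `end`, `score`), the coordinates, and the helper `check_region_input()` are hypothetical: they only illustrate the expected column order, and the helper is not part of seq2pathway. The GRanges version at the end is built only if GenomicRanges is installed.

```r
# Made-up regions in the documented column order.
peaks <- data.frame(
  id    = c("peak1", "peak2", "snp1"),      # column 1: unique region IDs
  chrom = c("chr5", "chr5", "chr1"),        # column 2: chromosome IDs
  start = c(1000000, 1250000, 3000000),     # column 3: start positions
  end   = c(1000500, 1250800, 3000001),     # column 4: end positions (SNP: end - start = 1)
  score = c(2.4, 0.7, 1.1),                 # column 5: optional custom values
  stringsAsFactors = FALSE
)

# Hypothetical sanity check of the layout (not a seq2pathway function).
check_region_input <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) >= 4)
  stopifnot(anyDuplicated(df[[1]]) == 0)             # IDs must be unique
  stopifnot(is.numeric(df[[3]]), is.numeric(df[[4]]))
  stopifnot(all(df[[4]] >= df[[3]]))                 # end must not precede start
  invisible(TRUE)
}
check_region_input(peaks)

# Equivalent GRanges input, built only when GenomicRanges is available.
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  peaks_gr <- GenomicRanges::GRanges(
    seqnames = peaks$chrom,
    ranges   = IRanges::IRanges(start = peaks$start, end = peaks$end),
    name     = peaks$id,
    score    = peaks$score
  )
}
```

Keeping a numeric fifth column in the data frame is what the FAIME option described below relies on.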
search_radius
A non-negative integer (in bp). Input genomic regions are assigned not only to the matched or nearest gene but also
to all genes within this search radius, for some genomic region types. This parameter takes effect only when "SNP" is FALSE.
Default is 150000.
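The idea of a search radius can be sketched in base R. This assumes, for illustration only, that a gene counts as "within the radius" when its TSS lies at most `radius` bp from the edges of the region; the gene table and `genes_within_radius()` are hypothetical, and seq2pathway uses its own genome annotation and may measure distance differently.

```r
# Hypothetical gene annotation (seq2pathway uses its own genome annotation).
genes <- data.frame(
  gene  = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  chrom = c("chr5", "chr5", "chr5", "chr1"),
  tss   = c(900000, 1100000, 1400000, 1000000),
  stringsAsFactors = FALSE
)

# Genes on the same chromosome whose TSS lies within `radius` bp of the region.
# The gap is 0 when the TSS falls inside [start, end].
genes_within_radius <- function(chrom, start, end, genes, radius = 150000) {
  gap <- pmax(start - genes$tss, genes$tss - end, 0)
  genes$gene[genes$chrom == chrom & gap <= radius]
}

genes_within_radius("chr5", 1000000, 1000500, genes)                  # GENE_A, GENE_B
genes_within_radius("chr5", 1000000, 1000500, genes, radius = 99999)  # GENE_B only
```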
promoter_radius
A non-negative integer. Default is 200. Promoters are defined here as regions upstream of the
transcription start site (TSS); this value sets the upstream extent. A suggested value is between 200 and 2000.
promoter_radius2
A non-negative integer. Default is 100. Sets the downstream extent of the promoter, i.e. the region
after the transcription start site (TSS).
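Taken together, the two promoter parameters describe an interval around each TSS. The sketch below assumes the usual strand convention (upstream means smaller coordinates on the plus strand and larger ones on the minus strand); `promoter_window()` is a hypothetical helper, not a seq2pathway function.

```r
# Promoter interval for one TSS; upstream/downstream follow the gene's strand.
promoter_window <- function(tss, strand = c("+", "-"),
                            promoter_radius = 200, promoter_radius2 = 100) {
  strand <- match.arg(strand)
  if (strand == "+") {
    c(start = tss - promoter_radius, end = tss + promoter_radius2)
  } else {
    # On the minus strand, "upstream" means larger coordinates.
    c(start = tss - promoter_radius2, end = tss + promoter_radius)
  }
}

promoter_window(10000, "+")   # start 9800, end 10100
promoter_window(10000, "-")   # start 9900, end 10200
```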
genome
A character string specifying the genome assembly. Currently "hg38", "hg19", "mm10", and "mm9" are supported.
adjacent
A Boolean. If FALSE (default), all genes within search_radius are searched. If TRUE, only the adjacent
genes are reported, and the parameters "SNP" and "search_radius" are ignored.
SNP
A Boolean specifying the input object type. If FALSE (default), the search continues into introns and
neighboring genes. If TRUE, runseq2gene stops searching when an input genomic region lies in an exon of a coding gene.
PromoterStop
A Boolean. If FALSE (default), neighboring genes are still searched using the parameter "search_radius".
If TRUE, runseq2gene stops searching neighboring genes. This parameter applies only when an input genomic region maps to the promoter of one or more coding genes.
NearestTwoDirection
A Boolean. If TRUE (default), the closest coding genes on the left and on the right are reported, together with their directions.
If FALSE, only the single nearest coding gene is reported, regardless of direction.
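The difference between the two modes can be sketched with plain vectors. This assumes, for illustration, that distance is measured from a single region position to each gene's TSS and that a TSS exactly at that position counts as "left"; `nearest_genes()` and the gene positions are hypothetical.

```r
# Hypothetical: TSS positions of coding genes on the region's chromosome.
tss  <- c(900000, 1100000, 1400000)
gene <- c("GENE_A", "GENE_B", "GENE_C")

nearest_genes <- function(pos, tss, gene, two_direction = TRUE) {
  d <- tss - pos                      # negative = left of the region, positive = right
  if (!two_direction) return(gene[which.min(abs(d))])
  left_idx  <- which(d <= 0)          # a TSS exactly at pos is counted as "left" here
  right_idx <- which(d > 0)
  c(left  = if (length(left_idx))  gene[left_idx[which.max(d[left_idx])]]   else NA_character_,
    right = if (length(right_idx)) gene[right_idx[which.min(d[right_idx])]] else NA_character_)
}

nearest_genes(1000250, tss, gene)                         # left GENE_A, right GENE_B
nearest_genes(1000250, tss, gene, two_direction = FALSE)  # GENE_B
```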
UTR3
A Boolean. If FALSE (default), distances are calculated from the genes' 5' UTR. If TRUE, distances are calculated from the genes' 3' UTR.
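A strand-aware sketch of what switching the reference end means. It assumes the gene's 5' end is its start on the plus strand and its end on the minus strand (and the reverse for the 3' end); `gene_end_distance()` is a hypothetical helper that measures to gene boundaries rather than to exact UTR coordinates.

```r
# Distance from a position to a gene's 5' end (TSS) or 3' end, strand-aware.
gene_end_distance <- function(pos, gene_start, gene_end, strand, UTR3 = FALSE) {
  five_prime  <- ifelse(strand == "+", gene_start, gene_end)
  three_prime <- ifelse(strand == "+", gene_end, gene_start)
  ref <- if (UTR3) three_prime else five_prime
  abs(pos - ref)
}

gene_end_distance(5000, 1000, 3000, "+")               # 4000, from the 5' end at 1000
gene_end_distance(5000, 1000, 3000, "+", UTR3 = TRUE)  # 2000, from the 3' end at 3000
gene_end_distance(5000, 1000, 3000, "-")               # 2000, the 5' end is at 3000
```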
DataBase
Either an R GSA.genesets object defining the gene sets, or a character string naming a built-in collection. Users can call GSA.read.gmt to
load customized gene sets from a .gmt file.
By default ("GOterm"), all three categories of GO-defined gene sets (BP, MF, CC) are used.
Alternatively, a single category can be chosen with "BP", "MF", or "CC".
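A .gmt file is tab-separated with one gene set per line: the set name, a description, then the member genes. The base-R parse below only shows that structure on a made-up file (set names and genes are hypothetical); for actual use, the route documented above is GSA.read.gmt from the GSA package, shown as a comment.

```r
# Write a tiny made-up .gmt file.
gmt_lines <- c(
  "SET_ALPHA\tmade-up description\tTP53\tMDM2\tCDKN1A",
  "SET_BETA\tmade-up description\tMYC\tMAX"
)
gmt_file <- tempfile(fileext = ".gmt")
writeLines(gmt_lines, gmt_file)

# Base-R parse: field 1 = set name, field 2 = description, remaining fields = genes.
fields    <- strsplit(readLines(gmt_file), "\t", fixed = TRUE)
gene_sets <- lapply(fields, function(f) f[-(1:2)])
names(gene_sets) <- vapply(fields, `[`, character(1), 1)

# With the GSA package installed, the documented loader is:
# db <- GSA::GSA.read.gmt(gmt_file)
# and the resulting GSA.genesets object is what DataBase accepts.
```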
FAIMETest
A Boolean value, FALSE by default. If TRUE, the gene2pathway test is run using the FAIME method,
which works only when the fifth column of the input exists and contains numeric scores or values.
FisherTest
A Boolean value. If TRUE (default), Fisher's exact test is run. Otherwise,
only the gene2pathway test is run.
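Fisher's exact test for one gene set can be illustrated with `stats::fisher.test` on a 2x2 table (genes hit by the input regions vs. genes in the set). The background, gene set, and hit list below are made up, and the exact table seq2pathway builds (for example, its choice of background genes) is not specified here, so this only illustrates the test itself.

```r
# Made-up background, gene set, and genes mapped from the input regions.
universe <- paste0("G", 1:1000)
gene_set <- paste0("G", 1:50)
hits     <- paste0("G", c(1:20, 501:530))

in_set  <- universe %in% gene_set
in_hits <- universe %in% hits

# 2x2 counts: rows = hit / not hit, columns = in set / not in set.
a  <- sum(in_hits & in_set)
b  <- sum(in_hits & !in_set)
c_ <- sum(!in_hits & in_set)
d  <- sum(!in_hits & !in_set)
tab <- matrix(c(a, b, c_, d), nrow = 2, byrow = TRUE,
              dimnames = list(c("hit", "not_hit"), c("in_set", "not_in_set")))

# One-sided test for over-representation of the gene set among the hits.
res <- fisher.test(tab, alternative = "greater")
res$p.value
```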
collapsemethod
A character string specifying the method passed to the function collapseRows() in the R package WGCNA,
which collapses multiple rows mapped to the same gene into a single row.
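To show what two of the listed methods do, here is a base-R sketch of "MaxMean" (keep the row with the highest mean per gene) and "Average" (average the rows per gene) on a toy matrix. The helpers `collapse_max_mean()` and `collapse_average()` are hypothetical stand-ins; they are not WGCNA code, and the real function has more options.

```r
# Toy matrix: rows p1 and p2 map to GENE_A, row p3 to GENE_B (made-up values).
expr <- matrix(c(1, 2, 3,
                 4, 5, 6,
                 7, 8, 9), nrow = 3, byrow = TRUE,
               dimnames = list(c("p1", "p2", "p3"), c("s1", "s2", "s3")))
gene_of <- c(p1 = "GENE_A", p2 = "GENE_A", p3 = "GENE_B")

# "MaxMean"-style: per gene, keep the row with the largest mean.
collapse_max_mean <- function(x, group) {
  ids_by_gene <- split(rownames(x), group[rownames(x)])
  keep <- vapply(ids_by_gene,
                 function(ids) ids[which.max(rowMeans(x[ids, , drop = FALSE]))],
                 character(1))
  out <- x[keep, , drop = FALSE]
  rownames(out) <- names(keep)
  out
}

# "Average"-style: per gene, average all rows (rowsum() sums rows by group).
collapse_average <- function(x, group) {
  g <- group[rownames(x)]
  rowsum(x, g) / as.vector(table(g))
}

collapse_max_mean(expr, gene_of)   # GENE_A keeps row p2
collapse_average(expr, gene_of)    # GENE_A = mean of p1 and p2

# With WGCNA installed, the real call is along the lines of:
# WGCNA::collapseRows(expr, rowGroup = gene_of, rowID = rownames(expr), method = "MaxMean")
```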
alpha
A positive integer, 5 by default. This is a FAIME-specific parameter: a higher value puts more weight on
the most highly expressed ranks relative to the lower-expressed ranks.
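The qualitative effect of alpha can be seen with a generic exponential rank-weighting scheme. This is an illustration only and is NOT the exact FAIME weighting formula: `rank_weights()` is hypothetical, and rank 1 stands for the most highly expressed gene.

```r
# Generic exponential rank weights (illustration, not FAIME's exact formula).
rank_weights <- function(n, alpha = 5) {
  r <- seq_len(n)                    # rank 1 = most highly expressed
  w <- exp(-alpha * (r - 1) / n)     # larger alpha -> faster decay with rank
  w / sum(w)                         # normalize to sum to 1
}

w_low  <- rank_weights(100, alpha = 1)
w_high <- rank_weights(100, alpha = 5)

# Share of total weight carried by the top 10 ranks grows with alpha.
sum(w_low[1:10])
sum(w_high[1:10])
```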
logCheck
A Boolean value, FALSE by default. If TRUE, gene values are log-transformed
when the maximum value of a sample profile is larger than 20.
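A sketch of the documented rule applied to one sample profile. The log base (2) and the +1 offset are assumptions made for illustration, not taken from seq2pathway; `maybe_log()` is a hypothetical helper.

```r
# Log-transform a sample profile only if its maximum exceeds 20.
# log2(x + 1) is an assumed transform, chosen for illustration.
maybe_log <- function(x, logCheck = TRUE) {
  if (logCheck && max(x, na.rm = TRUE) > 20) log2(x + 1) else x
}

maybe_log(c(3, 7, 15))     # unchanged: maximum is not above 20
maybe_log(c(3, 7, 1023))   # 2, 3, 10
```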
na.rm
A Boolean value indicating whether missing values are removed when the FAIME method is used (FAIMETest=TRUE). Default is FALSE.
B
A positive integer giving the number of random sampling trials used to calculate the empirical p-values.
Default is 100.
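How B random sampling trials turn into an empirical p-value can be sketched as follows. The statistic (mean score of 20 randomly drawn genes) and the estimator (count of null scores at least as large as the observed one, plus one, over B + 1) are assumptions for illustration; seq2pathway's own resampling scheme may differ. `empirical_p()` is hypothetical.

```r
# Empirical p-value from B random draws; the "+1" keeps the estimate above 0.
empirical_p <- function(observed, null_fun, B = 100) {
  null_scores <- replicate(B, null_fun())
  (sum(null_scores >= observed) + 1) / (B + 1)
}

set.seed(1)
scores   <- rnorm(1000)                        # made-up gene scores
null_fun <- function() mean(sample(scores, 20))

p <- empirical_p(observed = 2, null_fun = null_fun, B = 100)
p   # 1/101: no random draw reaches a mean of 2
```

With this estimator the smallest attainable p-value is 1/(B + 1), so a larger B is needed to resolve smaller p-values.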
min_Intersect_Count
A number giving the minimum number of intersecting genes a gene set must have for its Fisher's exact test result to be reported. Default is 5.
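The cutoff acts as a filter on the reported results. The sketch below uses a made-up result table with hypothetical column names and assumes the comparison is "at least" (>=), consistent with "minimum number".

```r
# Made-up Fisher's exact test results for three gene sets.
results <- data.frame(
  gene_set        = c("SET_ALPHA", "SET_BETA", "SET_GAMMA"),
  intersect_count = c(12L, 3L, 5L),
  p_value         = c(0.001, 0.04, 0.2),
  stringsAsFactors = FALSE
)

min_Intersect_Count <- 5
reported <- results[results$intersect_count >= min_Intersect_Count, ]
reported$gene_set   # SET_ALPHA, SET_GAMMA
```

Note that SET_BETA is dropped despite its small p-value: with only 3 intersecting genes it falls below the cutoff.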