Usage
xGR2GeneScores(data, significance.threshold = 5e-05, score.cap = 10,
build.conversion = c(NA, "hg38.to.hg19", "hg18.to.hg19"),
distance.max = 50000, decay.kernel = c("slow", "linear", "rapid",
"constant"), decay.exponent = 2, GR.Gene = c("UCSC_knownGene",
"UCSC_knownCanonical"), scoring.scheme = c("max", "sum", "sequential"),
verbose = T, RData.location = "http://galahad.well.ox.ac.uk/bigdata")
Arguments
data
a named input vector containing the significance level for
genomic regions (GR). For this named vector, the element names are GR,
in the format of 'chrN:start-end', where N is either 1-22 or X, and start
(or end) is a genomic position; for example, 'chr1:13-20'. The
element values are the significance levels (measured as p-value or FDR).
Alternatively, it can be a matrix or data frame with two columns: 1st
column for GR, 2nd column for the significance level.
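The two accepted input shapes can be sketched as follows. This is only an illustration: the regions and p-values are made up, and the column names and the validation pattern are assumptions rather than anything the function requires.

```r
# Named numeric vector: names are regions in 'chrN:start-end' form,
# values are significance levels (made-up p-values for illustration).
data_vec <- c("chr1:13-20" = 1e-6, "chr2:1000-2000" = 3e-3, "chrX:500-900" = 4e-8)

# Equivalent two-column data frame: 1st column GR, 2nd column significance.
# The column names here are arbitrary (an assumption for this sketch).
data_df <- data.frame(
  GR = names(data_vec),
  Significance = unname(data_vec),
  stringsAsFactors = FALSE
)

# Optional sanity check (not part of the package): every region name
# should look like 'chrN:start-end' with N in 1-22 or X.
gr_pattern <- "^chr([1-9]|1[0-9]|2[0-2]|X):[0-9]+-[0-9]+$"
valid <- grepl(gr_pattern, data_df$GR)
```

Either `data_vec` or `data_df` could then be passed as the `data` argument.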
significance.threshold
the given significance threshold. By
default, it is set to 5e-5 (see Usage). If NULL, there is no constraint on the
significance level when transforming the significance level of GR into
scores. If given, those GR below this threshold are considered significant and
thus scored positively, whereas those above it are considered
insignificant and thus receive no score
score.cap
the value at which scores are capped. By default, it is set
to 10. If NULL, no capping is applied
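How the threshold and the cap interact can be sketched as below. The helper `score_regions` is hypothetical and not part of the package, and the use of -log10 to turn a significance level into a score is an assumption made for illustration; the text above only states that significant GR are scored positively and that scores are capped.

```r
# Hypothetical helper (not the package's code): threshold, transform, cap.
score_regions <- function(p, significance.threshold = 5e-5, score.cap = 10) {
  # Keep only regions strictly below the threshold; NULL means no filtering.
  if (!is.null(significance.threshold)) {
    p <- p[p < significance.threshold]
  }
  # ASSUMPTION: scores are -log10 of the significance level.
  scores <- -log10(p)
  # Cap the scores; NULL means no capping.
  if (!is.null(score.cap)) {
    scores <- pmin(scores, score.cap)
  }
  scores
}

# Made-up p-values for three regions.
p <- c("chr1:13-20" = 1e-6, "chr2:1000-2000" = 3e-3, "chrX:500-900" = 1e-12)
scores <- score_regions(p)
```

Under these assumptions, the second region is dropped by the threshold and the third is capped at `score.cap`.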
build.conversion
the conversion from one genome build to
another. The conversions supported are "hg38.to.hg19" and
"hg18.to.hg19". By default, it is NA (no conversion is needed)
distance.max
the maximum distance between genes and GR. Only
those genes no farther than this distance from a GR will be considered as seed
genes. This parameter will influence the distance-component weights
calculated for nearby GR per gene
decay.kernel
a character specifying a decay kernel function. It
can be one of 'slow' for slow decay, 'linear' for linear decay, and
'rapid' for rapid decay. If no distance weight is used, please select
'constant'
decay.exponent
a numeric specifying a decay exponent. By
default, it is set to 2
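The role of distance.max, decay.kernel and decay.exponent can be illustrated with a small sketch. The helper `decay_weight` is hypothetical, and the kernel formulas are assumptions chosen to match the names 'slow', 'linear', 'rapid' and 'constant'; this page names the kernels but does not define them.

```r
# Hypothetical helper (not the package's code): distance-component weight
# for a gene at distance d (in bp) from a GR.
decay_weight <- function(d, distance.max = 50000,
                         decay.kernel = c("slow", "linear", "rapid", "constant"),
                         decay.exponent = 2) {
  decay.kernel <- match.arg(decay.kernel)
  # Relative distance, clamped to [0, 1].
  x <- pmin(d, distance.max) / distance.max
  # ASSUMED kernel shapes; all equal 1 at d = 0.
  w <- switch(decay.kernel,
    slow     = 1 - x^decay.exponent,
    linear   = 1 - x,
    rapid    = (1 - x)^decay.exponent,
    constant = rep(1, length(x))
  )
  # Genes farther away than distance.max get no weight.
  w[d > distance.max] <- 0
  w
}

# Weights at half the maximum distance under each kernel.
half <- sapply(c("slow", "linear", "rapid", "constant"),
               function(k) decay_weight(25000, decay.kernel = k))
```

With these assumed shapes and an exponent above 1, the 'slow' kernel keeps weights higher than 'linear' at the same distance, and 'rapid' drops them faster.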
GR.Gene
the genomic regions of genes. By default, it is
'UCSC_knownGene', that is, UCSC known genes (together with genomic
locations) based on human genome assembly hg19. It can be
'UCSC_knownCanonical', that is, UCSC known canonical genes (together
with genomic locations) based on human genome assembly hg19.
Alternatively, the user can supply customised input. To do so,
first save your RData file (containing a GR object) on your local
computer, and make sure the names of the GR object refer to Gene
Symbols. Then, set "GR.Gene" to your RData file name (with or
without extension), and specify the path to your RData file in
"RData.location". Note: you can also load your customised GR object
directly
scoring.scheme
the method used to calculate seed gene scores
under a set of GR. It can be one of "sum" for adding up, "max" for the
maximum, and "sequential" for the sequential weighting. The sequential
weighting is done via: $\sum_{i=1}{\frac{R_{i}}{i}}$, where
$R_{i}$ is the $i^{th}$ ranked score (in decreasing order)
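The three schemes can be compared on a toy example. The helper `combine_scores` is hypothetical and not part of the package; it assumes the input is the vector of per-GR scores already assigned to a single gene, and implements the sequential weighting exactly as the formula above states.

```r
# Hypothetical helper (not the package's code): combine the scores of all
# GR assigned to one gene into a single seed gene score.
combine_scores <- function(scores,
                           scoring.scheme = c("max", "sum", "sequential")) {
  scoring.scheme <- match.arg(scoring.scheme)
  switch(scoring.scheme,
    max = max(scores),
    sum = sum(scores),
    sequential = {
      # Rank scores in decreasing order, then weight the i-th by 1/i.
      ranked <- sort(scores, decreasing = TRUE)
      sum(ranked / seq_along(ranked))
    }
  )
}

# Made-up scores for three GR near the same gene.
s <- c(2, 6, 3)
by_max <- combine_scores(s, "max")         # 6
by_sum <- combine_scores(s, "sum")         # 11
by_seq <- combine_scores(s, "sequential")  # 6/1 + 3/2 + 2/3
```

The sequential scheme sits between the other two: it rewards additional supporting GR, but with diminishing weight.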
verbose
a logical indicating whether messages will be
displayed on the screen. By default, it is set to TRUE (messages are displayed)
RData.location
a character string specifying the location of built-in
RData files. See xRDataLoader
for details