identifyStartSites(x, threshold=1, tau=c(20, 20), neighbor=TRUE,
    fun=subtractExpectation, multicore=TRUE, ...)
x: Object of class TssNorm with normalized data.
Value: An object of class TssResult.
signature(x="TssData")
identifyStartSites(x, ...)
The expected number of false positive counts is initialized with a default value given by the read frequency in the whole data set. The position with the largest counts above the expectation is identified as a TSS if its estimated transcription level is at least one read (the default threshold) above the expected number of false positive reads. The transcription levels for all TSS are calculated by adding all counts to their nearest neighboring TSS.
Then, the expected number of false positive reads is updated by convolution with exponential kernels. The decay rates tau in 3' direction and towards the 5'-end can be chosen differently, to account for the fact that false positive counts are preferentially found in 5' direction of a TSS. This procedure is iterated as long as the set of TSS increases.
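The following is a minimal, self-contained sketch of this iterative scheme in base R. It is illustrative only: the function identifyTssSketch, the plain numeric inputs, the simplified expectation update, and the assignment of tau[1]/tau[2] to the 5'/3' sides are assumptions for demonstration, not the TSSi implementation.

## hypothetical sketch of the iterative identification scheme (not TSSi code)
identifyTssSketch <- function(pos, counts, threshold=1, tau=c(20, 20)) {
  baseline <- sum(counts) / length(counts)  ## read frequency in the data set
  expect <- rep(baseline, length(counts))
  tss <- integer(0)
  repeat {
    excess <- counts - expect
    cand <- setdiff(which(excess >= threshold), tss)
    if (length(cand) == 0)
      break
    ## position with the largest counts above the expectation becomes a TSS
    tss <- c(tss, cand[which.max(excess[cand])])
    ## update the expectation by convolution with asymmetric exponential
    ## kernels around each TSS (here tau[1]: 5' side, tau[2]: 3' side)
    expect <- rep(baseline, length(counts))
    for (t in tss) {
      d <- pos - pos[t]
      kern <- ifelse(d < 0, exp(d / tau[1]), exp(-d / tau[2]))
      kern[t] <- 0
      expect <- expect + counts[t] * kern
    }
  }
  if (length(tss) == 0)
    return(NULL)
  ## transcription level: all counts assigned to their nearest TSS
  nearest <- sapply(pos, function(p) tss[which.min(abs(pos[tss] - p))])
  level <- tapply(counts, factor(nearest, levels=tss), sum)
  data.frame(pos=pos[tss], reads=counts[tss], level=as.numeric(level))
}

## toy usage with made-up counts on a single segment
identifyTssSketch(pos=1:50, counts=c(rep(1, 24), 30, rep(1, 25)))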
In order to distribute the identification step over multiple processor cores, the mclapply function of the parallel package can be used. For this, the parallel package has to be loaded manually before starting the computation; additional parameters are passed via the ... argument, e.g. as identifyStartSites(x, mc.cores=2). The multicore argument can further be used to temporarily disable the parallel estimation by setting it to FALSE. Please note that the identification step is normally very fast, and thus parallel computation here may have only a minor impact compared to the normalizeCounts method.
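A brief usage sketch, assuming a TssNorm object yFit as produced in the Examples section below; the core count of 2 is arbitrary:

## load the parallel package before the computation
library(parallel)
## pass mc.cores through the ... argument to mclapply
z <- identifyStartSites(yFit, mc.cores=2)
## or temporarily disable the parallel estimation
z <- identifyStartSites(yFit, multicore=FALSE)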
Classes: TssData, TssNorm, TssResult
Methods: segmentizeCounts, normalizeCounts, identifyStartSites, get-methods, plot-methods, asRangedData-methods
Functions: subtract-functions
Data set: physcoCounts
Package: TSSi-package
## preceding steps
example(normalizeCounts)
## identify TSS
z <- identifyStartSites(yFit)
z