identifyStartSites(x, threshold=1, tau=c(20, 20), neighbor=TRUE,
    fun=subtractExpectation, multicore=TRUE, ...)
x: Object of class TssNorm with normalized data.
Value: An object of class TssResult.
signature(x="TssData")
identifyStartSites(x, ...)
The expected number of false positive counts is initialized with a default value given by the read frequency in the whole data set. The position with the largest counts above the expectation is identified as a TSS if its estimated transcription level is at least one read (the default threshold) above the expected number of false positive reads. The transcription levels for all TSS are calculated by adding all counts to their nearest neighboring TSS.
Then, the expected number of false positive reads is updated by convolution with exponential kernels. The decay rates tau in 3' direction and towards the 5'-end can be chosen differently, to account for the fact that false positive counts are preferentially found in 5' direction of a TSS. This procedure is iterated as long as the set of TSS increases.
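The following is a minimal, self-contained sketch of this iterative scheme in base R. It is illustrative only: the function identifyTssSketch, the plain numeric inputs, the simplified expectation update, and the assignment of tau[1]/tau[2] to the 5'/3' sides are assumptions for demonstration, not the TSSi implementation.

## hypothetical sketch of the iterative identification scheme (not TSSi code)
identifyTssSketch <- function(pos, counts, threshold=1, tau=c(20, 20)) {
  baseline <- sum(counts) / length(counts)  ## read frequency in the data set
  expect <- rep(baseline, length(counts))
  tss <- integer(0)
  repeat {
    excess <- counts - expect
    cand <- setdiff(which(excess >= threshold), tss)
    if (length(cand) == 0)
      break
    ## position with the largest counts above the expectation becomes a TSS
    tss <- c(tss, cand[which.max(excess[cand])])
    ## update the expectation by convolution with asymmetric exponential
    ## kernels around each TSS (here tau[1]: 5' side, tau[2]: 3' side)
    expect <- rep(baseline, length(counts))
    for (t in tss) {
      d <- pos - pos[t]
      kern <- ifelse(d < 0, exp(d / tau[1]), exp(-d / tau[2]))
      kern[t] <- 0
      expect <- expect + counts[t] * kern
    }
  }
  if (length(tss) == 0)
    return(NULL)
  ## transcription level: all counts assigned to their nearest TSS
  nearest <- sapply(pos, function(p) tss[which.min(abs(pos[tss] - p))])
  level <- tapply(counts, factor(nearest, levels=tss), sum)
  data.frame(pos=pos[tss], reads=counts[tss], level=as.numeric(level))
}

## toy usage with made-up counts on a single segment
identifyTssSketch(pos=1:50, counts=c(rep(1, 24), 30, rep(1, 25)))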
In order to distribute the identification step over multiple processor cores, the mclapply function of the parallel package can be used. For this, the parallel package has to be loaded manually before starting the computation; additional parameters are passed via the ... argument, e.g. as identifyStartSites(x, mc.cores=2). The multicore argument can further be used to temporarily disable the parallel estimation by setting it to FALSE. Please note that the identification step is normally very fast, and thus parallel computation here may have only a minor impact compared to the normalizeCounts method.
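A brief usage sketch, assuming a TssNorm object yFit as produced in the Examples section below; the core count of 2 is arbitrary:

## load the parallel package before the computation
library(parallel)
## pass mc.cores through the ... argument to mclapply
z <- identifyStartSites(yFit, mc.cores=2)
## or temporarily disable the parallel estimation
z <- identifyStartSites(yFit, multicore=FALSE)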
Classes: TssData, TssNorm, TssResult
Methods: segmentizeCounts, normalizeCounts, identifyStartSites, get-methods, plot-methods, asRangedData-methods
Functions: subtract-functions
Data set: physcoCounts
Package: TSSi-package
## preceding steps
example(normalizeCounts)
## identify TSS
z <- identifyStartSites(yFit)
z