scoreShift(object, groupX, groupY, testKS = TRUE, useTpmKS = TRUE, useMulticore = F, nrCores = NULL)
object
A CAGEset object.
groupX, groupY
Labels of the CAGE dataset(s) in the first group (groupX) and in the second group (groupY). The shifting score for each consensus cluster is calculated by comparing the CAGE signal in the samples from groupX against the signal in the samples from groupY. If a group contains more than one CAGE dataset, the datasets within that group are merged before comparison with the other group. See Details.
useTpmKS
Whether normalized tpm values (TRUE) or raw tag counts (FALSE) are used to derive sample sizes for the Kolmogorov-Smirnov test. Used only when testKS = TRUE, otherwise ignored. See Details.
useMulticore
useMulticore = TRUE is supported only on Unix-like platforms.
nrCores
Number of cores to use when useMulticore = TRUE. The default value NULL uses all detected cores.
The slots shiftingGroupX, shiftingGroupY and consensusClustersShiftingScores of the provided CAGEset object are filled with information on the groups of CAGE datasets that were compared and the shifting scores of all consensus clusters. Consensus clusters (promoters) that pass specified shifting score and/or FDR thresholds can be extracted by calling the getShiftingPromoters function.
score = max(F1 - F2) / max(F1)
where F1 is the cumulative sum of CAGE signal along the consensus cluster in the group of samples with the lower total signal in that consensus cluster, and F2 is the cumulative sum in the opposite group. Since the cumulative sum can be calculated in both the forward (5' -> 3') and the reverse (3' -> 5') direction, the shifting score is calculated for both cases and the larger value is selected as the final shifting score. The shifting score lies in the range [-Inf, 1], where a value of 1 means complete physical separation of the TSSs used in the two samples for the given consensus cluster. In general, any non-negative value of the shifting score can be interpreted as the proportion of transcription initiation in the sample with lower expression that happens "outside" (either upstream or downstream of) the region used for transcription initiation in the other sample. Negative values indicate no physical separation, i.e. the region used for transcription initiation in the sample with lower expression is completely contained within the region used for transcription initiation in the other sample.
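The definition above can be illustrated with a minimal base R sketch. It is not CAGEr's implementation: the helper name shifting_score is hypothetical, and it assumes x and y are numeric vectors holding the (merged) CAGE signal of each group at the same positions of one consensus cluster, both with non-zero total signal.

```r
# Hypothetical helper, not part of CAGEr: shifting score for one consensus
# cluster, given per-position CAGE signal in two groups (same positions).
shifting_score <- function(x, y) {
  stopifnot(length(x) == length(y), sum(x) > 0, sum(y) > 0)

  # F1 belongs to the group with the lower total signal in the cluster.
  if (sum(x) <= sum(y)) {
    low <- x
    high <- y
  } else {
    low <- y
    high <- x
  }

  # score = max(F1 - F2) / max(F1) for one direction of the cumulative sum
  one_direction <- function(a, b) {
    F1 <- cumsum(a)
    F2 <- cumsum(b)
    max(F1 - F2) / max(F1)
  }

  forward <- one_direction(low, high)            # 5' -> 3'
  reverse <- one_direction(rev(low), rev(high))  # 3' -> 5'
  max(forward, reverse)
}

separated <- shifting_score(c(5, 5, 0, 0), c(0, 0, 20, 20))      # TSSs do not overlap
partial   <- shifting_score(c(4, 4, 2, 0), c(0, 0, 10, 10))      # partial upstream shift
contained <- shifting_score(c(0, 2, 2, 0), c(10, 10, 10, 10))    # weaker sample nested in the stronger one
```

Under these assumptions, separated evaluates to 1, partial to 0.8 (80% of the initiation in the weaker sample lies upstream of the region used in the other sample), and contained is negative.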
In addition to the shifting score, which indicates only physical separation (an upstream or downstream shift of TSSs), a more general assessment of differential TSS usage can be obtained by performing a two-sample Kolmogorov-Smirnov test on the cumulative sums of CAGE signal along the consensus cluster. In that case, the cumulative sums in both samples are scaled to the range [0,1] and are treated as empirical cumulative distribution functions (ECDFs) reflecting the sampling of TSS positions during transcription initiation. The Kolmogorov-Smirnov test is performed to assess whether the two underlying probability distributions differ. To obtain a P-value (i.e. the level at which the null hypothesis can be rejected), the sample sizes that generated the ECDFs are required in addition to the K-S statistic calculated from the ECDFs. These are derived either from raw tag counts, i.e. the exact number of times each TSS in the cluster was sampled during sequencing (when useTpmKS = FALSE), or from normalized tpm values (when useTpmKS = TRUE). P-values obtained from the K-S tests are further adjusted for multiple testing using the Benjamini & Hochberg (BH) method, and for each P-value a corresponding false-discovery rate (FDR) is also reported.
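To make the role of the sample sizes concrete, here is a minimal base R sketch of the procedure described above. It is an illustration only: the helper name ks_from_signal is hypothetical, the P-value uses the standard asymptotic Kolmogorov distribution (which may differ from what CAGEr computes internally), and the input vectors are assumed to hold per-position signal whose sums serve as sample sizes.

```r
# Hypothetical helper, not part of CAGEr: two-sample K-S statistic and
# asymptotic P-value from per-position signal in one consensus cluster.
ks_from_signal <- function(x, y) {
  stopifnot(length(x) == length(y), sum(x) > 0, sum(y) > 0)

  n <- sum(x)  # sample size of the first group (tag count or tpm)
  m <- sum(y)  # sample size of the second group

  # Cumulative sums scaled to [0, 1] act as ECDFs over TSS positions.
  D <- max(abs(cumsum(x) / n - cumsum(y) / m))

  # Asymptotic Kolmogorov distribution:
  # p = 2 * sum_{k >= 1} (-1)^(k - 1) * exp(-2 * k^2 * lambda^2)
  lambda <- D * sqrt(n * m / (n + m))
  if (lambda < 0.2) {
    p <- 1  # the series converges poorly here; the true value is ~1
  } else {
    k <- 1:100
    p <- 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * lambda^2))
  }
  c(D = D, p.value = min(1, max(0, p)))
}

# The same ECDFs give a smaller P-value when the sample sizes are larger,
# which is why the choice between raw counts and tpm values matters.
small <- ks_from_signal(c(3, 4, 3), c(2, 4, 4))
large <- ks_from_signal(c(300, 400, 300), c(200, 400, 400))

# BH adjustment across several (made-up) consensus clusters.
clusters <- list(
  list(x = c(5, 5, 0, 0), y = c(0, 0, 20, 20)),
  list(x = c(1, 2, 1),    y = c(10, 20, 10)),
  list(x = c(300, 400, 300), y = c(200, 400, 400))
)
p_values <- vapply(clusters,
                   function(cl) ks_from_signal(cl$x, cl$y)[["p.value"]],
                   numeric(1))
fdr <- p.adjust(p_values, method = "BH")
```

Here small and large share the same K-S statistic (D = 0.1) but differ in P-value because of the sample sizes, and p.adjust with method = "BH" performs the multiple-testing correction.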
Since the calculation of shifting scores and the Kolmogorov-Smirnov test both require cumulative sums along consensus clusters, these have to be calculated beforehand by calling the cumulativeCTSSdistribution function.
cumulativeCTSSdistribution
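# Load the example CAGEset object shipped with CAGEr
load(system.file("data", "exampleCAGEset.RData", package="CAGEr"))
# Compare merged sample1 + sample2 against sample3; K-S sample sizes
# are derived from raw tag counts (useTpmKS = FALSE)
scoreShift(object = exampleCAGEset, groupX = c("sample1", "sample2"),
groupY = "sample3", testKS = TRUE, useTpmKS = FALSE)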