normalizeToMatrix(signal, target, extend = 5000, w = max(extend)/50,
value_column = NULL, mapping_column = NULL, empty_value = ifelse(smooth, NA, 0),
mean_mode = c("absolute", "weighted", "w0", "coverage"), include_target = any(width(target) > 1),
target_ratio = ifelse(all(extend == 0), 1, 0.1), k = min(c(20, min(width(target)))),
smooth = FALSE, smooth_fun = default_smooth_fun, trim = 0)
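signal: A GRanges object.
target: A GRanges object.
extend: Extension, in base pairs, upstream and downstream of target. It can be a vector of length one or two. If it has length one, the same extension is used upstream and downstream.
w: Window size, in base pairs, used to split the upstream and downstream regions.
value_column: Column in signal that will be mapped to colors. If it is NULL, an internal column that contains only 1 will be used.
mapping_column: Column used to restrict the overlap between signal and target. By default, all regions in signal that overlap each target are used.
empty_value: Value assigned to windows that do not overlap signal.
mean_mode: When a window does not overlap signal perfectly, how to summarize values to this window. See 'Details' section for a detailed explanation.
include_target: Whether to include target in the heatmap. If the width of all regions in target is 1, include_target is enforced to FALSE.
target_ratio: The ratio of target in the full heatmap. If the value is 1, extend will be reset to 0.
k: Number of windows. Used only when target_ratio = 1 or extend == 0, otherwise ignored.
smooth: Whether to smooth the rows of the matrix.
smooth_fun: The smoothing function applied to each row of the matrix. It accepts a numeric vector (which may contain NA values) and returns a vector with the same length. If the smoothing fails, the function should call stop to throw an error so that normalizeToMatrix can count how many rows failed in smoothing. See the default default_smooth_fun for an example.
trim: Proportion of extreme values to trim. For example, c(0.01, 0.01) means to trim outliers less than the 1st percentile and larger than the 99th percentile.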
The matrix is wrapped into a simple normalizedMatrix class.
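To visualize the association between signal and target, the data is transformed into a matrix and visualized as a heatmap by EnrichedHeatmap afterwards.
The upstream and downstream regions, together with the target body, are split into a list of small windows, which are then overlapped with signal. Since regions in signal and the small windows do not always overlap 100 percent, there are four different averaging modes:
absolute: the plain mean of the values of all overlapping signal regions, regardless of overlap width.
weighted: the mean of the values weighted by the width of each overlap.
w0: the weighted mean in which the part of the window not covered by any signal region also enters the denominator, with value 0.
coverage: the sum of values multiplied by their overlap widths, divided by the width of the window.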
The following illustrates the different settings for mean_mode (note that one signal region overlaps other signal regions):
   40       50    20       values in signal
 ++++++     +++  +++++     signal
      30                   values in signal
    ++++++                 signal
   =================       window (17bp), there are 4bp not overlapping to any signal region.
   4   6     3    3        overlap
absolute: (40 + 30 + 50 + 20)/4
weighted: (40*4 + 30*6 + 50*3 + 20*3)/(4 + 6 + 3 + 3)
w0:       (40*4 + 30*6 + 50*3 + 20*3)/(4 + 6 + 3 + 3 + 4)
coverage: (40*4 + 30*6 + 50*3 + 20*3)/17
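The four formulas can be evaluated directly in base R. This is only a sketch of the arithmetic for the illustrated window, not the package's internal implementation; the variable names are chosen here for illustration.

```r
# Values and overlap widths taken from the illustration (one window of 17bp)
value        <- c(40, 30, 50, 20)  # values of the four signal regions
overlap      <- c(4, 6, 3, 3)      # bp of each signal region inside the window
window_width <- 17                 # width of the window in bp
uncovered    <- 4                  # bp of the window not covered by any signal

# absolute: plain mean of the values, ignoring overlap widths
mean_absolute <- mean(value)
# weighted: mean weighted by overlap width
mean_weighted <- sum(value * overlap) / sum(overlap)
# w0: weighted mean that also counts the uncovered bases (value 0)
mean_w0 <- sum(value * overlap) / (sum(overlap) + uncovered)
# coverage: weighted sum divided by the window width
mean_coverage <- sum(value * overlap) / window_width

c(absolute = mean_absolute, weighted = mean_weighted,
  w0 = mean_w0, coverage = mean_coverage)
```

Because one signal region overlaps another, the overlap widths plus the uncovered bases add up to 20 rather than 17, which is why w0 and coverage give different results here.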
To explain it more clearly, consider three scenarios:
First, we want to calculate the mean methylation from 3 CpG sites in a 20bp window. Since methylation is only measured at the CpG site level, the mean should be calculated from the 3 CpG sites only, not from the non-CpG sites. In this case, the absolute mode should be used.
Second, we want to calculate the mean coverage in a 20bp window. Assume the coverage is 5 in 1bp ~ 5bp, 10 in 11bp ~ 15bp and 20 in 16bp ~ 20bp. Since coverage is an attribute of every base, all 20 bp should be taken into account. Thus, the w0 mode should be used, which also accounts for the 0 coverage in 6bp ~ 10bp. The mean coverage is calculated as (5*5 + 10*5 + 20*5)/(5+5+5+5).
Third, genes have multiple transcripts and we want to calculate how many transcripts exist at a certain position in the gene body. In this case, the values associated with each transcript are binary (either 1 or 0) and the coverage mean mode should be used.
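The arithmetic of the second scenario can be checked in base R. The sketch below only reproduces the numbers given in the text; it does not call normalizeToMatrix, and the variable names are illustrative.

```r
# Second scenario: coverage in a 20bp window
value     <- c(5, 10, 20)  # coverage of the three covered segments
overlap   <- c(5, 5, 5)    # width of each covered segment in bp
uncovered <- 5             # 6bp ~ 10bp, where coverage is 0

# w0 counts the uncovered bases in the denominator
w0_mean <- sum(value * overlap) / (sum(overlap) + uncovered)
# weighted ignores the uncovered bases, so it overstates the mean coverage
weighted_mean <- sum(value * overlap) / sum(overlap)

# Cross-check: w0 equals the plain mean over all 20 bases
per_base <- rep(c(5, 0, 10, 20), each = 5)
stopifnot(isTRUE(all.equal(w0_mean, mean(per_base))))

c(w0 = w0_mean, weighted = weighted_mean)
```

The cross-check against the per-base mean shows why w0 is the appropriate mode when the value is defined at every base, including those where it is zero.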
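library(GenomicRanges)    # GRanges() and IRanges()
library(EnrichedHeatmap)  # normalizeToMatrix()
# Nine 2bp signal regions on chr1, each with a score
signal = GRanges(seqnames = "chr1",
    ranges = IRanges(start = c(1, 4, 7, 11, 14, 17, 21, 24, 27),
                     end = c(2, 5, 8, 12, 15, 18, 22, 25, 28)),
    score = c(1, 2, 3, 1, 2, 3, 1, 2, 3))
# One target region, extended by 10bp on each side and split into 2bp windows
target = GRanges(seqnames = "chr1", ranges = IRanges(start = 10, end = 20))
normalizeToMatrix(signal, target, extend = 10, w = 2)
# Include the target body explicitly (already the default here, since the target is wider than 1bp)
normalizeToMatrix(signal, target, extend = 10, w = 2, include_target = TRUE)
# Summarize the score column instead of the internal column of 1s
normalizeToMatrix(signal, target, extend = 10, w = 2, value_column = "score")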