mergeWindows(regions, tol, sign=NULL, max.width=NULL, ignore.strand=TRUE)
regions
A list containing id, an integer vector containing the cluster ID for each window; and region, a GRanges object containing the start/stop coordinates for each cluster of windows.
Windows in regions are merged if the gap between the end of one window and the start of the next is no greater than tol.
Adjacent windows can then be chained together to build a cluster of windows across the linear genome.
A value of zero for tol means that the windows must be contiguous, whereas negative values specify minimum overlaps.
If sign!=NULL, windows are only merged if they have the same sign of the log-FC and are not separated by intervening windows with opposite log-FC values.
This can be useful to ensure consistent changes when summarizing adjacent DB regions.
However, it is not recommended for routine clustering in differential analyses as the resulting clusters will not be independent of the p-value.
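The chaining rule described above can be illustrated with a short base-R sketch. This is not the csaw implementation: chain_windows is a hypothetical helper, and it assumes closed integer coordinates on a single chromosome, with the gap between two windows counted as the number of bases strictly between them.

```r
# Hypothetical helper (not part of csaw): chain windows into clusters.
# Assumes closed coordinates on one chromosome; gap = bases between windows.
chain_windows <- function(start, end, tol, sign = NULL) {
    o <- order(start, end)          # process windows along the genome
    s <- start[o]
    e <- end[o]
    n <- length(s)
    reach <- cummax(e)              # furthest end seen so far
    gap <- s[-1] - reach[-n] - 1    # negative values indicate overlap
    new_cluster <- gap > tol        # break the chain when the gap exceeds tol
    if (!is.null(sign)) {
        sg <- sign[o]
        # Also break the chain wherever the sign flips between neighbours.
        new_cluster <- new_cluster | (sg[-1] != sg[-n])
    }
    id <- integer(n)
    id[o] <- cumsum(c(TRUE, new_cluster))
    id
}

start <- c(100, 141, 200, 500)
end <- start + 40
chain_windows(start, end, tol = 0)     # 1 1 2 3
chain_windows(start, end, tol = 20)    # 1 1 1 2
chain_windows(start, end, tol = 20, sign = c(TRUE, FALSE, FALSE, FALSE))  # 1 2 2 3
```

With tol=0 only the first two windows are contiguous; raising tol to 20 absorbs the third, and supplying sign separates the first window from its neighbours again.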
Specification of max.width prevents the formation of excessively large clusters when many adjacent regions are present.
Any cluster that is wider than max.width is split into multiple subclusters of (roughly) equal size.
Specifically, the cluster interval is partitioned into the smallest number of equally-sized subintervals where each subinterval is smaller than max.width.
Windows are then assigned to each subinterval based on the location of the window midpoints.
Suggested values range from 2000 to 10000 bp, but no limits are placed on the maximum size if it is NULL.
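The splitting step can be sketched in base R as follows. This is an illustration under stated assumptions rather than the csaw code: split_cluster is a hypothetical helper that takes the windows of a single cluster, uses closed coordinates, and treats "smaller than max.width" as "no wider than max.width".

```r
# Hypothetical helper (not part of csaw): split one cluster by max.width.
# Returns a subcluster index for each window in the cluster.
split_cluster <- function(start, end, max.width) {
    lo <- min(start)
    hi <- max(end)
    width <- hi - lo + 1
    if (width <= max.width) {
        return(rep(1L, length(start)))   # cluster is small enough already
    }
    # Smallest number of equal subintervals, each no wider than max.width.
    n_sub <- ceiling(width / max.width)
    sub_width <- width / n_sub
    # Assign each window by the position of its midpoint.
    mid <- (start + end) / 2
    idx <- floor((mid - lo) / sub_width) + 1
    as.integer(pmin(idx, n_sub))
}

start <- c(1, 1001, 2001, 3001, 4001)
end <- start + 999                      # one 5000 bp cluster
split_cluster(start, end, max.width = 2000)   # 1 1 2 3 3
split_cluster(start, end, max.width = 5000)   # 1 1 1 1 1
```

A 5000 bp cluster with max.width=2000 needs three subintervals of about 1667 bp each, so the subclusters are only roughly equal in the number of windows they receive.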
The tolerance should reflect the minimum distance at which two regions of enrichment are considered separate.
If two windows are more than tol apart, they will be placed into separate clusters.
In contrast, the max.width value reflects the maximum distance at which two windows can be considered part of the same region.
Arbitrary regions can also be used in this function.
However, caution is required if any fully nested regions are present.
Clustering with sign!=NULL will lead to a warning as splitting by sign becomes undefined.
This is because any genomic region involving the parent window must contain the nested window, such that the cluster will always contain opposite log-fold changes.
Splitting with max.width!=NULL will not fail, but cluster sizes may not be reduced if very large regions are present.
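One way to check for fully nested regions before clustering is sketched below in base R. has_nested is a hypothetical helper, not a csaw function; it assumes closed coordinates on a single chromosome and counts a region as nested if it lies entirely within another region, including exact duplicates.

```r
# Hypothetical helper (not part of csaw): detect fully nested regions.
has_nested <- function(start, end) {
    # Sort by start, widest first among ties, so a containing region
    # always precedes the regions nested inside it.
    o <- order(start, -end)
    e <- end[o]
    n <- length(e)
    if (n < 2L) {
        return(FALSE)
    }
    # A region is nested if it ends no later than some earlier region.
    any(e[-1] <= cummax(e)[-n])
}

has_nested(c(100, 200), c(500, 300))   # TRUE: [200, 300] lies inside [100, 500]
has_nested(c(100, 150), c(200, 300))   # FALSE: partial overlap only
```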
If ignore.strand=FALSE, the entries in regions are split into their separate strands.
The function is run separately on the entries for each strand, and the results collated.
The region returned in the output will be stranded to reflect the strand of the contributing input regions.
This may be useful for strand-specific applications.
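The split, cluster and collate pattern for strand-specific merging might look like the following base-R sketch. cluster_by_strand is a hypothetical helper, not the csaw implementation; it assumes closed coordinates on a single chromosome and reuses a simple gap-based chaining rule within each strand.

```r
# Hypothetical helper (not part of csaw): cluster each strand separately.
cluster_by_strand <- function(start, end, strand, tol) {
    id <- integer(length(start))
    offset <- 0L
    # Handle the windows of each strand on their own, then collate the IDs.
    for (idx in split(seq_along(start), strand)) {
        o <- idx[order(start[idx], end[idx])]
        n <- length(o)
        gap <- start[o][-1] - cummax(end[o])[-n] - 1
        local_id <- cumsum(c(TRUE, gap > tol))
        id[o] <- offset + local_id        # keep IDs unique across strands
        offset <- offset + max(local_id)
    }
    id
}

start <- c(100, 120, 110, 400)
end <- start + 40
strand <- c("+", "+", "-", "-")
id <- cluster_by_strand(start, end, strand, tol = 0)
id[1] == id[2]   # the two overlapping "+" windows share a cluster
id[1] != id[3]   # the overlapping "-" window is kept separate
```

Windows 1 and 3 overlap on the genome but sit on opposite strands, so they end up in different clusters.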
Note that, in the output, the cluster ID reported in id corresponds to the index of the cluster coordinates in the returned region.
combineTests, windowCounts
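library(GenomicRanges)  # provides GRanges() and IRanges()
library(csaw)
# Ten 41-bp windows at random positions on one chromosome.
x <- round(runif(10, 100, 1000))
gr <- GRanges(rep("chrA", 10), IRanges(x, x+40))
# Larger tolerances chain more windows into each cluster.
mergeWindows(gr, 1)
mergeWindows(gr, 10)
mergeWindows(gr, 100)
# Alternating signs: neighbouring windows with opposite sign are kept apart.
mergeWindows(gr, 100, sign=rep(c(TRUE, FALSE), 5))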