NOTE: This man page describes the methods that operate on Ranges,
Views, RangesList, or ViewsList objects. See
?`findOverlaps,GenomicRanges,GenomicRanges-method`
in the GenomicRanges package for methods that operate on
GenomicRanges or GRangesList
objects.
findOverlaps(query, subject, maxgap=0L, minoverlap=1L, type=c("any", "start", "end", "within", "equal"), select=c("all", "first", "last", "arbitrary"), algorithm=c("nclist", "intervaltree"), ...)
countOverlaps(query, subject, maxgap=0L, minoverlap=1L, type=c("any", "start", "end", "within", "equal"), algorithm=c("nclist", "intervaltree"), ...)
overlapsAny(query, subject, maxgap=0L, minoverlap=1L, type=c("any", "start", "end", "within", "equal"), algorithm=c("nclist", "intervaltree"), ...)
query %over% subject
query %within% subject
query %outside% subject
subsetByOverlaps(query, subject, maxgap=0L, minoverlap=1L, type=c("any", "start", "end", "within", "equal"), algorithm=c("nclist", "intervaltree"), ...)
mergeByOverlaps(query, subject, ...)
"ranges"(x, query, subject)
query, subject: If subject is a Ranges object, query can be an integer
vector to be converted to length-one ranges. If query is a RangesList or
RangedData, subject must be a RangesList or RangedData.
If both lists have names, each element from the subject is paired
with the element from the query with the matching name, if any.
Otherwise, elements are paired by position. The overlap is then
computed between the pairs as described below.
If subject is omitted, query is queried against itself. In this case,
and only this case, the ignoreSelf and ignoreRedundant arguments are
allowed. By default, the result will contain hits for each range against
itself, and if there is a hit from A to B, there is also a hit for B to A.
If ignoreSelf is TRUE, all self matches are dropped. If ignoreRedundant
is TRUE, only one of A->B and B->A is returned.
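The effect of the two self-query filters can be sketched on a plain table of hit pairs. This is a base R illustration only, not the package's implementation: the hit pairs are made up, and which direction of a redundant pair is kept is an assumption chosen for the example.

```r
# Hypothetical hit pairs from querying three ranges against themselves:
# ranges 1 and 2 overlap each other, range 3 overlaps only itself.
hits <- data.frame(
  queryHits   = c(1L, 1L, 2L, 2L, 3L),
  subjectHits = c(1L, 2L, 1L, 2L, 3L)
)

# ignoreSelf-like filter: drop every hit of a range against itself.
no_self <- hits[hits$queryHits != hits$subjectHits, ]

# ignoreRedundant-like filter: of each A->B / B->A pair keep one direction.
# Keeping queryHits <= subjectHits is an arbitrary choice for this sketch.
no_redundant <- hits[hits$queryHits <= hits$subjectHits, ]

print(no_self)
print(no_redundant)
```

Applying both filters together would leave only the single pair 1->2 in this example.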
maxgap, minoverlap: Intervals with a separation of maxgap or less and a
minimum of minoverlap overlapping positions, allowing for maxgap, are
considered to be overlapping. maxgap should be a non-negative scalar
integer. minoverlap should be a positive scalar integer.
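The minoverlap constraint can be illustrated with a few lines of base R. This is a sketch under stated assumptions: intervals are closed integer intervals, the helper names shared_positions and overlaps_by are hypothetical (they are not part of the package), and the handling of maxgap is deliberately left out.

```r
# Number of integer positions shared by the closed intervals [s1, e1] and
# [s2, e2]. A result of zero or less means the intervals do not intersect.
shared_positions <- function(s1, e1, s2, e2) {
  pmin(e1, e2) - pmax(s1, s2) + 1L
}

# Hypothetical helper: TRUE when the two intervals share at least
# 'minoverlap' positions. 'maxgap' is not modelled here.
overlaps_by <- function(s1, e1, s2, e2, minoverlap = 1L) {
  shared_positions(s1, e1, s2, e2) >= minoverlap
}

overlaps_by(1L, 5L, 2L, 3L)                   # positions 2 and 3 are shared
overlaps_by(1L, 5L, 5L, 9L, minoverlap = 2L)  # only position 5 is shared
```

Both helpers are vectorized through pmin and pmax, so they also accept whole vectors of starts and ends.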
type: By specifying the type parameter, one can select for specific
types of overlap. The types correspond to operations in Allen's Interval
Algebra (see references). If type is start or end, the intervals are
required to have matching starts or ends, respectively. While this
operation seems trivial, the naive implementation using outer would be
much less efficient. Specifying equal as the type returns the
intersection of the start and end matches. If type is within, the query
interval must be wholly contained within the subject interval. Note that
all matches must additionally satisfy the minoverlap constraint
described above.

With the old findOverlaps/countOverlaps implementation based on Interval
Trees (algorithm="intervaltree"), the maxgap parameter has special
meaning with the special overlap types. For start, end, and equal, it
specifies the maximum difference in the starts, ends or both,
respectively. For within, it is the maximum amount by which the query
may be wider than the subject.
With the new implementation based on Nested Containment Lists
(algorithm="nclist"), this special meaning still applies, but only when
type is set to start or end.
See ?NCList for more information about this change and other
differences between the new and old findOverlaps/countOverlaps
implementations.
See the algorithm argument below for how to switch between the
2 implementations.
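For reference, the naive outer-based approach mentioned above might look like the following base R sketch. It compares every query interval with every subject interval, so its cost grows with the product of the two lengths. The start and end vectors correspond to the ranges used in the "alternative overlap types" example (ends computed as start + width - 1); maxgap and minoverlap are not modelled.

```r
# Starts and ends of the query and subject intervals (closed, integer).
q_start <- c(1L, 5L, 3L, 4L); q_end <- c(2L, 6L, 6L, 9L)
s_start <- c(1L, 3L, 5L, 6L); s_end <- c(4L, 6L, 9L, 9L)

# One logical matrix per overlap type: rows index the query,
# columns index the subject.
same_start <- outer(q_start, s_start, "==")             # type "start"
same_end   <- outer(q_end,   s_end,   "==")             # type "end"
equal      <- same_start & same_end                     # type "equal"
within     <- outer(q_start, s_start, ">=") &
              outer(q_end,   s_end,   "<=")             # type "within"

# Turn a logical matrix into (query, subject) index pairs.
which(same_start, arr.ind = TRUE)
which(within, arr.ind = TRUE)
```

Each matrix has length(query) * length(subject) cells, which is why this approach does not scale to large inputs.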
select: If query is a Ranges or Views object:
When select is "all" (the default), the results are returned as a Hits
object. Otherwise the returned value is an integer vector parallel to
query (i.e. same length) containing the first, last, or arbitrary
overlapping interval in subject, with NA indicating intervals that did
not overlap any intervals in subject.

If query is a RangesList, ViewsList, or RangedData object:
When select is "all" (the default), the results are returned as a
HitsList object. Otherwise the returned value depends on the drop
argument. When select != "all" && !drop, an IntegerList is returned,
where each element of the result corresponds to a space in query.
When select != "all" && drop, an integer vector is returned containing
indices that are offset to align with the unlisted query.
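How a set of hits collapses to an integer vector parallel to query can be sketched in base R. The hit pairs below are made up, collapse_hits is a hypothetical helper (not a package function), and the sketch assumes that "first" and "last" mean the lowest and highest subject index among the hits of each query element.

```r
# Hypothetical hits for a query of length 3: element 1 overlaps subjects 1
# and 2, element 2 overlaps nothing, element 3 overlaps subject 3.
query_hits   <- c(1L, 1L, 3L)
subject_hits <- c(1L, 2L, 3L)
n_query      <- 3L

# Pick one subject index per query element; elements without hits stay NA.
collapse_hits <- function(query_hits, subject_hits, n_query, pick) {
  out <- rep(NA_integer_, n_query)
  picked <- tapply(subject_hits, query_hits, pick)
  out[as.integer(names(picked))] <- as.integer(picked)
  out
}

collapse_hits(query_hits, subject_hits, n_query, min)  # like select="first"
collapse_hits(query_hits, subject_hits, n_query, max)  # like select="last"
```

The result always has length n_query, which is what makes it usable as an index or lookup vector alongside the query.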
"nclist"
(the default) or "intervaltree".
This argument was added in BioC 3.1 to facilitate the transition
between the new findOverlaps/countOverlaps implementation based on Nested
Containment Lists and the old implementation based on Interval Trees.
See ?NCList and ?IntervalTree for more information about these
implementations.
Note that the old implementation is defunct starting with BioC 3.2.
The algorithm argument will be removed in BioC 3.3.
drop: Supported only when query is a RangesList, ViewsList, or
RangedData object. FALSE by default. See select argument above for the
details.
ignoreSelf, ignoreRedundant: When subject is omitted, the ignoreSelf
and ignoreRedundant arguments (both FALSE by default) are allowed.
See query and subject arguments above for the details.
x: Hits object returned by findOverlaps.
For findOverlaps: see select argument above.

For countOverlaps: the overlap hit count for each range in query using
the specified findOverlaps parameters. For RangesList objects, it
returns an IntegerList object.

overlapsAny finds the ranges in query that overlap any of the ranges in
subject. For Ranges or Views objects, it returns a logical vector of
length equal to the number of ranges in query. For RangesList,
RangedData, or ViewsList objects, it returns a LogicalList object,
where each element of the result corresponds to a space in query.

%over% and %within% are convenience wrappers for the 2 most common use
cases. Currently defined as
`%over%` <- function(query, subject) overlapsAny(query, subject)
and
`%within%` <- function(query, subject) overlapsAny(query, subject, type="within")
%outside% is simply the inverse of %over%.

subsetByOverlaps returns the subset of query that has an overlap hit
with a range in subject using the specified findOverlaps parameters.

mergeByOverlaps computes the overlap between query and subject
according to the arguments in ... . It then extracts the corresponding
hits from each object and returns a DataFrame containing one column for
the query and one for the subject, as well as any mcols that were
present on either object. The query and subject columns are named by
quoting and deparsing the corresponding argument.

ranges(x, query, subject) returns a Ranges of the same length as Hits
object x holding the regions of intersection between the overlapping
ranges in objects query and subject, which should be the same query and
subject used in the call to findOverlaps that generated x.
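The relationship between the hit pairs and the other return values can be sketched in base R. This illustrates the semantics described above and is not the package's code: intervals are treated as closed integer intervals, the hits come from a naive all-pairs test with default-like settings, and all variable names are chosen for this example.

```r
# Query [1,5], [4,7], [9,10] and subject [2,2], [2,3], [10,12].
q_start <- c(1L, 4L, 9L);  q_end <- c(5L, 7L, 10L)
s_start <- c(2L, 2L, 10L); s_end <- c(2L, 3L, 12L)

# Naive all-pairs overlap test, then (query, subject) hit indices.
ov   <- outer(q_start, s_end, "<=") & outer(q_end, s_start, ">=")
hits <- which(ov, arr.ind = TRUE)
qh   <- hits[, "row"]
sh   <- hits[, "col"]

counts  <- tabulate(qh, nbins = length(q_start))  # countOverlaps-like
any_hit <- counts > 0L                            # overlapsAny / %over%-like
outside <- !any_hit                               # %outside%-like
kept    <- which(any_hit)                         # subsetByOverlaps-like indices

# ranges(x, query, subject)-like: one region of intersection per hit.
inter_start <- pmax(q_start[qh], s_start[sh])
inter_end   <- pmin(q_end[qh],   s_end[sh])
cbind(query = qh, subject = sh, start = inter_start, end = inter_end)
```

The intersection vectors have one entry per hit, mirroring the statement that the result of ranges(x, query, subject) has the same length as x.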
The simplest approach is to call the findOverlaps function on a Ranges
or other object with range information (aka "range-based object").
library(IRanges)  # provides IRanges(), findOverlaps() and related functions
query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
## ---------------------------------------------------------------------
## findOverlaps()
## ---------------------------------------------------------------------
## at most one hit per query
findOverlaps(query, subject, select="first")
findOverlaps(query, subject, select="last")
findOverlaps(query, subject, select="arbitrary")
## overlap even if adjacent only
## (FIXME: the gap between 2 adjacent ranges should be still considered
## 0. So either we have an argument naming problem, or we should modify
## the handling of the 'maxgap' argument so that the user would need to
## specify maxgap=0L to obtain the result below.)
findOverlaps(query, subject, maxgap=1L)
## shortcut
findOverlaps(query, subject)
query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2), c(5, 4))
## one Ranges with itself
findOverlaps(query)
## single points as query
subject <- IRanges(c(1, 6, 13), c(4, 9, 14))
findOverlaps(c(3L, 7L, 10L), subject, select="first")
## alternative overlap types
query <- IRanges(c(1, 5, 3, 4), width=c(2, 2, 4, 6))
subject <- IRanges(c(1, 3, 5, 6), width=c(4, 4, 5, 4))
findOverlaps(query, subject, type="start")
findOverlaps(query, subject, type="start", maxgap=1L)
findOverlaps(query, subject, type="end", select="first")
ov <- findOverlaps(query, subject, type="within", maxgap=1L)
ov
## ---------------------------------------------------------------------
## overlapsAny()
## ---------------------------------------------------------------------
overlapsAny(query, subject, type="start")
overlapsAny(query, subject, type="end")
query %over% subject # same as overlapsAny(query, subject)
query %within% subject # same as overlapsAny(query, subject, type="within")
## ---------------------------------------------------------------------
## "ranges" METHOD FOR Hits OBJECTS
## ---------------------------------------------------------------------
## extract the regions of intersection between the overlapping ranges
ranges(ov, query, subject)
## ---------------------------------------------------------------------
## Using RangesList objects
## ---------------------------------------------------------------------
query <- IRanges(c(1, 4, 9), c(5, 7, 10))
qpartition <- factor(c("a","a","b"))
qlist <- split(query, qpartition)
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
spartition <- factor(c("a","a","b"))
slist <- split(subject, spartition)
## at most one hit per query
findOverlaps(qlist, slist, select="first")
findOverlaps(qlist, slist, select="last")
findOverlaps(qlist, slist, select="arbitrary")
query <- IRanges(c(1, 5, 3, 4), width=c(2, 2, 4, 6))
qpartition <- factor(c("a","a","b","b"))
qlist <- split(query, qpartition)
subject <- IRanges(c(1, 3, 5, 6), width=c(4, 4, 5, 4))
spartition <- factor(c("a","a","b","b"))
slist <- split(subject, spartition)
overlapsAny(qlist, slist, type="start")
overlapsAny(qlist, slist, type="end")
qlist
subsetByOverlaps(qlist, slist)
countOverlaps(qlist, slist)