IRanges (version 2.0.1)

findOverlaps-methods: Finding overlapping ranges

Description

Various methods for finding/counting interval overlaps between two "range-based" objects: a query and a subject.

NOTE: This man page describes the methods that operate on a query and a subject that are each a Ranges, Views, RangesList, ViewsList, or RangedData object. (In addition, if the query is a Ranges object, the subject can be an IntervalTree object; if the query is a RangesList object, the subject can be an IntervalForest object. And if the subject is a Ranges object, the query can be an integer vector.)

See ?`findOverlaps,GenomicRanges,GenomicRanges-method` in the GenomicRanges package for methods that operate on GRanges or GRangesList objects. See also the ?`GIntervalTree` class and the ?`findOverlaps,GenomicRanges,GIntervalTree-method` method for finding overlaps with persistent IntervalForest objects.

Usage

findOverlaps(query, subject, maxgap=0L, minoverlap=1L, type=c("any", "start", "end", "within", "equal"), select=c("all", "first", "last", "arbitrary"), ...)
countOverlaps(query, subject, maxgap=0L, minoverlap=1L, type=c("any", "start", "end", "within", "equal"), ...)
overlapsAny(query, subject, maxgap=0L, minoverlap=1L, type=c("any", "start", "end", "within", "equal"), ...)
query %over% subject
query %within% subject
query %outside% subject
subsetByOverlaps(query, subject, maxgap=0L, minoverlap=1L, type=c("any", "start", "end", "within", "equal"), ...)
mergeByOverlaps(query, subject, ...)
## S4 method for signature 'Hits'
ranges(x, query, subject)

Arguments

query, subject
Each of them can be a Ranges, Views, RangesList, ViewsList, or RangedData object. In addition, if query is a Ranges object, subject can be an IntervalTree object; if query is a RangesList object, then subject can be an IntervalForest object. And if subject is a Ranges object, query can be an integer vector to be converted to length-one ranges. If query is a RangesList or RangedData, subject must be a RangesList or RangedData.

If both lists have names, each element from the subject is paired with the element from the query with the matching name, if any. Otherwise, elements are paired by position. The overlap is then computed between the pairs as described below.

If query is unsorted, it is sorted first, so it is usually better to sort it up front rather than pay for a sort on every findOverlaps call.

If subject is omitted, query is queried against itself. In this case, and only this case, the ignoreSelf and ignoreRedundant arguments are allowed. By default, the result will contain hits for each range against itself, and if there is a hit from A to B, there is also a hit for B to A. If ignoreSelf is TRUE, all self matches are dropped. If ignoreRedundant is TRUE, only one of A->B and B->A is returned.
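To make the self-overlap options concrete, here is a naive base-R sketch that enumerates every pair of ranges; it is not how IRanges computes overlaps. The function self_overlaps is hypothetical, ranges are given as inclusive start/end vectors, and keeping the pair with query <= subject under ignoreRedundant is an assumption: the text above only says that one of A->B and B->A is returned.

```r
## Naive all-pairs self-overlap (hypothetical helper, not the IRanges code).
## Ranges are inclusive integer intervals given as start/end vectors.
self_overlaps <- function(start, end, ignoreSelf = FALSE, ignoreRedundant = FALSE) {
  n <- length(start)
  ## every (query, subject) index pair
  pairs <- expand.grid(query = seq_len(n), subject = seq_len(n))
  q <- pairs$query
  s <- pairs$subject
  ## default "any" overlap: the two ranges share at least one position
  hit <- start[q] <= end[s] & start[s] <= end[q]
  ## drop A->A hits
  if (ignoreSelf) hit <- hit & q != s
  ## ASSUMPTION: keep the representative with query <= subject
  if (ignoreRedundant) hit <- hit & q <= s
  res <- pairs[hit, , drop = FALSE]
  res <- res[order(res$query, res$subject), , drop = FALSE]
  rownames(res) <- NULL
  res
}

## ranges [1,5], [4,7], [9,10]
self_overlaps(c(1, 4, 9), c(5, 7, 10))
self_overlaps(c(1, 4, 9), c(5, 7, 10), ignoreSelf = TRUE, ignoreRedundant = TRUE)
```

With both options set, only the single hit between [1,5] and [4,7] remains.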

maxgap, minoverlap
Intervals with a separation of maxgap or less and a minimum of minoverlap overlapping positions, allowing for maxgap, are considered to be overlapping. maxgap should be a non-negative integer scalar; minoverlap should be a positive integer scalar.
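The following base-R sketch illustrates how maxgap widens the default "any" overlap. It assumes, consistent with the FIXME note in the Examples, that in this version a pair is a hit when neither range starts more than maxgap positions after the other ends, so maxgap = 0L gives plain overlap and adjacent ranges need maxgap = 1L. minoverlap is not modeled (it is fixed at 1), and any_overlaps is a hypothetical name.

```r
## Naive "any"-type overlap with maxgap (hypothetical helper).
## ASSUMPTION: hit when sstart - qend <= maxgap and qstart - send <= maxgap;
## minoverlap is not modeled.
any_overlaps <- function(qstart, qend, sstart, send, maxgap = 0L) {
  pairs <- expand.grid(query = seq_along(qstart), subject = seq_along(sstart))
  q <- pairs$query
  s <- pairs$subject
  ## the gap on each side must not exceed maxgap
  hit <- (sstart[s] - qend[q] <= maxgap) & (qstart[q] - send[s] <= maxgap)
  res <- pairs[hit, , drop = FALSE]
  res <- res[order(res$query, res$subject), , drop = FALSE]
  rownames(res) <- NULL
  res
}

## same ranges as the first query/subject pair in the Examples
qs <- c(1, 4, 9); qe <- c(5, 7, 10)
ss <- c(2, 2, 10); se <- c(2, 3, 12)
any_overlaps(qs, qe, ss, se)               # query 2 ([4,7]) has no hit
any_overlaps(qs, qe, ss, se, maxgap = 1L)  # [4,7] now hits the adjacent [2,3]
```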
type
By default, any overlap is accepted. By specifying the type parameter, one can select for specific types of overlap. The types correspond to operations in Allen's Interval Algebra (see references). If type is start or end, the intervals are required to have matching starts or ends, respectively. While this operation seems trivial, the naive implementation using outer would be much less efficient. Specifying equal as the type returns the intersection of the start and end matches. If type is within, the query interval must be wholly contained within the subject interval. Note that all matches must additionally satisfy the minoverlap constraint described above.

The maxgap parameter has special meaning with the special overlap types. For start, end, and equal, it specifies the maximum difference in the starts, ends or both, respectively. For within, it is the maximum amount by which the query may be wider than the subject.
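The special types can be read as extra filters on overlapping pairs. The base-R sketch below assumes candidate pairs are those that overlap under the default rule (maxgap = 0) and then applies the start, end, and equal conditions, with maxgap bounding the start and/or end difference as described above. For "within" it checks plain containment only; the maxgap widening rule for "within" is not modeled. typed_overlaps is a hypothetical name, and this is not the IRanges implementation.

```r
## Naive filters for the special overlap types (hypothetical helper).
## ASSUMPTION: candidates are pairs that overlap under the default rule.
typed_overlaps <- function(qstart, qend, sstart, send,
                           type = c("start", "end", "equal", "within"),
                           maxgap = 0L) {
  type <- match.arg(type)
  pairs <- expand.grid(query = seq_along(qstart), subject = seq_along(sstart))
  q <- pairs$query
  s <- pairs$subject
  overlap <- qstart[q] <= send[s] & sstart[s] <= qend[q]
  start_ok <- abs(qstart[q] - sstart[s]) <= maxgap
  end_ok <- abs(qend[q] - send[s]) <= maxgap
  keep <- switch(type,
    start  = start_ok,
    end    = end_ok,
    equal  = start_ok & end_ok,
    ## plain containment; maxgap is ignored for "within" in this sketch
    within = qstart[q] >= sstart[s] & qend[q] <= send[s])
  res <- pairs[overlap & keep, , drop = FALSE]
  res <- res[order(res$query, res$subject), , drop = FALSE]
  rownames(res) <- NULL
  res
}

## the "alternative overlap types" ranges from the Examples, as start/end
qs <- c(1, 5, 3, 4); qe <- qs + c(2, 2, 4, 6) - 1
ss <- c(1, 3, 5, 6); se <- ss + c(4, 4, 5, 4) - 1
typed_overlaps(qs, qe, ss, se, "start")
typed_overlaps(qs, qe, ss, se, "start", maxgap = 1L)
typed_overlaps(qs, qe, ss, se, "within")
```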

select
When select is "all" (the default), the results are returned as a Hits object. When select is "first", "last", or "arbitrary", the results are returned as an integer vector of length length(query) containing the index of the first, last, or an arbitrary overlapping interval in subject, with NA for query intervals that do not overlap any interval in subject. For list-like query and subject (e.g. RangesList), the return value for select != "all" depends on the drop argument: when !drop, an IntegerList is returned in which each element corresponds to a space in query; when drop is TRUE, an integer vector is returned containing indices that are offset to align with the unlisted query.
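For a single query/subject pair, select = "first" and select = "last" can be pictured as collapsing the hit table to one subject index per query. This base-R sketch (hypothetical select_hits) assumes "first" and "last" mean the lowest and highest overlapping subject index; "arbitrary" may return any one of them and is not modeled.

```r
## Collapse (query_hits, subject_hits) to one subject index per query range
## (hypothetical helper; "first" = lowest, "last" = highest subject index).
select_hits <- function(query_hits, subject_hits, n_query,
                        select = c("first", "last")) {
  select <- match.arg(select)
  pick <- if (select == "first") min else max
  out <- rep(NA_integer_, n_query)  # NA = query range with no hit
  for (i in unique(query_hits)) {
    out[i] <- pick(subject_hits[query_hits == i])
  }
  out
}

## default hits of the first query/subject pair in the Examples
qh <- c(1L, 1L, 3L)
sh <- c(1L, 2L, 3L)
select_hits(qh, sh, n_query = 3L, select = "first")  # 1 NA 3
select_hits(qh, sh, n_query = 3L, select = "last")   # 2 NA 3
```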
...
Further arguments to be passed to or from other methods:
  • drop: All methods accept the drop argument (FALSE by default). See select argument above for the details.
  • ignoreSelf, ignoreRedundant: When subject is omitted, the ignoreSelf and ignoreRedundant arguments (both FALSE by default) are allowed. See query and subject arguments above for the details.

x
Hits object returned by findOverlaps.

Value

findOverlaps returns either a Hits object when select="all" (the default), or an integer vector when select is not "all". For RangesList objects it returns a HitsList-class object when select="all", or an IntegerList when select is not "all". When subject is an IntervalForest object, it returns a CompressedHitsList or CompressedIntegerList, respectively.

countOverlaps returns the overlap hit count for each range in query using the specified findOverlaps parameters. For RangesList objects, it returns an IntegerList object. When subject is an IntervalForest, it returns a CompressedIntegerList.

overlapsAny finds the ranges in query that overlap any of the ranges in subject. For Ranges or Views objects, it returns a logical vector of length equal to the number of ranges in query. For RangesList, RangedData, or ViewsList objects, it returns a LogicalList object, where each element of the result corresponds to a space in query. When subject is an IntervalForest object, it returns a CompressedLogicalList object.

%over% and %within% are convenience wrappers for the two most common use cases. They are currently defined as `%over%` <- function(query, subject) overlapsAny(query, subject) and `%within%` <- function(query, subject) overlapsAny(query, subject, type="within"). %outside% is simply the inverse of %over%.

subsetByOverlaps returns the subset of query that has an overlap hit with a range in subject using the specified findOverlaps parameters.

mergeByOverlaps computes the overlap between query and subject according to the arguments passed via the ... argument. It then extracts the corresponding hits from each object and returns a DataFrame containing one column for the query and one for the subject, as well as any mcols that were present on either object. The query and subject columns are named by quoting and deparsing the corresponding argument.

ranges(x, query, subject) returns a Ranges object of the same length as the Hits object x, holding the regions of intersection between the overlapping ranges in query and subject, which should be the same query and subject used in the call to findOverlaps that generated x.
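Several of these return values are simple summaries of one hit table. The base-R sketch below derives a per-query count (like countOverlaps), a logical "has any hit" vector (like overlapsAny and %over%), its negation (like %outside%), the indices subsetByOverlaps would keep, and the per-hit intersection regions (like ranges(x, query, subject)). All helper names are hypothetical, and the intersection assumes each hit truly overlaps.

```r
## Summaries derived from a hit table (hypothetical helpers, base R only).
count_overlaps <- function(query_hits, n_query) tabulate(query_hits, nbins = n_query)
overlaps_any   <- function(query_hits, n_query) count_overlaps(query_hits, n_query) > 0L

## region of intersection for each hit: the later start and the earlier end
hit_intersections <- function(query_hits, subject_hits, qstart, qend, sstart, send) {
  data.frame(start = pmax(qstart[query_hits], sstart[subject_hits]),
             end   = pmin(qend[query_hits], send[subject_hits]))
}

## the first query/subject pair in the Examples and its default hits
qs <- c(1, 4, 9); qe <- c(5, 7, 10)
ss <- c(2, 2, 10); se <- c(2, 3, 12)
qh <- c(1L, 1L, 3L); sh <- c(1L, 2L, 3L)

count_overlaps(qh, 3L)        # 2 0 1
over <- overlaps_any(qh, 3L)  # TRUE FALSE TRUE, like query %over% subject
!over                         # like query %outside% subject
which(over)                   # query ranges that subsetByOverlaps would keep
hit_intersections(qh, sh, qs, qe, ss, se)  # [2,2], [2,3], [10,10]
```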

Details

A common type of query that arises when working with intervals is finding which intervals in one set overlap those in another.

The simplest approach is to call the findOverlaps function on a Ranges or other object with range information (aka "range-based object").

An IntervalTree object is a derivative of Ranges and stores its ranges as a tree that is optimized for overlap queries. Thus, for repeated queries against the same subject, it is more efficient to create an IntervalTree once for the subject using the IntervalTree constructor and then perform the queries against the IntervalTree instance. An IntervalForest object is a derivative of RangesList and stores its ranges as a set of trees optimized for partitioned overlap queries. Again, for repeated queries against the same subject list, it is more efficient to create an IntervalForest once and then perform the queries against the IntervalForest instance.
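The build-once, query-many pattern can be illustrated in base R with a much simpler index than an interval tree: the subject ranges sorted by start, searched with findInterval. This is a swapped-in technique, not an interval tree, and its per-query cost can still grow with the number of candidate subjects; build_index and query_index are hypothetical names.

```r
## Build a reusable index over the subject once: ranges sorted by start.
## (A simplified stand-in for a persistent index; NOT an interval tree.)
build_index <- function(sstart, send) {
  o <- order(sstart)
  list(order = o, start = sstart[o], end = send[o])
}

## Subjects with start <= end(q) form a prefix of the sorted order, located
## by binary search (findInterval); among them keep those with end >= start(q).
query_index <- function(index, qstart, qend) {
  hits <- lapply(seq_along(qstart), function(i) {
    k <- findInterval(qend[i], index$start)  # number of subject starts <= qend[i]
    cand <- seq_len(k)
    sort(index$order[cand[index$end[cand] >= qstart[i]]])
  })
  data.frame(query = rep(seq_along(qstart), lengths(hits)),
             subject = as.integer(unlist(hits, use.names = FALSE)))
}

idx <- build_index(c(2, 2, 10), c(2, 3, 12))  # built once
query_index(idx, c(1, 4, 9), c(5, 7, 10))     # reused for several queries
query_index(idx, 11, 20)
```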

References

Allen's Interval Algebra: James F. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11):832-843, November 1983. ACM Press. ISSN 0001-0782.

See Also

Examples

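library(IRanges)

## three query ranges ([1,5], [4,7], [9,10]) and three subject ranges
query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
## build the interval tree once and reuse it for the queries below
tree <- IntervalTree(subject)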

## ---------------------------------------------------------------------
## findOverlaps()
## ---------------------------------------------------------------------

## at most one hit per query
findOverlaps(query, tree, select = "first")
findOverlaps(query, tree, select = "last")
findOverlaps(query, tree, select = "arbitrary")

## overlap even if adjacent only
## (FIXME: the gap between 2 adjacent ranges should be still considered
## 0. So either we have an argument naming problem, or we should modify
## the handling of the 'maxgap' argument so that the user would need to
## specify maxgap = 0L to obtain the result below.)
findOverlaps(query, tree, maxgap = 1L)

## shortcut
findOverlaps(query, subject)

query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2), c(5, 4))
tree <- IntervalTree(subject)

## one Ranges with itself
findOverlaps(query)

## single points as query
subject <- IRanges(c(1, 6, 13), c(4, 9, 14))
findOverlaps(c(3L, 7L, 10L), subject, select = "first")

## alternative overlap types
query <- IRanges(c(1, 5, 3, 4), width=c(2, 2, 4, 6))
subject <- IRanges(c(1, 3, 5, 6), width=c(4, 4, 5, 4))

findOverlaps(query, subject, type = "start")
findOverlaps(query, subject, type = "start", maxgap = 1L)
findOverlaps(query, subject, type = "end", select = "first")
ov <- findOverlaps(query, subject, type = "within", maxgap = 1L)
ov

## ---------------------------------------------------------------------
## overlapsAny()
## ---------------------------------------------------------------------

overlapsAny(query, subject, type="start")
overlapsAny(query, subject, type="end")
query %over% subject    # same as overlapsAny(query, subject)
query %within% subject  # same as overlapsAny(query, subject,
                        #                     type="within")

## ---------------------------------------------------------------------
## "ranges" METHOD FOR Hits OBJECTS
## ---------------------------------------------------------------------

## extract the regions of intersection between the overlapping ranges
ranges(ov, query, subject)

## ---------------------------------------------------------------------
## using IntervalForest objects
## ---------------------------------------------------------------------
query <- IRanges(c(1, 4, 9), c(5, 7, 10))
qpartition <- factor(c("a","a","b"))
qlist <- split(query, qpartition)

subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
spartition <- factor(c("a","a","b"))
slist <- split(subject, spartition)

forest <- IntervalForest(slist)

## at most one hit per query
findOverlaps(qlist, forest, select = "first")
findOverlaps(qlist, forest, select = "last")
findOverlaps(qlist, forest, select = "arbitrary")

query <- IRanges(c(1, 5, 3, 4), width=c(2, 2, 4, 6))
qpartition <- factor(c("a","a","b","b"))
qlist <- split(query, qpartition)

subject <- IRanges(c(1, 3, 5, 6), width=c(4, 4, 5, 4))
spartition <- factor(c("a","a","b","b"))
slist <- split(subject, spartition)
forest <- IntervalForest(slist)

overlapsAny(qlist, forest, type="start")
overlapsAny(qlist, forest, type="end")
qlist 

subsetByOverlaps(qlist, forest)
countOverlaps(qlist, forest)
