Finding overlapping genomic ranges

Finds interval overlaps between a GRanges, GIntervalTree, or GRangesList object, and another object containing ranges.

NOTE: The findOverlaps generic function and methods for Ranges and RangesList objects are defined and documented in the IRanges package. The methods for GAlignments, GAlignmentPairs, and GAlignmentsList objects are defined and documented in the GenomicAlignments package.

Keywords

methods, utilities

Usage

findOverlaps(query, subject, maxgap = 0L, minoverlap = 1L, type = c("any", "start", "end", "within", "equal"), select = c("all", "first", "last", "arbitrary"), ignore.strand = FALSE)
countOverlaps(query, subject, maxgap = 0L, minoverlap = 1L, type = c("any", "start", "end", "within", "equal"), ignore.strand = FALSE)
overlapsAny(query, subject, maxgap = 0L, minoverlap = 1L, type = c("any", "start", "end", "within", "equal"), ignore.strand = FALSE)
subsetByOverlaps(query, subject, maxgap = 0L, minoverlap = 1L, type = c("any", "start", "end", "within", "equal"), ignore.strand = FALSE)

Arguments

query, subject
A GRanges or GRangesList object. RangesList and RangedData are also accepted for one of query or subject. When query is a GRanges or GRangesList then subject can be a GIntervalTree object.
maxgap, minoverlap, type, select
See findOverlaps in the IRanges package for a description of these arguments.
ignore.strand
When set to TRUE, the strand information is ignored in the overlap calculations.
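
The type argument is documented in IRanges, but a small sketch may help show how it changes the result. The objects query and subject below are made up for illustration, and the calls assume the default maxgap and minoverlap values.

```r
library(GenomicRanges)

# Two illustrative query ranges and one subject range on the same sequence.
query   <- GRanges("chr1", IRanges(start = c(5, 1), end = c(8, 20)))
subject <- GRanges("chr1", IRanges(start = 1, end = 10))

# "any": both query ranges share at least one position with the subject.
n_any    <- countOverlaps(query, subject, type = "any")

# "within": only the first query range lies entirely inside the subject.
n_within <- countOverlaps(query, subject, type = "within")

# "start": only the second query range starts where the subject starts.
n_start  <- countOverlaps(query, subject, type = "start")

# "equal": neither query range has the same start and end as the subject.
n_equal  <- countOverlaps(query, subject, type = "equal")
```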

Details

When the query and the subject are GRanges or GRangesList objects, findOverlaps uses the triplet (sequence name, range, strand) to determine which features (see paragraph below for the definition of feature) from the query overlap which features in the subject, where a strand value of "*" is treated as occurring on both the "+" and "-" strands. An overlap is recorded when a feature in the query and a feature in the subject have the same sequence name, have a compatible pairing of strands (e.g. "+"/"+", "-"/"-", "*"/"+", "*"/"-", etc.), and satisfy the interval overlap requirements. Strand is taken as "*" for RangedData and RangesList.
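
The strand rule can be seen in a minimal sketch. The objects query and subject are hypothetical and are built only to isolate the effect of strand and sequence name.

```r
library(GenomicRanges)

# Three query ranges at the same chr1 positions, differing only in strand.
query <- GRanges(seqnames = "chr1",
                 ranges = IRanges(start = c(10, 10, 10), end = c(20, 20, 20)),
                 strand = c("+", "-", "*"))

# One subject range on the "+" strand of chr1, and one on chr2.
subject <- GRanges(seqnames = c("chr1", "chr2"),
                   ranges = IRanges(start = c(15, 15), end = c(25, 25)),
                   strand = c("+", "+"))

# Strand-aware (the default): the "+" and "*" queries hit subject 1,
# the "-" query does not. Nothing hits subject 2, which is on chr2.
strand_aware <- findOverlaps(query, subject)
queryHits(strand_aware)
subjectHits(strand_aware)

# Ignoring strand: all three queries hit subject 1.
strand_blind <- findOverlaps(query, subject, ignore.strand = TRUE)
queryHits(strand_blind)
```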

In the context of findOverlaps, a feature is a collection of ranges that are treated as a single entity. For GRanges objects, a feature is a single range, while for GRangesList objects, a feature is a list element containing a set of ranges. In the results, the features are referred to by number: query features are numbered from 1 to length(query) and subject features from 1 to length(subject).
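
As a small illustration of feature numbering, the sketch below overlaps a GRangesList with a GRanges. The names exons, reads, tx1 and tx2 are invented for this example.

```r
library(GenomicRanges)

# A GRangesList with two features; the first feature consists of two ranges.
exons <- GRangesList(
  tx1 = GRanges("chr1", IRanges(start = c(1, 50), end = c(10, 60)), strand = "+"),
  tx2 = GRanges("chr1", IRanges(start = 200, end = 250), strand = "+"))

# A GRanges with three features, one per range.
reads <- GRanges("chr1",
                 IRanges(start = c(5, 55, 300), end = c(8, 58, 310)),
                 strand = "+")

hits <- findOverlaps(exons, reads)

# Query indices refer to list elements of exons, not to individual ranges:
# feature 1 (tx1) overlaps reads 1 and 2, feature 2 (tx2) overlaps nothing.
queryHits(hits)
subjectHits(hits)

# One count per list element of exons.
per_feature <- countOverlaps(exons, reads)
```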

When the query is a GRanges or GRangesList object then the subject can be a GIntervalTree object. For repeated queries against the same subject, it is more efficient to create a GIntervalTree once for the subject using the GIntervalTree constructor (used in the examples below) and then perform the queries against the GIntervalTree instance. Note that GIntervalTree objects are not supported for circular genomes.


Value

For findOverlaps, either a Hits object when select = "all" or an integer vector otherwise.

For countOverlaps, an integer vector containing the tabulated query overlap hits.

For overlapsAny, a logical vector of length equal to the number of ranges in query indicating those that overlap any of the ranges in subject.

For subsetByOverlaps, an object of the same class as query containing the subset that overlapped at least one entity in subject.

For RangedData and RangesList, with the exception of subsetByOverlaps, the results align to the unlisted form of the object. This turns out to be fairly convenient for RangedData (not so much for RangesList, but something has to give).
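
The return values described above can be compared side by side in a short sketch. The query and subject objects here are hypothetical, unstranded ranges chosen so that the second query range has no overlap.

```r
library(GenomicRanges)

query   <- GRanges("chr1", IRanges(start = c(1, 20, 40), end = c(10, 30, 50)))
subject <- GRanges("chr1", IRanges(start = c(5, 8, 45), end = c(9, 12, 60)))

# select = "all" (the default): a Hits object with one row per overlapping pair.
all_hits <- findOverlaps(query, subject)

# Any other select value: an integer vector parallel to query,
# with NA where a query range has no overlap.
first_hit <- findOverlaps(query, subject, select = "first")

# Number of subject ranges overlapped by each query range.
n_hits <- countOverlaps(query, subject)

# TRUE for each query range that overlaps at least one subject range.
has_hit <- overlapsAny(query, subject)

# The overlapping query ranges themselves, as a GRanges.
kept <- subsetByOverlaps(query, subject)
```

Because the select = "first" form marks misses with NA, wrapping it in table(!is.na(...)) gives a quick tally of query ranges with and without an overlap.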

See Also

  • findOverlaps-methods
  • findOverlaps
  • findOverlaps,GenomicRanges,GenomicRanges-method
  • findOverlaps,GenomicRanges,GIntervalTree-method
  • findOverlaps,GRangesList,GenomicRanges-method
  • findOverlaps,GenomicRanges,GRangesList-method
  • findOverlaps,GRangesList,GRangesList-method
  • findOverlaps,RangesList,GenomicRanges-method
  • findOverlaps,RangesList,GRangesList-method
  • findOverlaps,GenomicRanges,RangesList-method
  • findOverlaps,GRangesList,RangesList-method
  • findOverlaps,RangedData,GenomicRanges-method
  • findOverlaps,RangedData,GRangesList-method
  • findOverlaps,GenomicRanges,RangedData-method
  • findOverlaps,GRangesList,RangedData-method
  • countOverlaps
  • countOverlaps,GenomicRanges,Vector-method
  • countOverlaps,Vector,GenomicRanges-method
  • countOverlaps,GenomicRanges,GenomicRanges-method
  • countOverlaps,GRangesList,Vector-method
  • countOverlaps,Vector,GRangesList-method
  • countOverlaps,GRangesList,GRangesList-method
  • countOverlaps,GRanges,GRangesList-method
  • countOverlaps,GRangesList,GRanges-method
  • overlapsAny
  • overlapsAny,GenomicRanges,GenomicRanges-method
  • overlapsAny,GRangesList,GenomicRanges-method
  • overlapsAny,GenomicRanges,GRangesList-method
  • overlapsAny,GRangesList,GRangesList-method
  • overlapsAny,RangesList,GenomicRanges-method
  • overlapsAny,RangesList,GRangesList-method
  • overlapsAny,GenomicRanges,RangesList-method
  • overlapsAny,GRangesList,RangesList-method
  • overlapsAny,RangedData,GenomicRanges-method
  • overlapsAny,RangedData,GRangesList-method
  • overlapsAny,GenomicRanges,RangedData-method
  • overlapsAny,GRangesList,RangedData-method
  • subsetByOverlaps
  • subsetByOverlaps,GenomicRanges,GenomicRanges-method
  • subsetByOverlaps,GRangesList,GenomicRanges-method
  • subsetByOverlaps,GenomicRanges,GRangesList-method
  • subsetByOverlaps,GRangesList,GRangesList-method
  • subsetByOverlaps,RangesList,GenomicRanges-method
  • subsetByOverlaps,RangesList,GRangesList-method
  • subsetByOverlaps,GenomicRanges,RangesList-method
  • subsetByOverlaps,GRangesList,RangesList-method
  • subsetByOverlaps,RangedData,GenomicRanges-method
  • subsetByOverlaps,RangedData,GRangesList-method
  • subsetByOverlaps,GenomicRanges,RangedData-method
  • subsetByOverlaps,GRangesList,RangedData-method
Examples

library(GenomicRanges)

## GRanges object:
gr <-
  GRanges(seqnames =
          Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
          ranges =
          IRanges(1:10, width = 10:1, names = head(letters,10)),
          strand =
          Rle(strand(c("-", "+", "*", "+", "-")),
              c(1, 2, 2, 3, 2)),
          score = 1:10,
          GC = seq(1, 0, length=10))

## GRangesList object:
gr1 <-
  GRanges(seqnames = "chr2", ranges = IRanges(4:3, 6),
          strand = "+", score = 5:4, GC = 0.45)
gr2 <-
  GRanges(seqnames = c("chr1", "chr1"),
          ranges = IRanges(c(7,13), width = 3),
          strand = c("+", "-"), score = 3:4, GC = c(0.3, 0.5))
gr3 <-
  GRanges(seqnames = c("chr1", "chr2"),
          ranges = IRanges(c(1, 4), c(3, 9)),
          strand = c("-", "-"), score = c(6L, 2L), GC = c(0.4, 0.1))
grl <- GRangesList("gr1" = gr1, "gr2" = gr2, "gr3" = gr3)

## Overlapping two GRanges objects:
table(!is.na(findOverlaps(gr, gr1, select="arbitrary")))
countOverlaps(gr, gr1)
findOverlaps(gr, gr1)
subsetByOverlaps(gr, gr1)

countOverlaps(gr, gr1, type = "start")
findOverlaps(gr, gr1, type = "start")
subsetByOverlaps(gr, gr1, type = "start")

findOverlaps(gr, gr1, select = "first")
findOverlaps(gr, gr1, select = "last")

findOverlaps(gr1, gr)
findOverlaps(gr1, gr, type = "start")
findOverlaps(gr1, gr, type = "within")
findOverlaps(gr1, gr, type = "equal")

## using a GIntervalTree
gtree <- GIntervalTree(gr1)
table(!is.na(findOverlaps(gr, gtree, select="arbitrary")))
countOverlaps(gr, gtree)
findOverlaps(gr, gtree)
subsetByOverlaps(gr, gtree)

## Overlapping a GRanges and a GRangesList object:
table(!is.na(findOverlaps(grl, gr, select="first")))
countOverlaps(grl, gr)
findOverlaps(grl, gr)
subsetByOverlaps(grl, gr)
countOverlaps(grl, gr, type = "start")
findOverlaps(grl, gr, type = "start")
subsetByOverlaps(grl, gr, type = "start")
findOverlaps(grl, gr, select = "first")

## using a GIntervalTree
table(!is.na(findOverlaps(grl, gtree, select="first")))
countOverlaps(grl, gtree)
findOverlaps(grl, gtree)
subsetByOverlaps(grl, gtree)
countOverlaps(grl, gtree, type = "start")
findOverlaps(grl, gtree, type = "start")
subsetByOverlaps(grl, gtree, type = "start")
findOverlaps(grl, gtree, select = "first")

## Overlapping two GRangesList objects:
countOverlaps(grl, rev(grl))
findOverlaps(grl, rev(grl))
subsetByOverlaps(grl, rev(grl))
Documentation reproduced from package GenomicRanges, version 1.18.4, License: Artistic-2.0
