GenomicRanges (version 1.18.4)

GIntervalTree-class: GIntervalTree objects

Description

The GIntervalTree class is a container for genomic locations and their associated annotations, stored in a form optimized for overlap queries.

Arguments

Constructor

GIntervalTree(x): Creates a GIntervalTree object from a GRanges object.
x
a GRanges object containing the genomic ranges.

Coercion

as(from, "GIntervalTree"): Creates a GIntervalTree object from a GRanges object.
as(from, "GRanges"): Creates a GRanges object from a GIntervalTree object.
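The two coercions above can be sketched as a round trip. This is a minimal illustration, assuming a GenomicRanges release that provides the GIntervalTree class (such as the 1.18.x series documented here); the toy ranges and the variable names gr, tree and back are made up for the example.

```r
library(GenomicRanges)  # assumption: a release that provides GIntervalTree, e.g. 1.18.x

# Toy input: three ranges on two sequences (illustrative values only).
gr <- GRanges(seqnames = c("chr1", "chr2", "chr2"),
              ranges = IRanges(c(1, 5, 20), width = 10),
              strand = c("+", "-", "*"),
              score = 1:3)

tree <- as(gr, "GIntervalTree")   # GRanges -> GIntervalTree
back <- as(tree, "GRanges")       # GIntervalTree -> GRanges

class(tree)
class(back)
```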

Accessors

In the following code snippets, x is a GIntervalTree object.
length(x): Get the number of elements.
seqnames(x): Get the sequence names.
ranges(x): Get the ranges as an IRanges object. This is for consistency with the ranges accessor for GRanges objects. To access the underlying IntervalForest object, use the slot form x@ranges.
strand(x): Get the strand.
mcols(x, use.names=FALSE), mcols(x) <- value: Get or set the metadata columns. If use.names=TRUE and the metadata columns are not NULL, then the names of x are propagated as the row names of the returned DataFrame object. When setting the metadata columns, the supplied value must be NULL or a data.frame-like object (i.e. DataTable or data.frame) holding element-wise metadata.
elementMetadata(x), elementMetadata(x) <- value, values(x), values(x) <- value: Alternatives to mcols functions. Their use is discouraged.
seqinfo(x): Get or set the information about the underlying sequences. value must be a Seqinfo object.
seqlevels(x): Get the sequence levels. These are stored in the partition slot of the underlying IntervalForest object.
seqlengths(x): Get the sequence lengths.
isCircular(x): Get the circularity flags. Note that GIntervalTree objects are not supported for circular genomes.
genome(x): Get the genome identifier or assembly name for each sequence.
seqlevelsStyle(x): Get the seqname style for x. See the seqlevelsStyle generic getter in the GenomeInfoDb package for more information.
score(x): Get the “score” column from the element metadata, if any.

Ranges methods

In the following code snippets, x is a GIntervalTree object.
start(x): Get start(ranges(x)).
end(x): Get end(ranges(x)).
width(x): Get width(ranges(x)).

Subsetting

In the code snippets below, x is a GIntervalTree object.
x[i, j]: Get elements i with optional metadata columns mcols(x)[,j], where i can be missing; an NA-free logical, numeric, or character vector; or a 'logical' Rle object.
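As a sketch of the x[i, j] forms described above, the following builds a small tree and subsets it by position, by logical vector, and with a metadata column selection. It assumes a GenomicRanges release that provides GIntervalTree (such as 1.18.x); the data values are illustrative only.

```r
library(GenomicRanges)  # assumption: a release that provides GIntervalTree, e.g. 1.18.x

gr <- GRanges(seqnames = c("chr1", "chr2", "chr2"),
              ranges = IRanges(c(1, 5, 20), width = 10),
              strand = c("+", "-", "*"),
              score = 1:3,
              GC = c(0.5, 0.4, 0.3))
tree <- GIntervalTree(gr)

sub_pos   <- tree[2:3]                     # elements 2 and 3, all metadata columns
sub_lgl   <- tree[c(TRUE, FALSE, TRUE)]    # NA-free logical index
sub_score <- tree[2:3, "score"]            # elements 2 and 3, only the score column

sub_score
```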

Other methods

show(x): By default the show method displays 5 head and 5 tail elements. This can be changed by setting the global options showHeadLines and showTailLines. If the object length is less than (or equal to) the sum of these 2 options plus 1, then the full object is displayed.
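The display options mentioned above are ordinary global options, so they can be set with options(). The snippet below is a small sketch using base R only; the variable names old and full_display_limit are made up for the illustration, and the limit is computed from the rule stated above rather than read from the package.

```r
# Show 3 head and 2 tail elements instead of the default 5 and 5.
old <- options(showHeadLines = 3, showTailLines = 2)

# Per the rule above, an object is displayed in full when its length is
# at most showHeadLines + showTailLines + 1.
full_display_limit <- getOption("showHeadLines") + getOption("showTailLines") + 1
full_display_limit

# Restore the previous settings.
options(old)
```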

Details

A common type of query that arises when working with intervals is finding which intervals in one set overlap those in another. An efficient family of algorithms for answering such queries is known as the Interval Tree. The GIntervalTree class implements persistent Interval Trees for efficient querying of genomic intervals. It uses the IntervalForest class to store a set of trees, one for each seqlevel in a GRanges object.
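To pin down what an overlap query returns, here is a naive baseline in base R. It is not the Interval Tree algorithm and not the package's implementation: it compares every query with every subject, which is the quadratic cost that tree-based methods are designed to avoid. The function name naive_overlaps and the toy intervals are hypothetical, and the sketch ignores sequence names and strand.

```r
# Closed intervals [start, end]: two intervals overlap when each one starts
# no later than the other one ends.
naive_overlaps <- function(q_start, q_end, s_start, s_end) {
  # hits[i, j] is TRUE when query i overlaps subject j.
  hits <- outer(q_start, s_end, "<=") & outer(q_end, s_start, ">=")
  idx <- which(hits, arr.ind = TRUE)
  idx <- idx[order(idx[, 1], idx[, 2]), , drop = FALSE]
  data.frame(queryHits = idx[, 1], subjectHits = idx[, 2])
}

# Subjects: [1,5], [4,8], [10,12]. Queries: [5,6], [9,9].
res <- naive_overlaps(q_start = c(5, 9), q_end = c(6, 9),
                      s_start = c(1, 4, 10), s_end = c(5, 8, 12))
res  # query 1 overlaps subjects 1 and 2; query 2 overlaps none
```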

The simplest approach for finding overlaps is to call the findOverlaps function on a Ranges or other object with range information. See the man page of findOverlaps for how to use this and other related functions. A GIntervalTree object is a derivative of GenomicRanges and stores its genomic ranges as a set of trees (a forest, with one tree per seqlevel) that is optimized for overlap queries. Thus, for repeated queries against the same subject, it is more efficient to create a GIntervalTree once for the subject using the GIntervalTree constructor and then perform the queries against the GIntervalTree instance.
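The build-once, query-many pattern described above might look like the following sketch. It assumes a GenomicRanges release that provides GIntervalTree (such as 1.18.x); the subject and query ranges, and the names stree, queries and counts, are invented for the illustration.

```r
library(GenomicRanges)  # assumption: a release that provides GIntervalTree, e.g. 1.18.x

subject <- GRanges(seqnames = c("chr1", "chr2", "chr2"),
                   ranges = IRanges(c(1, 5, 20), width = 10),
                   strand = "+")

# Build the forest once for the subject.
stree <- GIntervalTree(subject)

# Several independent queries reuse the same tree.
queries <- list(
  GRanges("chr2", IRanges(8, 12), strand = "+"),
  GRanges("chr1", IRanges(50, 60), strand = "+")
)
counts <- lapply(queries, function(q) countOverlaps(q, stree))
counts
```

The cost of building the trees is paid once, so the approach pays off only when the same subject is queried more than a handful of times.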

Like its GenomicRanges parent class, the GIntervalTree class stores a sequence of genomic locations and associated annotations. Each element in the sequence consists of a sequence name, an interval, a strand, and optional metadata columns (e.g. score, GC content, etc.). This information is stored in four components as in GenomicRanges, but two of these components are treated in a specific way:

ranges
an IntervalForest object containing the ranges stored as a set of interval trees.

seqnames
these are not stored directly in this class, but are obtained from the partitioning component of the IntervalForest object stored in ranges.

Note that GIntervalTree objects are not supported for GRanges objects with circular genomes.

See Also

seqinfo, IntervalForest, IntervalTree, findOverlaps-methods

Examples

seqinfo <- Seqinfo(paste0("chr", 1:3), c(1000, 2000, 1500), NA, "mock1")
gr <-
  GRanges(seqnames =
          Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
          ranges = IRanges(
            1:10, width = 10:1, names = head(letters,10)),
          strand = Rle(
            strand(c("-", "+", "*", "+", "-")),
            c(1, 2, 2, 3, 2)),
          score = 1:10,
          GC = seq(1, 0, length=10),
          seqinfo=seqinfo)
tree <- GIntervalTree(gr)
tree

## Summarizing elements
table(seqnames(tree))
sum(width(tree))
summary(mcols(tree)[,"score"])

## find Overlaps
subject <-
    GRanges(seqnames =
              Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
            ranges =
              IRanges(1:10, width = 10:1, names = head(letters,10)),
            strand =
              Rle(strand(c("-", "+", "*", "+", "-")),
                  c(1, 2, 2, 3, 2)),
            score = 1:10,
            GC = seq(1, 0, length=10))
query <-
    GRanges(seqnames = "chr2", ranges = IRanges(4:3, 6),
            strand = "+", score = 5:4, GC = 0.45)

stree <- GIntervalTree(subject)
findOverlaps(query, stree)
countOverlaps(query, stree)
