IntervalTree from the
      ranges in ranges, an object coercible to
      IntervalTree, such as an IRanges object.
    as(from, "IRanges"): Imports the ranges in
      from, an IntervalTree, to an
      IRanges.as(from, "IntervalTree"): Constructs an
      IntervalTree representing from, a Ranges
      object that is coercible to IRanges.
    length(x): Gets the number of ranges stored in the
      tree. This is a fast operation that does not bring the ranges into
      R.start(x): Get the starts of the ranges.end(x): Get the ends of the ranges.O(n*lg(n)), which makes it about as fast as other types of
  overlap query algorithms based on sorting. The good news is that the
  tree need only be built once per subject; this is useful in situations
  of frequent querying. Also, in this implementation the data is stored
  outside of R, avoiding needless copying. Of course, external storage
  is not always convenient, so it is possible to coerce the tree to an
  instance of IRanges (see the Coercion section). For the query operation, the running time is based on the query size
  m and the average number of hits per query k. The output
  size is then max(mk,m), but we abbreviate this as
  mk. Note that when the multiple parameter is set to
  FALSE, k is fixed to 1 and drops out of this
  analysis. We also assume here that the query is sorted by start
  position (the findOverlaps function sorts the query if it is unsorted). An upper bound for finding overlaps is
  O(min(mk*lg(n),n+mk)). The fastest interval tree algorithm
  known is bounded by O(min(m*lg(n),n)+mk) but is a lot more
  complicated and involves two auxillary trees. The lower bound is
  Omega(lg(n)+mk), which is almost the same as for returning
  the answer, Omega(mk). The average is of course somewhere in
  between. This analysis informs the choice of which set of ranges to process
  into a tree, i.e. assigning one to be the subject and the other to be
  the query. Note that if m > n, then the running time is
  O(m), and the total operation of complexity O(n*lg(n) +
  m) is better than if m and n were exchanged. Thus, for
  once-off operations, it is often most efficient to choose the smaller
  set to become the tree (but k also affects this). This is
  reinforced by the realization that if mk is about the same in
  either direction, the running time depends only on n, which
  should be minimized. Even in cases where a tree has already been
  constructed for one of the sets, it can be more efficient to build a
  new tree when the existing tree of size n is much larger than
  the query set of size m, roughly when n > m*lg(n).  The simplest approach for finding overlaps is to call the
  findOverlaps function on a Ranges or other object
  with range information. See the man page of findOverlaps
  for how to use this and other related functions.
  An IntervalTree object is a derivative of Ranges and
  stores its ranges as a tree that is optimized for overlap queries.
  Thus, for repeated queries against the same subject, it is more
  efficient to create an IntervalTree once for the subject using
  the constructor described below and then perform the queries against
  the IntervalTree instance.
findOverlaps for finding/counting interval overlaps between
  two "range-based" objects,
  Ranges, the parent of this class,
  Hits, set of hits between 2 vector-like objects.
  query <- IRanges(c(1, 4, 9), c(5, 7, 10))
  subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
  tree <- IntervalTree(subject)
  findOverlaps(query, tree)
  ## query and subject are easily interchangeable
  query <- IRanges(c(1, 4, 9), c(5, 7, 10))
  subject <- IRanges(c(2, 2), c(5, 4))
  tree <- IntervalTree(subject)
  t(findOverlaps(query, tree))
  # the same as:
  findOverlaps(subject, query)
Run the code above in your browser using DataLab