GenomicTuples (version 1.6.2)

nearest-methods: Finding the nearest genomic tuple/range neighbor

Description

The nearest, precede, follow, distance and distanceToNearest methods for GTuples objects and subclasses. NOTE: These methods treat the tuples as if they were ranges, with ranges given by $[pos_{1}, pos_{m}]$, where $m$ is the size of the tuples. This is done via inheritance, so that a GTuples object is treated as a GRanges and the appropriate method is dispatched upon.
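As a rough illustration of the tuples-as-ranges convention, the following base R sketch collapses a matrix of tuple positions into start and end coordinates. The helper name `tuples_as_ranges` is hypothetical and not part of GenomicTuples; it only mirrors the $[pos_{1}, pos_{m}]$ rule stated above.

```r
# Hypothetical helper (not part of GenomicTuples): treat each row of a
# position matrix as one m-tuple and keep only its outermost positions.
tuples_as_ranges <- function(pos) {
  stopifnot(is.matrix(pos), ncol(pos) >= 1L)
  m <- ncol(pos)
  data.frame(start = pos[, 1L], end = pos[, m], size = m)
}

# Two 3-tuples: (5, 8, 12) and (20, 21, 30); the matrix is filled by column
pos <- matrix(c(5L, 20L, 8L, 21L, 12L, 30L), ncol = 3)
rng <- tuples_as_ranges(pos)
print(rng)
```

Under this convention the internal positions (8 and 21 here) play no part in neighbor finding; only the first and last position of each tuple matter.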

Usage

precede(x, subject, select = c("arbitrary", "all"), ignore.strand = FALSE, ...)
follow(x, subject, select = c("arbitrary", "all"), ignore.strand = FALSE, ...)
nearest(x, subject, select = c("arbitrary", "all"), ignore.strand = FALSE, ...)
distanceToNearest(x, subject, ignore.strand = FALSE, ...)
distance(x, y, ignore.strand = FALSE, ...)

Arguments

x
The query GTuples instance.
subject
The subject GTuples instance within which the nearest neighbors are found. Can be missing, in which case x is also the subject.
y
For the distance method, a GTuples or GRanges instance. Cannot be missing. If x and y are not the same length, the shortest will be recycled to match the length of the longest.
select
Logic for handling ties. By default, all methods select a single tuple/range (arbitrary for nearest, the first by order in subject for precede, and the last for follow).

When select = "all", a Hits object is returned with all matches for x. If an element of x does not have a match in subject, that element is not included in the Hits object.

ignore.strand
A logical indicating if the strand of the input tuples/ranges should be ignored. When TRUE, strand is set to '+'.
...
Additional arguments for methods.

Value

For nearest, precede and follow, an integer vector of indices in subject, or a Hits if select = "all". For distanceToNearest, a Hits object with a column for the query index (from), subject index (to) and the distance between the pair. For distance, an integer vector of distances between the tuples/ranges in x and y.

Details

  • nearest: Performs conventional nearest neighbor finding. Returns an integer vector containing the index of the nearest neighbor tuple/range in subject for each tuple/range in x. If there is no nearest neighbor, NA is returned. For details of the algorithm see the man page in IRanges, ?nearest.
  • precede: For each range in x, precede returns the index of the tuple/range in subject that is directly preceded by the tuple/range in x. Overlapping tuples/ranges are excluded. NA is returned when there are no qualifying tuples/ranges in subject.
  • follow: The opposite of precede, follow returns the index of the tuple/range in subject that is directly followed by the tuple/range in x. Overlapping tuples/ranges are excluded. NA is returned when there are no qualifying tuples/ranges in subject.
  • Orientation and Strand: The relevant orientation for precede and follow is 5' to 3', consistent with the direction of translation. Because positional numbering along a chromosome is from left to right and transcription takes place from 5' to 3', precede and follow can appear to have "opposite" behavior on the + and - strand. Using positions 5 and 6 as an example, 5 precedes 6 on the + strand but follows 6 on the - strand.

    A tuple/range with strand * can be compared to tuples/ranges on either the + or - strand. Below we outline the priority when tuples/ranges on multiple strands are compared. When ignore.strand=TRUE all tuples/ranges are treated as if on the + strand.

    • x on + strand can match to tuples/ranges on both + and * strands. In the case of a tie the first tuple/range by order is chosen.
    • x on - strand can match to tuples/ranges on both - and * strands. In the case of a tie the first tuple/range by order is chosen.
    • x on * strand can match to tuples/ranges on any of +, - or * strands. In the case of a tie the first tuple/range by order is chosen.

  • distanceToNearest: Returns the distance for each tuple/range in x to its nearest neighbor in the subject.
  • distance: Returns the distance from each tuple/range in x to the corresponding tuple/range in y. The behavior of distance changed in Bioconductor 2.12. See the man page ?distance in IRanges for details.
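The strand-dependent orientation of precede can be sketched in base R. This is a simplified illustration, not the GenomicRanges implementation: the helper `precede_sketch` is hypothetical, handles a single query range, and assumes the query and every subject range lie on the same strand.

```r
# Hypothetical, simplified sketch (not the GenomicRanges implementation):
# find the subject range directly preceded by one query range, assuming the
# query and all subject ranges share the same strand ("+" or "-").
precede_sketch <- function(q_start, q_end, s_start, s_end, strand = "+") {
  if (strand == "+") {
    # On "+", the query precedes subjects that start after the query ends.
    ok <- which(s_start > q_end)
    gap <- s_start[ok] - q_end
  } else {
    # On "-", 5' to 3' runs right to left, so the query precedes subjects
    # that end before the query starts.
    ok <- which(s_end < q_start)
    gap <- q_start - s_end[ok]
  }
  # Strict inequalities exclude overlapping ranges; no candidate gives NA.
  if (length(ok) == 0L) return(NA_integer_)
  ok[which.min(gap)]  # which.min() keeps the first index on ties
}

s_start <- c(10L, 15L)
s_end   <- c(11L, 16L)
plus_hit  <- precede_sketch(5L, 6L, s_start, s_end, strand = "+")   # 1
minus_hit <- precede_sketch(5L, 6L, s_start, s_end, strand = "-")   # NA
minus_far <- precede_sketch(20L, 21L, s_start, s_end, strand = "-") # 2
```

The same query at positions 5 to 6 precedes the first subject on the + strand but precedes nothing on the - strand, which is the apparent reversal described above. A sketch of follow would swap the two branches.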

See Also

Examples

  library(GenomicTuples)

  ## -----------------------------------------------------------
  ## precede() and follow()
  ## -----------------------------------------------------------
  query <- GTuples("A", matrix(c(5L, 20L, 6L, 21L), ncol = 2), strand = "+")
  subject <- GTuples("A", matrix(c(rep(c(10L, 15L), 2), rep(c(11L, 16L), 2)), 
                                 ncol = 2),
                          strand = c("+", "+", "-", "-"))
  precede(query, subject)
  follow(query, subject)
 
  strand(query) <- "-"
  precede(query, subject)
  follow(query, subject)
 
  ## ties choose first in order
  query <- GTuples("A", matrix(c(10L, 11L), ncol = 2), c("+", "-", "*"))
  subject <- GTuples("A", matrix(c(rep(c(5L, 15L), each = 3), 
                                   rep(c(6L, 16L), each = 3)), ncol = 2),
                          rep(c("+", "-", "*"), 2))
  precede(query, subject)
  precede(query, rev(subject))
 
  ## ignore.strand = TRUE treats all ranges as '+'
  precede(query[1], subject[4:6], select="all", ignore.strand = FALSE)
  precede(query[1], subject[4:6], select="all", ignore.strand = TRUE)
  
  ## -----------------------------------------------------------
  ## nearest()
  ## -----------------------------------------------------------
  ## When multiple tuples overlap an "arbitrary" tuple is chosen
  query <- GTuples("A", matrix(c(5L, 15L), ncol = 2))
  subject <- GTuples("A", matrix(c(1L, 15L, 5L, 19L), ncol = 2))
  nearest(query, subject)
 
  ## select = "all" returns all hits
  nearest(query, subject, select = "all")
 
  ## Tuples in 'x' will self-select when 'subject' is present
  query <- GTuples("A", matrix(c(1L, 10L, 6L, 15L), ncol = 2))
  nearest(query, query)
 
  ## Tuples in 'x' will not self-select when 'subject' is missing
  nearest(query)
  
  ## -----------------------------------------------------------
  ## distance(), distanceToNearest()
  ## -----------------------------------------------------------
  ## Adjacent, overlap, separated by 1
  query <- GTuples("A", matrix(c(1L, 2L, 10L, 5L, 8L, 11L), ncol = 2))
  subject <- GTuples("A", matrix(c(6L, 5L, 13L, 10L, 10L, 15L), ncol = 2))
  distance(query, subject)

  ## recycling
  distance(query[1], subject)

  query <- GTuples(c("A", "B"), matrix(c(1L, 5L, 2L, 6L), ncol = 2))
  distanceToNearest(query, subject)
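
The three cases named in the distance() example (adjacent, overlapping, separated by 1) can be checked by hand with a small base R sketch. The helper `range_gap` is hypothetical and is not the IRanges implementation; it assumes closed integer ranges on the same sequence and counts the positions strictly between them.

```r
# Hypothetical helper (not the IRanges implementation): number of positions
# strictly between two closed integer ranges; 0 when the ranges overlap or
# are adjacent. Arguments are recycled elementwise by pmax()/pmin().
range_gap <- function(x_start, x_end, y_start, y_end) {
  gap <- pmax(x_start, y_start) - pmin(x_end, y_end) - 1L
  pmax(gap, 0L)
}

# Same coordinates as the distance() example above:
# [1,5] vs [6,10] adjacent, [2,8] vs [5,10] overlapping, [10,11] vs [13,15] one apart
gaps <- range_gap(c(1L, 2L, 10L), c(5L, 8L, 11L),
                  c(6L, 5L, 13L), c(10L, 10L, 15L))
print(gaps)  # 0 0 1
```

Adjacent and overlapping pairs both give 0 under this definition, so a zero distance alone does not say whether two tuples overlap.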
