GenomicRanges-comparison: Comparing and ordering genomic ranges

Description

Methods for comparing and ordering the elements in one or more GenomicRanges objects.

Usage

## duplicated()
## ------------
"duplicated"(x, incomparables=FALSE, fromLast=FALSE, method=c("auto", "quick", "hash"))
## match() & selfmatch()
## ---------------------
"match"(x, table, nomatch=NA_integer_, incomparables=NULL, method=c("auto", "quick", "hash"), ignore.strand=FALSE)
"selfmatch"(x, method=c("auto", "quick", "hash"), ignore.strand=FALSE)
## order() and related methods
## ----------------------------
"is.unsorted"(x, na.rm=FALSE, strictly=FALSE, ignore.strand=FALSE)
"order"(..., na.last=TRUE, decreasing=FALSE, method=c("shell", "radix"))
"sort"(x, decreasing=FALSE, ignore.strand=FALSE, by)
"rank"(x, na.last=TRUE, ties.method=c("average", "first", "random", "max", "min"))
## Generalized parallel comparison of 2 GenomicRanges objects
## ----------------------------------------------------------
"pcompare"(x, y)

Arguments

x, table, y

GenomicRanges objects.

incomparables

Not supported.

fromLast, method, nomatch

See ?`Ranges-comparison` in the IRanges package for a description of these arguments.

ignore.strand

Whether or not the strand should be ignored when comparing 2 genomic ranges.

na.rm

Ignored.

strictly

Logical indicating if the check should be for strictly increasing values.

...

One or more GenomicRanges objects. The GenomicRanges objects after the first one are used to break ties.

na.last

Ignored.

decreasing

TRUE or FALSE.

ties.method

A character string specifying how ties are treated. Only "first" is supported for now.

An optional formula that is resolved against as.env(x); the resulting variables are passed to order to generate the ordering permutation.

Details

Two elements of a GenomicRanges object (i.e. two genomic ranges) are considered equal iff they are on the same underlying sequence and strand, and have the same start and width. duplicated() and unique() on a GenomicRanges object are conforming to this.

The "natural order" for the elements of a GenomicRanges object is to order them (a) first by sequence level, (b) then by strand, (c) then by start, (d) and finally by width. This way, the space of genomic ranges is totally ordered. Note that the reduce method for GenomicRanges uses this "natural order" implicitly. Also, note that, because we already do (c) and (d) for regular ranges (see ?`Ranges-comparison`), genomic ranges that belong to the same underlying sequence and strand are ordered like regular ranges.

is.unsorted(), order(), sort(), and rank() on a GenomicRanges object behave accordingly to this "natural order".

==, !=, <=< code="">, >=, < and > on GenomicRanges objects also behave accordingly to this "natural order".

Examples

Run this code

gr0 <- GRanges(
    Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
    IRanges(c(1:9,7L), end=10),
    strand=Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
    seqlengths=c(chr1=11, chr2=12, chr3=13)
)
gr <- c(gr0, gr0[7:3])
names(gr) <- LETTERS[seq_along(gr)]

## ---------------------------------------------------------------------
## A. ELEMENT-WISE (AKA "PARALLEL") COMPARISON OF 2 GenomicRanges OBJECTS
## ---------------------------------------------------------------------
gr[2] == gr[2]  # TRUE
gr[2] == gr[5]  # FALSE
gr == gr[4]
gr >= gr[3]

## ---------------------------------------------------------------------
## B. duplicated(), unique()
## ---------------------------------------------------------------------
duplicated(gr)
unique(gr)

## ---------------------------------------------------------------------
## C. match(), %in%
## ---------------------------------------------------------------------
table <- gr[1:7]
match(gr, table)
match(gr, table, ignore.strand=TRUE)

gr %in% table

## ---------------------------------------------------------------------
## D. findMatches(), countMatches()
## ---------------------------------------------------------------------
findMatches(gr, table)
countMatches(gr, table)

findMatches(gr, table, ignore.strand=TRUE)
countMatches(gr, table, ignore.strand=TRUE)

gr_levels <- unique(gr)
countMatches(gr_levels, gr)

## ---------------------------------------------------------------------
## E. order() AND RELATED METHODS
## ---------------------------------------------------------------------
is.unsorted(gr)
order(gr)
sort(gr)
is.unsorted(sort(gr))

is.unsorted(gr, ignore.strand=TRUE)
gr2 <- sort(gr, ignore.strand=TRUE)
is.unsorted(gr2)  # TRUE
is.unsorted(gr2, ignore.strand=TRUE)  # FALSE

## TODO: Broken. Please fix!
#sort(gr, by = ~ seqnames + start + end) # equivalent to (but slower than) above

score(gr) <- rev(seq_len(length(gr)))

## TODO: Broken. Please fix!
#sort(gr, by = ~ score)

rank(gr)

## ---------------------------------------------------------------------
## F. GENERALIZED ELEMENT-WISE COMPARISON OF 2 GenomicRanges OBJECTS
## ---------------------------------------------------------------------
gr3 <- GRanges(c(rep("chr1", 12), "chr2"), IRanges(c(1:11, 6:7), width=3))
strand(gr3)[12] <- "+"
gr4 <- GRanges("chr1", IRanges(5, 9))

pcompare(gr3, gr4)
rangeComparisonCodeToLetter(pcompare(gr3, gr4))

Run the code above in your browser using DataLab

Description

Usage

Arguments

Details

See Also

Examples