
IRanges (version 2.4.1)

Grouping-class: Grouping objects

Description

We call grouping an arbitrary mapping from a collection of NO objects to a collection of NG groups, or, more formally, a bipartite graph between integer sets [1, NO] and [1, NG]. Objects mapped to a given group are said to belong to, or to be assigned to, or to be in that group. Additionally, the objects in each group are ordered. So for example the 2 following groupings are considered different:
  Grouping 1: NG = 3, NO = 5
              group   objects
                  1 : 4, 2
                  2 :
                  3 : 4

  Grouping 2: NG = 3, NO = 5
              group   objects
                  1 : 2, 4
                  2 :
                  3 : 4

There is no restriction on the mapping: any object can be mapped to 0, 1, or more groups, and can be mapped twice to the same group. Also, some or all of the groups can be empty.
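
As a rough illustration outside the IRanges API, the two groupings above can be modelled in base R as plain lists of integer vectors, one element per group. The names g1, g2 and n_groups_per_object are made up for this sketch, and nothing here uses or reproduces the Grouping class itself.

```r
## Base-R sketch only: a grouping modelled as a list of integer vectors,
## one element per group (this is NOT the IRanges implementation).
NO <- 5L
g1 <- list(c(4L, 2L), integer(0), 4L)  # Grouping 1
g2 <- list(c(2L, 4L), integer(0), 4L)  # Grouping 2

## Same membership, but a different order within group 1, so they differ:
identical(g1, g2)

## Number of groups each object is mapped to (hypothetical helper value):
n_groups_per_object <- tabulate(unlist(g1), nbins = NO)
n_groups_per_object
```

In this sketch object 4 is mapped to two groups and objects 1, 3 and 5 are mapped to none, which the general definition allows.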

The Grouping class is a virtual class that formalizes the most general kind of grouping. More specific groupings (e.g. many-to-one mappings) are formalized via specific Grouping subclasses.

This man page documents the core Grouping API, and 2 important Grouping subclasses: ManyToOneGrouping and Partitioning (the latter being a particular case of the former).


The core Grouping API

Let's give a formal description of the core Grouping API: Groups G_i are indexed from 1 to NG (1 <= i <= NG). Objects O_j are indexed from 1 to NO (1 <= j <= NO). Given that empty groups are allowed, NG can be greater than NO. If x is a Grouping object:
length(x): Returns the number of groups (NG).
names(x): Returns the names of the groups.
nobj(x): Returns the number of objects (NO).
Going from groups to objects:
x[[i]]: Returns the indices of the objects (the j's) that belong to G_i. This provides the mapping from groups to objects.
grouplength(x, i=NULL): Returns the number of objects in G_i. Works in a vectorized fashion (unlike x[[i]]). grouplength(x) is equivalent to grouplength(x, seq_len(length(x))). If i is not NULL, grouplength(x, i) is equivalent to sapply(i, function(ii) length(x[[ii]])).
Note to developers: Given that length, names and [[ are expected to work on any Grouping object, those objects can be seen as List objects. More precisely, the Grouping class actually extends the IntegerList class. In particular, many other "list" operations such as as.list, elementLengths, and unlist should work out-of-the-box on any Grouping object.
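
The equivalences stated above can be tried on an ordinary named list in base R. This is only a sketch: grouplength_sketch is a hypothetical stand-in written for this illustration, not the IRanges grouplength function, and a plain list is assumed in place of a real Grouping object.

```r
## Base-R sketch of the documented equivalences; 'grouplength_sketch' is a
## hypothetical stand-in, not the IRanges function.
groups <- list(A = c(4L, 2L), B = integer(0), C = 4L)

grouplength_sketch <- function(x, i = NULL) {
  if (is.null(i)) i <- seq_len(length(x))
  ## vapply() is used instead of sapply() so the result is always an
  ## integer vector, even when 'i' has length 0.
  vapply(i, function(ii) length(x[[ii]]), integer(1))
}

length(groups)                    # number of groups (NG)
names(groups)                     # names of the groups
groups[[1]]                       # indices of the objects in G_1
grouplength_sketch(groups)        # one count per group
grouplength_sketch(groups, 3:1)   # vectorized over 'i'
```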

ManyToOneGrouping objects

The ManyToOneGrouping class is a virtual class for representing groupings where every object belongs to one group and only one. The grouping of an empty collection of objects in an arbitrary number of groups is a valid ManyToOneGrouping object. Note that, for a ManyToOneGrouping object, if NG is 0 then NO must also be 0. The ManyToOneGrouping API extends the core Grouping API by adding a couple more operations for going from groups to objects:
members(x, i): Equivalent to x[[i]] if i is a single integer. Otherwise, if i is an integer vector of arbitrary length, it's equivalent to sort(unlist(sapply(i, function(ii) x[[ii]]))).
vmembers(x, L): A version of members that works in a vectorized fashion with respect to the L argument (L must be a list of integer vectors). Returns lapply(L, function(i) members(x, i)).
And also by adding operations for going from objects to groups:
togroup(x, j=NULL): Returns the index i of the group that O_j belongs to. This provides the mapping from objects to groups (many-to-one mapping). Works in a vectorized fashion. togroup(x) is equivalent to togroup(x, seq_len(nobj(x))): both return the entire mapping in an integer vector of length NO. If j is not NULL, togroup(x, j) is equivalent to y <- togroup(x); y[j].
togrouplength(x, j=NULL): Returns the number of objects that belong to the same group as O_j (including O_j itself). Equivalent to grouplength(x, togroup(x, j)).
One important property of any ManyToOneGrouping object x is that unlist(as.list(x)) is always a permutation of seq_len(nobj(x)). This is a direct consequence of the fact that every object in the grouping belongs to one group and only one.
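
The object-to-group direction and the permutation property can be sketched in base R on a plain list. The helpers togroup_sketch, members_sketch and togrouplength_sketch are hypothetical names that mimic the equivalences described above. They are not the IRanges implementations and they assume the input really is a many-to-one grouping.

```r
## Base-R sketch (hypothetical helpers, not the IRanges implementation).
## A many-to-one grouping of NO = 6 objects into NG = 3 groups:
groups <- list(c(3L, 1L), integer(0), c(2L, 6L, 5L, 4L))

togroup_sketch <- function(x, j = NULL) {
  to <- integer(length(unlist(x)))
  ## Each object index receives the index of the group that contains it.
  to[unlist(x)] <- rep(seq_along(x), lengths(x))
  if (is.null(j)) to else to[j]
}

members_sketch <- function(x, i) sort(unlist(x[i]))

togrouplength_sketch <- function(x, j = NULL) {
  lengths(x)[togroup_sketch(x, j)]
}

togroup_sketch(groups)               # entire mapping, length NO
togroup_sketch(groups, c(6L, 1L))    # groups of O_6 and O_1
members_sketch(groups, c(3L, 1L))    # members put together and sorted
togrouplength_sketch(groups)         # size of each object's group

## The permutation property of a many-to-one grouping:
stopifnot(identical(sort(unlist(groups)), seq_len(6L)))
```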

2 ManyToOneGrouping concrete subclasses: H2LGrouping and Dups

[DOCUMENT ME] Constructors:
H2LGrouping(high2low=integer()): [DOCUMENT ME]
Dups(high2low=integer()): [DOCUMENT ME]

Partitioning objects

The Partitioning class is a virtual subclass of ManyToOneGrouping for representing block-groupings, i.e. groupings where each group contains objects that are neighbors in the original collection of objects. More formally, a grouping x is a block-grouping iff togroup(x) is sorted in increasing order (not necessarily strictly increasing).
A Partitioning object can also be seen (and manipulated) as a Ranges object where all the ranges are adjacent, starting at 1 (i.e. it covers the 1:NO interval with no overlap between the ranges).
Note that a Partitioning object is both a particular type of ManyToOneGrouping object and a particular type of Ranges object. Therefore all the methods that are defined for ManyToOneGrouping and Ranges objects can also be used on a Partitioning object. See ?Ranges for a description of the Ranges API.
The Partitioning virtual class has 3 concrete subclasses: PartitioningByEnd (only stores the end of the groups, allowing fast mapping from groups to objects), PartitioningByWidth (only stores the width of the groups), and PartitioningMap (contains PartitioningByEnd plus two additional slots to re-order and re-list the object to a related mapping).
Constructors:
PartitioningByEnd(x=integer(), NG=NULL, names=NULL): x must be either a list-like object or a sorted integer vector. NG must be either NULL or a single integer. names must be either NULL or a character vector of length NG (if supplied) or length(x) (if NG is not supplied). Returns the following PartitioningByEnd object y:
  • If x is a list-like object, then the returned object y has the same length as x and is such that width(y) is identical to elementLengths(x).
  • If x is an integer vector and NG is not supplied, then x must be sorted (checked) and contain non-NA non-negative values (NOT checked). The returned object y has the same length as x and is such that end(y) is identical to x.
  • If x is an integer vector and NG is supplied, then x must be sorted (checked) and contain values >= 1 and <= NG (checked). The returned object y is of length NG and is such that togroup(y) is identical to x.
If the names argument is supplied, it is used to name the partitions.
PartitioningByWidth(x=integer(), NG=NULL, names=NULL): x must be either a list-like object or an integer vector. NG must be either NULL or a single integer. names must be either NULL or a character vector of length NG (if supplied) or length(x) (if NG is not supplied). Returns the following PartitioningByWidth object y:
  • If x is a list-like object, then the returned object y has the same length as x and is such that width(y) is identical to elementLengths(x).
  • If x is an integer vector and NG is not supplied, then x must contain non-NA non-negative values (NOT checked). The returned object y has the same length as x and is such that width(y) is identical to x.
  • If x is an integer vector and NG is supplied, then x must be sorted (checked) and contain values >= 1 and <= NG (checked). The returned object y is of length NG and is such that togroup(y) is identical to x.
If the names argument is supplied, it is used to name the partitions.
PartitioningMap(x=integer(), mapOrder=integer()): x is a list-like object or a sorted integer vector used to construct a PartitioningByEnd object. mapOrder is a numeric vector of the mapped order. Returns a PartitioningMap object.
Note that these constructors don't recycle their names argument (to remain consistent with what `names<-` does on standard vectors).
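
The relationship between the ends, the widths, and the sorted object-to-group mapping accepted by these constructors comes down to a few lines of base R arithmetic. The sketch below is illustrative only and makes no claim about how IRanges stores or computes these values. The variable names are chosen for this example.

```r
## Base-R sketch of the arithmetic behind the input forms
## (illustrative only; this is not how IRanges implements the classes).
ends   <- c(4L, 7L, 7L, 8L, 15L)
widths <- diff(c(0L, ends))          # ends -> widths
starts <- ends - widths + 1L         # adjacent ranges starting at 1

widths
starts
cumsum(widths)                       # widths -> ends

## Object-to-group mapping of the block-grouping (sorted by construction):
to <- rep(seq_along(widths), widths)
to

## The 'NG' form: start from a sorted object-to-group mapping instead.
x  <- c(1L, 5L, 5L, 6L, 8L)
NG <- 10L
widths2 <- tabulate(x, nbins = NG)   # number of objects per group
ends2   <- cumsum(widths2)
stopifnot(identical(rep(seq_len(NG), widths2), x))
```

An empty group shows up as a width of 0, and therefore as an end equal to the previous end.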

See Also

IntegerList-class, Ranges-class, IRanges-class, successiveIRanges, cumsum, diff

Examples

showClass("Grouping")  # shows (some of) the known subclasses

## ---------------------------------------------------------------------
## A. H2LGrouping OBJECTS
## ---------------------------------------------------------------------
high2low <- c(NA, NA, 2, 2, NA, NA, NA, 6, NA, 1, 2, NA, 6, NA, NA, 2)
h2l <- H2LGrouping(high2low)
h2l

## The core Grouping API:
length(h2l)
nobj(h2l)  # same as 'length(h2l)' for H2LGrouping objects
h2l[[1]]
h2l[[2]]
h2l[[3]]
h2l[[4]]
h2l[[5]]
grouplength(h2l)  # same as 'unname(sapply(h2l, length))'
grouplength(h2l, 5:2)
members(h2l, 5:2)  # all the members are put together and sorted
togroup(h2l)
togroup(h2l, 5:2)
togrouplength(h2l)  # same as 'grouplength(h2l, togroup(h2l))'
togrouplength(h2l, 5:2)

## The List API:
as.list(h2l)
sapply(h2l, length)

## ---------------------------------------------------------------------
## B. Dups OBJECTS
## ---------------------------------------------------------------------
dups1 <- as(h2l, "Dups")
dups1
duplicated(dups1)  # same as 'duplicated(togroup(dups1))'

### The purpose of a Dups object is to describe the groups of duplicated
### elements in a vector-like object:
x <- c(2, 77, 4, 4, 7, 2, 8, 8, 4, 99)
x_high2low <- high2low(x)
x_high2low  # same length as 'x'
dups2 <- Dups(x_high2low)
dups2
togroup(dups2)
duplicated(dups2)
togrouplength(dups2)  # frequency for each element
table(x)

## ---------------------------------------------------------------------
## C. Partitioning OBJECTS
## ---------------------------------------------------------------------
pbe1 <- PartitioningByEnd(c(4, 7, 7, 8, 15), names=LETTERS[1:5])
pbe1  # the 3rd partition is empty

## The core Grouping API:
length(pbe1)
nobj(pbe1)
pbe1[[1]]
pbe1[[2]]
pbe1[[3]]
grouplength(pbe1)  # same as 'unname(sapply(pbe1, length))' and 'width(pbe1)'
togroup(pbe1)
togrouplength(pbe1)  # same as 'grouplength(pbe1, togroup(pbe1))'
names(pbe1)

## The Ranges core API:
start(pbe1)
end(pbe1)
width(pbe1)

## The List API:
as.list(pbe1)
sapply(pbe1, length)

## Replacing the names:
names(pbe1)[3] <- "empty partition"
pbe1

## Coercion to an IRanges object:
as(pbe1, "IRanges")

## Other examples:
PartitioningByEnd(c(0, 0, 19), names=LETTERS[1:3])
PartitioningByEnd()  # no partition
PartitioningByEnd(integer(9))  # all partitions are empty
x <- c(1L, 5L, 5L, 6L, 8L)
pbe2 <- PartitioningByEnd(x, NG=10L)
stopifnot(identical(togroup(pbe2), x))
pbw2 <- PartitioningByWidth(x, NG=10L)
stopifnot(identical(togroup(pbw2), x))

## ---------------------------------------------------------------------
## D. RELATIONSHIP BETWEEN Partitioning OBJECTS AND successiveIRanges()
## ---------------------------------------------------------------------
mywidths <- c(4, 3, 0, 1, 7)

## The 3 following calls produce the same ranges:
ir <- successiveIRanges(mywidths)  # IRanges instance.
pbe <- PartitioningByEnd(cumsum(mywidths))  # PartitioningByEnd instance.
pbw <- PartitioningByWidth(mywidths)  # PartitioningByWidth instance.
stopifnot(identical(as(ir, "PartitioningByEnd"), pbe))
stopifnot(identical(as(ir, "PartitioningByWidth"), pbw))
