SummarizedExperiment-class: SummarizedExperiment instances

Description

The SummarizedExperiment class is a matrix-like container where rows represent ranges of interest (as a GRanges or GRangesList-class) and columns represent samples (with sample data summarized as a DataFrame-class). A SummarizedExperiment contains one or more assays, each represented by a matrix-like object of numeric or other mode.

Usage

## Constructors
SummarizedExperiment(assays, ...)
"SummarizedExperiment"(assays, rowData = GRangesList(), colData = DataFrame(), exptData = SimpleList(), ..., verbose = FALSE)
"SummarizedExperiment"(assays, ...)
## Accessors
assays(x, ..., withDimnames=TRUE)
assays(x, ..., withDimnames=TRUE) <- value
assay(x, i, ...)
assay(x, i, ...) <- value
rowData(x, ...)
rowData(x, ...) <- value
colData(x, ...)
colData(x, ...) <- value
exptData(x, ...)
exptData(x, ...) <- value
"dim"(x)
"dimnames"(x)
"dimnames"(x) <- value
## colData access
"$"(x, name)
"$"(x, name) <- value
"[["(x, i, j, ...)
"[["(x, i, j, ...) <- value
## rowData access
## see 'GRanges compatibility', below
## Subsetting
"["(x, i, j, ..., drop=TRUE)
"["(x, i, j) <- value
"subset"(x, subset, select, ...)
## Combining 
"cbind"(..., deparse.level=1)
"rbind"(..., deparse.level=1)
## Coercion
"updateObject"(object, ..., verbose=FALSE)

Arguments

assays

A list or SimpleList of matrix elements, or a matrix. All elements of the list must have the same dimensions, and dimension names (if present) must be consistent across elements and with the row names of rowData and colData.

rowData

A GRanges or GRangesList instance describing the ranges of interest. Row names, if present, become the row names of the SummarizedExperiment. The length of the GRanges or the GRangesList must equal the number of rows of the matrices in assays.

colData

An optional DataFrame describing the samples. Row names, if present, become the column names of the SummarizedExperiment.

exptData

An optional SimpleList of arbitrary content describing the overall experiment.

...

For the SummarizedExperiment S4 methods for list and matrix, arguments are identical to those of the SimpleList method.

For assay, ... may contain withDimnames, which is forwarded to assays.

For cbind, rbind, ... contains SummarizedExperiment objects to be combined.

For other accessors, ignored.

verbose

A logical(1) indicating whether messages about data coercion during construction should be printed.

x, object

An instance of SummarizedExperiment-class.

i, j

For assay, assay<-, i is an integer or numeric scalar; see ‘Details’ for additional constraints.

For [,SummarizedExperiment, [,SummarizedExperiment<-, i, j are instances that can act to subset the underlying rowData, colData, and matrix elements of assays.

For [[,SummarizedExperiment, [[<-,SummarizedExperiment, i is a scalar index (e.g., character(1) or integer(1)) into a column of colData.

subset

An expression which, when evaluated in the context of rowData(x), is a logical vector indicating elements or rows to keep: missing values are taken as false.

select

An expression which, when evaluated in the context of colData(x), is a logical vector indicating the columns (samples) to keep: missing values are taken as false.

name

A symbol representing the name of a column of colData.

withDimnames

A logical(1), indicating whether dimnames should be applied to extracted assay elements (this argument is ignored for the setter assays<-).

drop

A logical(1), ignored by these methods.

value

An instance of a class specified in the S4 method signature or as outlined in ‘Details’.

deparse.level

See ?base::cbind for a description of this argument.

Constructor

Instances are constructed using the SummarizedExperiment function with arguments outlined above.
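
As an illustration of the constructor arguments, the following minimal sketch builds a small instance from toy data. It assumes a GenomicRanges release that still provides SummarizedExperiment with the interface documented on this page; the object names (counts, ranges, samples, se) are made up for the example.

```r
library(GenomicRanges)

## Toy data: 4 ranges by 3 samples (all values are illustrative).
counts <- matrix(1:12, nrow = 4)
ranges <- GRanges(rep("chr1", 4),
                  IRanges(start = c(1, 101, 201, 301), width = 50))
samples <- DataFrame(Treatment = c("ChIP", "Input", "ChIP"),
                     row.names = c("s1", "s2", "s3"))

## One named assay; rowData has one range per matrix row and
## colData has one row per matrix column.
se <- SummarizedExperiment(assays = SimpleList(counts = counts),
                           rowData = ranges, colData = samples)

dim(se)       # ranges x samples
colnames(se)  # taken from the row names of colData
```

Naming the assay element (here counts) is optional, but it makes later access through assays(se)$counts more readable.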

Coercion

Package version 1.9.59 introduced a new way of representing ‘assays’. If you have a serialized instance x of a SummarizedExperiment (e.g., from using the save function with a version of GenomicRanges prior to 1.9.59), it should be updated by invoking x <- updateObject(x).

Accessors

In the following code snippets, x is a SummarizedExperiment instance.

assays(x), assays(x) <- value: Get or set the assays. value is a list or SimpleList, each element of which is a matrix with the same dimensions as x.
assay(x, i), assay(x, i) <- value: A convenient alternative (to assays(x)[[i]], assays(x)[[i]] <- value) to get or set the ith (default first) assay element. value must be a matrix of the same dimension as x, and with dimension names NULL or consistent with those of x.
rowData(x), rowData(x) <- value: Get or set the row data. value is a GenomicRanges instance. Row names of value must be NULL or consistent with the existing row names of x.
colData(x), colData(x) <- value: Get or set the column data. value is a DataFrame instance. Row names of value must be NULL or consistent with the existing column names of x.
exptData(x), exptData(x) <- value: Get or set the experiment data. value is a list or SimpleList instance, with arbitrary content.
dim(x): Get the dimensions (ranges x samples) of the SummarizedExperiment.
dimnames(x), dimnames(x) <- value: Get or set the dimension names. value is usually a list of length 2, containing elements that are either NULL or vectors of appropriate length for the corresponding dimension. value can be NULL, which removes dimension names. This method implies that rownames, rownames<-, colnames, and colnames<- are all available.
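
The accessors above might be exercised as in the sketch below. It assumes the SummarizedExperiment interface documented on this page (as shipped in GenomicRanges); the toy data and the lab name are hypothetical.

```r
library(GenomicRanges)

counts <- matrix(as.numeric(1:6), nrow = 3)
se <- SummarizedExperiment(
    assays = SimpleList(counts = counts),
    rowData = GRanges("chr1", IRanges(c(1, 11, 21), width = 5)),
    colData = DataFrame(Treatment = c("ChIP", "Input"),
                        row.names = c("A", "B")))

names(assays(se))   # names of the assay elements
assay(se)           # first assay element, with dimnames applied

## Replace the first assay with a matrix of the same dimensions.
assay(se, 1) <- log2(counts + 1)

## Attach arbitrary experiment-level information (hypothetical content).
exptData(se) <- SimpleList(lab = "hypothetical lab name")

## rownames<- is available through the dimnames<- method.
rownames(se) <- paste0("range", 1:3)
dimnames(se)
```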

GRanges compatibility (rowData access)

Many GRanges-class and GRangesList-class operations are supported on ‘SummarizedExperiment’ and derived instances, using rowData. Supported operations include: compare, countOverlaps, coverage, disjointBins, distance, distanceToNearest, duplicated, end, end<-, findOverlaps, flank, follow, granges, isDisjoint, match, mcols, mcols<-, narrow, nearest, order, overlapsAny, precede, ranges, ranges<-, rank, resize, restrict, seqinfo, seqinfo<-, seqnames, shift, sort, split, relistToClass, start, start<-, strand, strand<-, subsetByOverlaps, width, width<-. Not all GRanges-class operations are supported, because they do not make sense for ‘SummarizedExperiment’ objects (e.g., length, name, as.data.frame, c, splitAsList), involve non-trivial combination or splitting of rows (e.g., disjoin, gaps, reduce, unique), or have not yet been implemented (Ops, map, window, window<-).
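
A short sketch of a few of the supported operations, applied directly to a SummarizedExperiment rather than to rowData(x). It assumes the interface documented on this page; the region of interest roi and the toy ranges are made up for the example.

```r
library(GenomicRanges)

se <- SummarizedExperiment(
    assays = SimpleList(counts = matrix(1:6, nrow = 3)),
    rowData = GRanges("chr1", IRanges(c(1, 11, 21), width = 5)),
    colData = DataFrame(row.names = c("A", "B")))

## Range accessors are answered from the row data.
start(se)
width(se)
seqnames(se)

## Overlap operations use the row ranges as the query.
roi <- GRanges("chr1", IRanges(1, 15))   # hypothetical region of interest
countOverlaps(se, roi)
hits <- subsetByOverlaps(se, roi)        # keeps rows overlapping roi
dim(hits)
```

Because subsetByOverlaps operates on rows, the assay matrices and row data of the result stay in sync.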

Subsetting

In the code snippets below, x is a SummarizedExperiment instance.

x[i,j], x[i,j] <- value: Create or replace a subset of x. i, j can be numeric, logical, character, or missing. value must be a SummarizedExperiment instance with dimensions, dimension names, and assay elements consistent with the subset x[i,j] being replaced.
subset(x, subset, select): Create a subset of x using an expression subset referring to columns of rowData(x) (including ‘seqnames’, ‘start’, ‘end’, ‘width’, ‘strand’, and names(mcols(x))) and / or select referring to column names of colData(x).

Additional subsetting accessors provide convenient access to colData columns:

x$name, x$name <- value: Access or replace column name of colData(x).
x[[i, ...]], x[[i, ...]] <- value: Access or replace column i of colData(x).
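
The subsetting forms described above can be sketched as follows, assuming the interface documented on this page. The data and the Batch column are hypothetical.

```r
library(GenomicRanges)

se <- SummarizedExperiment(
    assays = SimpleList(counts = matrix(1:12, nrow = 4)),
    rowData = GRanges("chr1",
                      IRanges(start = c(1, 101, 201, 301), width = 50)),
    colData = DataFrame(Treatment = c("ChIP", "Input", "ChIP"),
                        row.names = c("s1", "s2", "s3")))

## Matrix-style subsetting: all rows, ChIP samples only.
chip <- se[, se$Treatment == "ChIP"]

## subset(): the first expression is evaluated against rowData(se),
## the second against colData(se).
sub <- subset(se, start > 100, Treatment == "Input")

## $ and [[ read and write columns of colData(se).
se$Batch <- c(1, 2, 1)   # hypothetical sample attribute
se[["Batch"]]
```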

Combining

In the code snippets below, ... are SummarizedExperiment instances to be combined.

cbind(...), rbind(...): cbind combines objects with identical ranges (rowData) but different samples (columns in assays). The colnames in colData must match or an error is thrown. Duplicate columns of mcols(rowData(SummarizedExperiment)) must contain the same data. rbind combines objects with different ranges (rowData) and the same samples (columns in assays). Duplicate columns of colData must contain the same data. exptData from all objects are combined into a SimpleList with no name checking.

Implementation and Extension

This section contains advanced material meant for package developers.

SummarizedExperiment is implemented as an S4 class, and can be extended in the usual way, using contains="SummarizedExperiment" in the new class definition.

In addition, the representation of the assays slot of SummarizedExperiment is as a virtual class Assays. This allows derived classes (contains="Assays") to easily implement alternative requirements for the assays, e.g., backed by file-based storage like NetCDF or the ff package, while re-using the existing SummarizedExperiment class without modification. The requirements on Assays are list-like semantics (e.g., sapply, [[ subsetting, names) with elements having matrix- or array-like semantics (e.g., dim, dimnames). These requirements can be made more precise if developers express interest.

The current assays slot is implemented as a reference class that has copy-on-change semantics. This means that modifying non-assay slots does not copy the (large) assay data, and at the same time the user is not surprised by reference-based semantics. Updates to non-assay slots are very fast; updating the assays slot itself can be 5x or more faster than with an S4 instance in the slot.

In a little more detail, a small reference class hierarchy (not exported from the GenomicRanges name space) defines a reference class ShallowData with a single field data of type ANY, and a derived class ShallowSimpleListAssays that specializes the type of data as SimpleList, and contains=c("ShallowData", "Assays"). The assays slot contains an instance of ShallowSimpleListAssays. Invoking assays() on a SummarizedExperiment re-dispatches from the assays slot to retrieve the SimpleList from the field of the reference class. This was achieved by implementing a generic (not exported) value(x, name, ...), with a method implemented on SummarizedExperiment that retrieves a slot when name is a slot containing an S4 object in x, and a field when name is a slot containing a ShallowData instance in x.

Copy-on-change semantics is maintained by implementing the clone method (clone methods are supposed to do a deep copy, update methods a shallow copy; the clone generic is introduced, and not exported, in the GenomicRanges package). The ‘getter’ and ‘setter’ code for methods implemented on SummarizedExperiment uses value for slot access, and clone for replacement. This makes it easy to implement ShallowData instances for other slots if the need arises.
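
Extending the class "in the usual way" might look like the sketch below. The class name MyExperiment and its pipeline slot are hypothetical and serve only to illustrate the pattern; the sketch assumes the SummarizedExperiment interface documented on this page.

```r
library(GenomicRanges)

## Hypothetical derived class that adds a single character slot.
setClass("MyExperiment",
         contains = "SummarizedExperiment",
         representation(pipeline = "character"))

se <- SummarizedExperiment(
    assays = SimpleList(counts = matrix(1:6, nrow = 3)),
    rowData = GRanges("chr1", IRanges(c(1, 11, 21), width = 5)),
    colData = DataFrame(row.names = c("A", "B")))

## Build the derived object from an existing instance plus the new slot.
mine <- new("MyExperiment", se, pipeline = "hypothetical-pipeline-v1")

is(mine, "SummarizedExperiment")   # inherited methods apply
dim(mine)
mine[1:2, ]                        # subsetting is inherited as well
```

Methods defined for SummarizedExperiment are inherited by the derived class, so only behaviour specific to the new slot needs additional code.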

Details

The SummarizedExperiment class is meant for numeric and other data types derived from a sequencing experiment. The structure is rectangular like a matrix, but with additional annotations on the rows and columns, and with the possibility to manage several assays simultaneously.

The rows of a SummarizedExperiment instance represent ranges (in genomic coordinates) of interest. The ranges of interest are described by a GRanges-class or a GRangesList-class instance, accessible using the rowData function, described in ‘Accessors’. The GRanges and GRangesList classes contain sequence (e.g., chromosome) name, genomic coordinates, and strand information. Each range can be annotated with additional data; this data might be used to describe the range or to summarize results (e.g., statistics of differential abundance) relevant to the range. Rows may or may not have row names; they often will not.

Each column of a SummarizedExperiment instance represents a sample. Information about the samples is stored in a DataFrame-class, accessible using the function colData, described in ‘Accessors’. The DataFrame must have as many rows as there are columns in the SummarizedExperiment, with each row of the DataFrame providing information on the sample in the corresponding column of the SummarizedExperiment. Columns of the DataFrame represent different sample attributes, e.g., tissue of origin. Columns of the DataFrame can themselves be annotated (via the mcols function). Column names typically provide a short identifier unique to each sample.

A SummarizedExperiment can also contain information about the overall experiment, for instance the lab in which it was conducted, the publications with which it is associated, etc. This information is stored as a SimpleList-class, accessible using the exptData function. The form of the data associated with the experiment is left to the discretion of the user.

The SummarizedExperiment is appropriate for matrix-like data. The data are accessed using the assays function, described in ‘Accessors’. This returns a SimpleList-class instance. Each element of the list must itself be a matrix (of any mode) and must have the same dimensions as the SummarizedExperiment in which it is stored. Row and column names of each matrix must either be NULL or match those of the SummarizedExperiment during construction. It is convenient for the elements of the SimpleList of assays to be named.

The SummarizedExperiment class has the following slots; this detail of class structure is not relevant to the user.

exptData

A SimpleList-class instance containing information about the overall experiment.

rowData

A GRanges-class or GRangesList-class instance defining the ranges of interest and associated metadata.

colData

A DataFrame-class instance describing the samples and associated metadata.

assays

A SimpleList-class instance, each element of which is a matrix summarizing data associated with the corresponding range and sample.

Examples


  ## SummarizedExperiment is provided by GenomicRanges in the version documented here
  library(GenomicRanges)

  nrows <- 200; ncols <- 6
  counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
  rowData <- GRanges(rep(c("chr1", "chr2"), c(50, 150)),
                     IRanges(floor(runif(nrows, 1e5, 1e6)), width=100),
                     strand=sample(c("+", "-"), nrows, TRUE))
  colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                       row.names=LETTERS[1:6])
  sset <- SummarizedExperiment(assays=SimpleList(counts=counts),
                 rowData=rowData, colData=colData)
  sset
  ## apply asinh to every assay element, keeping the SimpleList structure
  assays(sset) <- endoapply(assays(sset), asinh)
  head(assay(sset))

  sset[, sset$Treatment == "ChIP"]

  ## cbind combines objects with the same ranges and different samples.
  se1 <- sset
  se2 <- se1[,1:3]
  colnames(se2) <- letters[1:ncol(se2)] 
  cmb1 <- cbind(se1, se2)

  ## rbind combines objects with the same samples and different ranges.
  se1 <- sset
  se2 <- se1[1:50,]
  ## letters has only 26 elements, so build 50 distinct names explicitly
  rownames(se2) <- paste0("r", seq_len(nrow(se2)))
  cmb2 <- rbind(se1, se2)
