SummarizedExperiment-class: SummarizedExperiment instances

Description

WARNING: The SummarizedExperiment class described here is deprecated and being replaced with the RangedSummarizedExperiment class defined in the new SummarizedExperiment package. Please make sure to install the SummarizedExperiment package before you attempt to use the SummarizedExperiment() constructor function. Note that this will return a RangedSummarizedExperiment instance instead of a SummarizedExperiment instance.

The SummarizedExperiment class is a matrix-like container where rows represent ranges of interest (as a GRanges or GRangesList-class) and columns represent samples (with sample data summarized as a DataFrame-class). A SummarizedExperiment contains one or more assays, each represented by a matrix-like object of numeric or other mode.

Usage

## Constructors
SummarizedExperiment(assays, ...)
## Accessors
assayNames(x, ...)
assayNames(x, ...) <- value
assays(x, ..., withDimnames=TRUE)
assays(x, ..., withDimnames=TRUE) <- value
assay(x, i, ...)
assay(x, i, ...) <- value
rowRanges(x, ...)
rowRanges(x, ...) <- value
colData(x, ...)
colData(x, ...) <- value
exptData(x, ...)
exptData(x, ...) <- value
"dim"(x)
"dimnames"(x)
"dimnames"(x) <- value
"dimnames"(x) <- value
## colData access
"$"(x, name)
"$"(x, name) <- value
"[["(x, i, j, ...)
"[["(x, i, j, ...) <- value
## rowRanges access
## see 'GRanges compatibility', below
## Subsetting
"["(x, i, j, ..., drop=TRUE)
"["(x, i, j) <- value
"subset"(x, subset, select, ...)
## Combining 
"cbind"(..., deparse.level=1)
"rbind"(..., deparse.level=1)
## Coercion
"updateObject"(object, ..., verbose=FALSE)
"coerce"(from, to = "SummarizedExperiment", strict = TRUE)
"coerce"(from, to = "ExpressionSet", strict = TRUE)

Arguments

assays

See ?RangedSummarizedExperiment in the SummarizedExperiment package.

...

For SummarizedExperiment, see ?RangedSummarizedExperiment in the SummarizedExperiment package.

For assay, ... may contain withDimnames, which is forwarded to assays.

For cbind, rbind, ... contains SummarizedExperiment objects to be combined.

For other accessors, ignored.

verbose

A logical(1) indicating whether messages about data coercion during construction should be printed.

x, object

An instance of SummarizedExperiment-class.

i, j

For assay, assay<-, i is an integer or numeric scalar; see ‘Details’ for additional constraints.

For [,SummarizedExperiment, [,SummarizedExperiment<-, i, j are instances that can act to subset the underlying rowRanges, colData, and matrix elements of assays.

For [[,SummarizedExperiment, [[<-,SummarizedExperiment, i is a scalar index (e.g., character(1) or integer(1)) into a column of colData.

subset

An expression which, when evaluated in the context of rowRanges(x), is a logical vector indicating elements or rows to keep: missing values are taken as false.

select

An expression which, when evaluated in the context of colData(x), is a logical vector indicating elements or rows to keep: missing values are taken as false.

name

A symbol representing the name of a column of colData.

withDimnames

A logical(1), indicating whether dimnames should be applied to extracted assay elements. Setting withDimnames=FALSE increases the speed and memory efficiency with which assays are extracted. Using withDimnames=FALSE in a complex assignment through assays<- is also efficient (e.g., when updating names of assays, names(assays(x, withDimnames=FALSE)) = ... is more efficient than names(assays(x)) = ...); it does not influence actual assignment of dimnames to assays.

drop

A logical(1), ignored by these methods.

value

An instance of a class specified in the S4 method signature or as outlined in ‘Details’.

deparse.level

See ?base::cbind for a description of this argument.

from

The object to be coerced.

to

The class to coerce to.

strict

A logical flag. If TRUE, the returned object must be strictly from the target class.

Constructor

Instances are constructed using the SummarizedExperiment function with arguments outlined above.
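Because the class documented here is deprecated, the following is only a minimal sketch of construction using the replacement constructor. It assumes the Bioconductor SummarizedExperiment package is installed, and the count values, range coordinates, and sample names are made up for illustration. As noted in the warning above, the result is a RangedSummarizedExperiment instance.

```r
# Assumption: the SummarizedExperiment package is installed; loading it
# also attaches GenomicRanges, IRanges and S4Vectors.
library(SummarizedExperiment)

set.seed(1)
nrows <- 6
ncols <- 4
samples <- paste0("S", seq_len(ncols))

# One assay: a matrix of illustrative counts (rows = ranges, columns = samples).
counts <- matrix(rpois(nrows * ncols, lambda = 10), nrow = nrows)

# Row annotation: one genomic range per row of the assay.
rowRanges <- GRanges(
  seqnames = rep(c("chr1", "chr2"), each = 3),
  ranges   = IRanges(start = rep(c(100, 200, 300), times = 2), width = 50),
  strand   = rep("+", nrows)
)

# Column annotation: one DataFrame row per sample.
colData <- DataFrame(treatment = c("ctrl", "ctrl", "trt", "trt"),
                     row.names = samples)

se <- SummarizedExperiment(assays = list(counts = counts),
                           rowRanges = rowRanges,
                           colData = colData)
se
```

Naming the assay (here counts) is optional but, as described in ‘Details’, convenient when several assays are stored together.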

Coercion

Package version 1.9.59 introduced a new way of representing ‘assays’. If you have a serialized instance x of a SummarizedExperiment (e.g., from using the save function with a version of GenomicRanges prior to 1.9.59), it should be updated by invoking x <- updateObject(x).

as(from, "SummarizedExperiment"):: Creates a SummarizedExperiment object from an ExpressionSet object.

as(from, "ExpressionSet"):: Creates an ExpressionSet object from a SummarizedExperiment object.

The following data mappings are used for coercion between ExpressionSet and SummarizedExperiment.

assayData: assays
featureData: rowData
phenoData: colData
experimentData, annotation, protocolData: colData

If the SummarizedExperiment being coerced uses GRanges to store its range data, that data will be included in the featureData of the ExpressionSet. Because ExpressionSet objects require an assay named ‘exprs’, if the SummarizedExperiment object being coerced does not have an assay named ‘exprs’, the first assay will be renamed to ‘exprs’ and a warning will be issued.

Accessors

In the following code snippets, x is a SummarizedExperiment instance.

assays(x), assays(x) <- value:: Get or set the assays. value is a list or SimpleList, each element of which is a matrix with the same dimensions as x.
assay(x, i), assay(x, i) <- value:: A convenient alternative (to assays(x)[[i]], assays(x)[[i]] <- value) to get or set the ith (default first) assay element. value must be a matrix of the same dimension as x, and with dimension names NULL or consistent with those of x.
assayNames(x), assayNames(x) <- value:: Get or set the names of assay() elements.
rowRanges(x), rowRanges(x) <- value:: Get or set the row data. value is a GenomicRanges instance. Row names of value must be NULL or consistent with the existing row names of x.
colData(x), colData(x) <- value:: Get or set the column data. value is a DataFrame instance. Row names of value must be NULL or consistent with the existing column names of x.
exptData(x), exptData(x) <- value:: Get or set the experiment data. value is a list or SimpleList instance, with arbitrary content.
dim(x):: Get the dimensions (ranges x samples) of the SummarizedExperiment.
dimnames(x), dimnames(x) <- value:: Get or set the dimension names. value is usually a list of length 2, containing elements that are either NULL or vectors of appropriate length for the corresponding dimension. value can be NULL, which removes dimension names. This method implies that rownames, rownames<-, colnames, and colnames<- are all available.
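The accessors above can be sketched as follows. This is an illustration against the replacement RangedSummarizedExperiment class, assuming the SummarizedExperiment package is installed; the data are made up. One difference to be aware of: in the new package the experiment-level data are reached with metadata() rather than exptData().

```r
# Assumption: the SummarizedExperiment package is installed.
library(SummarizedExperiment)

se <- SummarizedExperiment(
  assays    = list(counts = matrix(1:12, nrow = 3)),
  rowRanges = GRanges(rep("chr1", 3),
                      IRanges(start = c(1, 101, 201), width = 100)),
  colData   = DataFrame(group = c("a", "a", "b", "b"))
)

dim(se)          # ranges x samples
assayNames(se)   # names of the assay elements

# dimnames<- implies that colnames<- (and rownames<-) are available.
colnames(se) <- paste0("S", 1:4)

# Get the first assay, then store a derived matrix as a second assay.
first <- assay(se, 1)
assays(se)$doubled <- first * 2

# Add a column to the sample annotation.
colData(se)$batch <- c(1, 1, 2, 2)

# Experiment-level information; "hypothetical lab" is a placeholder value.
metadata(se)$lab <- "hypothetical lab"
```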

GRanges compatibility (rowRanges access)

Many GRanges-class and GRangesList-class operations are supported on ‘SummarizedExperiment’ and derived instances, using rowRanges. Supported operations include: compare, countOverlaps, coverage, disjointBins, distance, distanceToNearest, duplicated, end, end<-, findOverlaps, flank, follow, granges, isDisjoint, match, mcols, mcols<-, narrow, nearest, order, overlapsAny, precede, ranges, ranges<-, rank, resize, restrict, seqinfo, seqinfo<-, seqnames, shift, sort, split, relistToClass, start, start<-, strand, strand<-, subsetByOverlaps, width, width<-. Not all GRanges-class operations are supported, because they do not make sense for ‘SummarizedExperiment’ objects (e.g., length, name, as.data.frame, c, splitAsList), involve non-trivial combination or splitting of rows (e.g., disjoin, gaps, reduce, unique), or have not yet been implemented (Ops, map, window, window<-).
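As a brief illustration of range operations applied directly to the container, the sketch below uses the replacement RangedSummarizedExperiment class (assuming the SummarizedExperiment package is installed). The ranges and the region of interest roi are made up, and only a few of the listed operations are shown.

```r
# Assumption: the SummarizedExperiment package is installed.
library(SummarizedExperiment)

se <- SummarizedExperiment(
  assays    = list(counts = matrix(1:12, nrow = 3)),
  rowRanges = GRanges(rep("chr1", 3),
                      IRanges(start = c(1, 101, 201), width = 100)),
  colData   = DataFrame(group = c("a", "a", "b", "b"))
)

# These calls operate on rowRanges(se) without extracting it first.
start(se)
width(se)
seqnames(se)

# Keep only the rows whose ranges overlap a (hypothetical) region of interest.
roi  <- GRanges("chr1", IRanges(start = 90, end = 150))
hits <- subsetByOverlaps(se, roi)
dim(hits)
```

Here roi spans positions 90 to 150, so it overlaps the first two ranges (1 to 100 and 101 to 200) and the result keeps two rows and all four samples.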

Subsetting

In the code snippets below, x is a SummarizedExperiment instance.

x[i,j], x[i,j] <- value:: Create or replace a subset of x. i, j can be numeric, logical, character, or missing. value must be a SummarizedExperiment instance with dimensions, dimension names, and assay elements consistent with the subset x[i,j] being replaced.
subset(x, subset, select):: Create a subset of x using an expression subset referring to columns of rowRanges(x) (including ‘seqnames’, ‘start’, ‘end’, ‘width’, ‘strand’, and names(mcols(x))) and / or select referring to column names of colData(x).

Additional subsetting accessors provide convenient access to colData columns:

x$name, x$name <- value:: Access or replace column name in x.
x[[i, ...]], x[[i, ...]] <- value:: Access or replace column i in x.
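The subsetting forms described in this section might be used as in the sketch below. It is written against the replacement RangedSummarizedExperiment class and assumes the SummarizedExperiment package is installed; the row names, the score metadata column, and the group column are invented for the example.

```r
# Assumption: the SummarizedExperiment package is installed.
library(SummarizedExperiment)

rr <- GRanges(rep("chr1", 3),
              IRanges(start = c(1, 101, 201), width = 100),
              score = c(5, 20, 50))          # illustrative metadata column
names(rr) <- c("r1", "r2", "r3")

se <- SummarizedExperiment(
  assays    = list(counts = matrix(1:12, nrow = 3)),
  rowRanges = rr,
  colData   = DataFrame(group = c("a", "a", "b", "b"),
                        row.names = paste0("S", 1:4))
)

by_index <- se[1:2, ]                 # first two ranges, all samples
by_group <- se[, se$group == "b"]     # logical index built from colData
by_name  <- se[c("r1", "r3"), "S2"]   # character indices on both dimensions

# subset(): 'subset' is evaluated against the row annotation,
# 'select' against colData.
by_expr <- subset(se, score > 10, select = group == "a")

# colData shortcuts
se$group
se[["group"]]
```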

Combining

In the code snippets below, ... are SummarizedExperiment instances to be combined.

cbind(...), rbind(...):: cbind combines objects with identical ranges (rowRanges) but different samples (columns in assays). The colnames in colData must match or an error is thrown. Duplicate columns of mcols(rowRanges(SummarizedExperiment)) must contain the same data. Data in assays are combined by name matching; if all names are NULL matching is by position. A mixture of names and NULL throws an error. rbind combines objects with different ranges (rowRanges) and the same subjects (columns in assays). Duplicate columns of colData must contain the same data. exptData from all objects are combined into a SimpleList with no name checking.
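A minimal sketch of both operations follows, again using the replacement RangedSummarizedExperiment class and assuming the SummarizedExperiment package is installed. The helper make_se is hypothetical and exists only to keep the example short.

```r
# Assumption: the SummarizedExperiment package is installed.
library(SummarizedExperiment)

# Hypothetical helper: build a small object with ranges starting at 'starts'
# and one column per entry of 'samples'.
make_se <- function(starts, samples) {
  n <- length(starts)
  m <- length(samples)
  SummarizedExperiment(
    assays    = list(counts = matrix(seq_len(n * m), nrow = n)),
    rowRanges = GRanges(rep("chr1", n), IRanges(start = starts, width = 10)),
    colData   = DataFrame(batch = rep("b1", m), row.names = samples)
  )
}

# cbind: identical ranges, different samples.
a <- make_se(c(1, 101), c("S1", "S2"))
b <- make_se(c(1, 101), c("S3", "S4"))
wide <- cbind(a, b)

# rbind: different ranges, same samples (with identical colData).
d <- make_se(c(201, 301, 401), c("S1", "S2"))
long <- rbind(a, d)

dim(wide)
dim(long)
```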

Implementation and Extension

This section contains advanced material meant for package developers.

SummarizedExperiment is implemented as an S4 class, and can be extended in the usual way, using contains="SummarizedExperiment" in the new class definition. In addition, the representation of the assays slot of SummarizedExperiment is as a virtual class Assays. This allows derived classes (contains="Assays") to easily implement alternative requirements for the assays, e.g., backed by file-based storage like NetCDF or the ff package, while re-using the existing SummarizedExperiment class without modification. The requirements on Assays are list-like semantics (e.g., sapply, [[ subsetting, names) with elements having matrix- or array-like semantics (e.g., dim, dimnames). These requirements can be made more precise if developers express interest.

The current assays slot is implemented as a reference class that has copy-on-change semantics. This means that modifying non-assay slots does not copy the (large) assay data, and at the same time the user is not surprised by reference-based semantics. Updates to non-assay slots are very fast; updating the assays slot itself can be 5x or more faster than with an S4 instance in the slot. One useful technique when working with the assay or assays function is to use the withDimnames=FALSE argument, which benefits speed and memory use by not copying dimnames from the row- and colData elements to each assay.

In a little more detail, a small reference class hierarchy (not exported from the GenomicRanges namespace) defines a reference class ShallowData with a single field data of type ANY, and a derived class ShallowSimpleListAssays that specializes the type of data as SimpleList, and contains=c("ShallowData", "Assays"). The assays slot contains an instance of ShallowSimpleListAssays. Invoking assays() on a SummarizedExperiment re-dispatches from the assays slot to retrieve the SimpleList from the field of the reference class. This was achieved by implementing a generic (not exported) value(x, name, ...), with a method implemented on SummarizedExperiment that retrieves a slot when name is a slot containing an S4 object in x, and a field when name is a slot containing a ShallowData instance in x. Copy-on-change semantics is maintained by implementing the clone method (clone methods are supposed to do a deep copy, update methods a shallow copy; the clone generic is introduced, and not exported, in the GenomicRanges package). The ‘getter’ and ‘setter’ code for methods implemented on SummarizedExperiment uses value for slot access, and clone for replacement. This makes it easy to implement ShallowData instances for other slots if the need arises.
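Extension "in the usual way" can be sketched as below. Since the class documented here is deprecated, the sketch extends the replacement RangedSummarizedExperiment class instead and assumes the SummarizedExperiment package is installed. The class name AnnotatedExperiment and its note slot are hypothetical.

```r
# Assumption: the SummarizedExperiment package is installed.
library(SummarizedExperiment)

# Hypothetical derived class that adds a single extra slot.
setClass("AnnotatedExperiment",
         contains = "RangedSummarizedExperiment",
         slots = c(note = "character"))

se <- SummarizedExperiment(
  assays    = list(counts = matrix(1:12, nrow = 3)),
  rowRanges = GRanges(rep("chr1", 3),
                      IRanges(start = c(1, 101, 201), width = 100)),
  colData   = DataFrame(group = c("a", "a", "b", "b"))
)

# Build the derived instance from an existing object plus the new slot.
ae <- new("AnnotatedExperiment", se, note = "illustrative")

# Inherited methods, such as dim and "[", apply to the derived class.
dim(ae)
sub <- ae[1:2, ]
```

The derived class inherits the accessors and subsetting methods, so only behaviour specific to the new slot needs to be written.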

Details

The SummarizedExperiment class is meant for numeric and other data types derived from a sequencing experiment. The structure is rectangular like a matrix, but with additional annotations on the rows and columns, and with the possibility to manage several assays simultaneously.

The rows of a SummarizedExperiment instance represent ranges (in genomic coordinates) of interest. The ranges of interest are described by a GRanges-class or a GRangesList-class instance, accessible using the rowRanges function, described below. The GRanges and GRangesList classes contain sequence (e.g., chromosome) name, genomic coordinates, and strand information. Each range can be annotated with additional data; this data might be used to describe the range or to summarize results (e.g., statistics of differential abundance) relevant to the range. Rows may or may not have row names; they often will not.

Each column of a SummarizedExperiment instance represents a sample. Information about the samples is stored in a DataFrame-class, accessible using the function colData, described below. The DataFrame must have as many rows as there are columns in the SummarizedExperiment, with each row of the DataFrame providing information on the sample in the corresponding column of the SummarizedExperiment. Columns of the DataFrame represent different sample attributes, e.g., tissue of origin. Columns of the DataFrame can themselves be annotated (via the mcols function). Column names typically provide a short identifier unique to each sample.

A SummarizedExperiment can also contain information about the overall experiment, for instance the lab in which it was conducted, the publications with which it is associated, etc. This information is stored as a SimpleList-class, accessible using the exptData function. The form of the data associated with the experiment is left to the discretion of the user.

The SummarizedExperiment is appropriate for matrix-like data. The data are accessed using the assays function, described below. This returns a SimpleList-class instance. Each element of the list must itself be a matrix (of any mode) and must have the same dimensions as the SummarizedExperiment in which it is stored. During construction, row and column names of each matrix must either be NULL or match those of the SummarizedExperiment. It is convenient for the elements of the SimpleList of assays to be named.

The SummarizedExperiment class has the following slots; this detail of class structure is not relevant to the user.

exptData

A SimpleList-class instance containing information about the overall experiment.

rowData

A GRanges-class instance defining the ranges of interest and associated metadata. WARNING: The accessor for this slot is rowRanges, not rowData!

colData

A DataFrame-class instance describing the samples and associated metadata.

assays

A SimpleList-class instance, each element of which is a matrix summarizing data associated with the corresponding range and sample.

Examples

## WARNING: The SummarizedExperiment class is deprecated and being
## replaced with the RangedSummarizedExperiment class defined in the
## new SummarizedExperiment package. See ?RangedSummarizedExperiment
## in the SummarizedExperiment package for examples of how to create
## and manipulate RangedSummarizedExperiment objects.
