SummarizedExperiment-class {SummarizedExperiment}    R Documentation
SummarizedExperiment objects
Description
The SummarizedExperiment class is a matrix-like container where rows represent features of interest (e.g. genes, transcripts, exons, etc.) and columns represent samples (with sample data summarized as a DataFrame). A SummarizedExperiment object contains one or more assays, each represented by a matrix-like object of numeric or other mode.
Note that SummarizedExperiment is the parent of the RangedSummarizedExperiment class which means that all the methods documented below also work on a RangedSummarizedExperiment object.
Usage
## Constructor
# See ?RangedSummarizedExperiment for the constructor function.
## Accessors
assayNames(x, ...)
assayNames(x, ...) <- value
assays(x, withDimnames=TRUE, ...)
assays(x, withDimnames=TRUE, ...) <- value
assay(x, i, withDimnames=TRUE, ...)
assay(x, i, withDimnames=TRUE, ...) <- value
rowData(x, use.names=TRUE, ...)
rowData(x, ...) <- value
colData(x, ...)
colData(x, ...) <- value
#dim(x)
#dimnames(x)
#dimnames(x) <- value
## Quick colData access
## S4 method for signature 'SummarizedExperiment'
x$name
## S4 replacement method for signature 'SummarizedExperiment'
x$name <- value
## S4 method for signature 'SummarizedExperiment,ANY,missing'
x[[i, j, ...]]
## S4 replacement method for signature 'SummarizedExperiment,ANY,missing'
x[[i, j, ...]] <- value
## Subsetting
## S4 method for signature 'SummarizedExperiment'
x[i, j, ..., drop=TRUE]
## S4 replacement method for signature 'SummarizedExperiment,ANY,ANY,SummarizedExperiment'
x[i, j] <- value
## S4 method for signature 'SummarizedExperiment'
subset(x, subset, select, ...)
## Combining
## S4 method for signature 'SummarizedExperiment'
rbind(..., deparse.level=1)
## S4 method for signature 'SummarizedExperiment'
cbind(..., deparse.level=1)
## S4 method for signature 'SummarizedExperiment'
combineRows(x, ..., delayed=TRUE, fill=NA, use.names=TRUE)
## S4 method for signature 'SummarizedExperiment'
combineCols(x, ..., delayed=TRUE, fill=NA, use.names=TRUE)
## On-disk realization
## S4 method for signature 'SummarizedExperiment'
realize(x, BACKEND=getAutoRealizationBackend())
Arguments
x |
A SummarizedExperiment object. |
... |
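For rbind and cbind, ... contains the SummarizedExperiment objects to be combined. For combineRows and combineCols, ... contains further objects to be combined with x. For other accessors, ignored. |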
value |
An object of a class specified in the S4 method signature or as outlined in ‘Details’. |
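i, j |
For assay and assay<-, i is an integer or numeric scalar, or a single assay name; see 'Details' for additional constraints. For [ and [<-, i and j are subscripts that subset the rows and columns of x, i.e. of the matrix elements of assays. For [[ and [[<-, i is a scalar index (e.g. character(1) or integer(1)) into a column of colData(x). |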
name |
A symbol representing the name of a column of colData(x). |
withDimnames |
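A logical(1), indicating whether the dimnames of the SummarizedExperiment object should be applied (i.e. copied) to the extracted assays. Setting withDimnames=FALSE in the getter returns the assays as stored, whereas the default (TRUE) returns them with the dimnames of x copied onto them. Setting withDimnames=FALSE in the setter (assays<-, assay<-) is required when the dimnames of the supplied assays are not identical to those of x. Note that assays(x, withDimnames=FALSE) <- assays(x, withDimnames=FALSE) is guaranteed to always work and be a no-op. This is not the case if withDimnames=TRUE is used or if withDimnames is not specified. |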
use.names |
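For rowData: by default, the rownames of x are propagated to the returned DataFrame. Setting use.names=FALSE suppresses this propagation, which is useful when rowData(x) fails because the rownames of x contain NAs (a DataFrame cannot have NA rownames). For combineRows and combineCols: see the Combining section below. |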
drop |
A logical(1), ignored by these methods. |
deparse.level |
See ?base::cbind for a description of this argument. |
subset |
An expression which, when evaluated in the context of rowData(x), is a logical vector indicating the rows to keep; missing values are taken as FALSE. |
select |
An expression which, when evaluated in the context of colData(x), is a logical vector indicating the columns to keep; missing values are taken as FALSE. |
delayed , fill |
See the Combining section below. |
BACKEND |
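NULL, or a single string specifying the name of the realization backend (e.g. "HDF5Array"). Defaults to getAutoRealizationBackend(). When BACKEND is NULL, each element of assays(x) is realized in memory as an ordinary array. |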
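The sketch below illustrates the withDimnames and use.names arguments on a toy object. It assumes the SummarizedExperiment package is installed; the object, its dimnames, and the extra assay name "other" are made up for illustration.

```r
library(SummarizedExperiment)

## Toy object whose single assay is stored without dimnames.
se <- SummarizedExperiment(assays=list(counts=matrix(1:4, nrow=2)))
dimnames(se) <- list(c("g1", "g2"), c("s1", "s2"))

## Round-tripping with withDimnames=FALSE does not inject dimnames
## into the stored assay.
se2 <- se
assays(se2, withDimnames=FALSE) <- assays(se2, withDimnames=FALSE)
dimnames(assay(se2, "counts", withDimnames=FALSE))   # NULL

## An assay whose dimnames differ from those of 'se2' has to be
## stored with withDimnames=FALSE in the setter.
other <- matrix(5:8, nrow=2, dimnames=list(c("x", "y"), c("u", "v")))
assay(se2, "other", withDimnames=FALSE) <- other
rownames(assay(se2, "other", withDimnames=FALSE))   # as stored: "x" "y"
rownames(assay(se2, "other"))                       # copied from se2: "g1" "g2"

## rowData() propagates rownames(se) unless use.names=FALSE.
rownames(rowData(se))                   # "g1" "g2"
rownames(rowData(se, use.names=FALSE))  # NULL
```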
Details
The SummarizedExperiment class is meant for numeric and other
data types derived from a sequencing experiment. The structure is
rectangular like a matrix, but with additional annotations on
the rows and columns, and with the possibility to manage several
assays simultaneously as long as they have the same dimensions.
The rows of a SummarizedExperiment object represent features
of interest. Information about these features is stored in a
DataFrame object, accessible using the function rowData.
The DataFrame must have as many rows as there are rows in the
SummarizedExperiment object, with each row of the DataFrame
providing information on the feature in the corresponding row of
the SummarizedExperiment object. Columns of the DataFrame represent
different attributes of the features of interest, e.g., gene or
transcript IDs, etc.
Each column of a SummarizedExperiment object represents a sample.
Information about the samples is stored in a DataFrame,
accessible using the function colData, described below.
The DataFrame must have as many rows as there are
columns in the SummarizedExperiment object, with each row of the
DataFrame providing information on the sample in the
corresponding column of the SummarizedExperiment object.
Columns of the DataFrame represent different sample
attributes, e.g., tissue of origin, etc. Columns of the
DataFrame can themselves be annotated (via the mcols
function). Column names typically provide a short identifier
unique to each sample.
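A minimal sketch of the row and column annotation layout described above, assuming the SummarizedExperiment package is installed. The feature symbols, tissue labels, and the description text attached with mcols() are invented for illustration.

```r
library(SummarizedExperiment)

## 4 features x 3 samples; the values are arbitrary.
counts <- matrix(1:12, nrow=4,
                 dimnames=list(paste0("gene", 1:4), paste0("S", 1:3)))

## One rowData row per feature, one colData row per sample.
rd <- DataFrame(symbol=c("A1", "B2", "C3", "D4"), row.names=rownames(counts))
cd <- DataFrame(tissue=c("liver", "liver", "brain"), row.names=colnames(counts))
se <- SummarizedExperiment(assays=list(counts=counts), rowData=rd, colData=cd)

nrow(rowData(se)) == nrow(se)   # TRUE
nrow(colData(se)) == ncol(se)   # TRUE

## Annotate the colData columns themselves with mcols().
mcols(colData(se)) <- DataFrame(description="Tissue of origin")
mcols(colData(se))$description
```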
A SummarizedExperiment object can also contain information about
the overall experiment, for instance the lab in which it was conducted,
the publications with which it is associated, etc. This information is
stored as a list object, accessible using the metadata
function. The form of the data associated with the experiment is left to
the discretion of the user.
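As a small illustration, metadata() behaves like an ordinary list. This sketch assumes the SummarizedExperiment package is installed; the entry names "lab" and "created" and their values are hypothetical.

```r
library(SummarizedExperiment)

se <- SummarizedExperiment(assays=list(counts=matrix(0L, nrow=2, ncol=2)))

## metadata() is an ordinary list; its structure is up to the user.
metadata(se)$lab <- "Example Lab"               # hypothetical value
metadata(se)$created <- as.Date("2024-01-01")   # hypothetical value
names(metadata(se))   # "lab" "created"
```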
The SummarizedExperiment container is appropriate for matrix-like
data. The data are accessed using the assays function,
described below. This returns a SimpleList object. Each
element of the list must itself be a matrix (of any mode) and must
have dimensions that are the same as the dimensions of the
SummarizedExperiment in which they are stored. Row and column
names of each matrix must either be NULL or match those of the
SummarizedExperiment during construction. It is convenient for
the elements of the SimpleList of assays to be named.
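The dimension requirement can be seen in a short sketch, assuming the SummarizedExperiment package is installed. The assay names "raw", "norm" and "bad" and their values are illustrative; the last step only checks that an assay of the wrong dimensions is refused, not the exact error message.

```r
library(SummarizedExperiment)

## Two named assays with identical dimensions (3 features x 2 samples).
raw  <- matrix(c(5L, 0L, 12L, 7L, 3L, 9L), nrow=3)
norm <- log2(raw + 1)
se <- SummarizedExperiment(assays=SimpleList(raw=raw, norm=norm))
assayNames(se)   # "raw" "norm"
dim(se)          # 3 2

## Adding an assay whose dimensions differ from those of 'se' fails,
## and 'se' is left unchanged.
res <- tryCatch({
    assay(se, "bad") <- matrix(0, nrow=2, ncol=2)
    "accepted"
}, error=function(e) "rejected")
res   # "rejected"
```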
Constructor
SummarizedExperiment instances are constructed using the
SummarizedExperiment function documented in ?RangedSummarizedExperiment.
Accessors
In the following code snippets, x is a SummarizedExperiment object.
assays(x), assays(x) <- value:
Get or set the assays. value is a list or SimpleList, each element of which is a matrix with the same dimensions as x.
assay(x, i), assay(x, i) <- value:
A convenient alternative (to assays(x)[[i]], assays(x)[[i]] <- value) to get or set the i-th (default first) assay element. value must be a matrix of the same dimension as x, and with dimension names NULL or consistent with those of x.
assayNames(x), assayNames(x) <- value:
Get or set the names of assay() elements.
rowData(x, use.names=TRUE), rowData(x) <- value:
Get or set the row data. value is a DataFrame object.
colData(x), colData(x) <- value:
Get or set the column data. value is a DataFrame object. Row names of value must be NULL or consistent with the existing column names of x.
metadata(x), metadata(x) <- value:
Get or set the experiment data. value is a list with arbitrary content.
dim(x):
Get the dimensions (features of interest x samples) of the SummarizedExperiment.
dimnames(x), dimnames(x) <- value:
Get or set the dimension names. value is usually a list of length 2, containing elements that are either NULL or vectors of appropriate length for the corresponding dimension. value can be NULL, which removes dimension names. This method implies that rownames, rownames<-, colnames, and colnames<- are all available.
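A brief sketch of the assay accessors, assuming the SummarizedExperiment package is installed; the assay names "counts", "logcounts", "raw" and "log2" are illustrative choices.

```r
library(SummarizedExperiment)

counts <- matrix(1:6, nrow=3,
                 dimnames=list(c("g1", "g2", "g3"), c("s1", "s2")))
se <- SummarizedExperiment(assays=list(counts=counts))

## assay(x, i) is shorthand for assays(x)[[i]]; i may be a position or a name.
identical(assay(se), assays(se)[[1]])                   # TRUE
identical(assay(se, "counts"), assays(se)[["counts"]])  # TRUE

## Add a second assay of the same dimensions, then rename both.
assay(se, "logcounts") <- log2(assay(se) + 1)
assayNames(se) <- c("raw", "log2")

dim(se)        # 3 2
dimnames(se)   # rownames g1-g3, colnames s1-s2
```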
Subsetting
In the code snippets below, x is a SummarizedExperiment object.
x[i,j], x[i,j] <- value:
Create or replace a subset of x. i, j can be numeric, logical, character, or missing. value must be a SummarizedExperiment object with dimensions, dimension names, and assay elements consistent with the subset x[i,j] being replaced.
subset(x, subset, select):
Create a subset of x using an expression subset referring to columns of rowData(x) and/or select referring to column names of colData(x).
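The subsetting forms can be sketched as follows, assuming the SummarizedExperiment package is installed. The "biotype" and "group" annotations are invented; the last step scales a two-row subset and writes it back, so value has exactly the dimensions and dimnames of the subset it replaces.

```r
library(SummarizedExperiment)

counts <- matrix(1:12, nrow=4,
                 dimnames=list(paste0("g", 1:4), paste0("s", 1:3)))
se <- SummarizedExperiment(
    assays=list(counts=counts),
    rowData=DataFrame(biotype=c("coding", "coding", "lncRNA", "coding"),
                      row.names=rownames(counts)),
    colData=DataFrame(group=c("ctrl", "trt", "trt"),
                      row.names=colnames(counts)))

## [i, j] accepts character, logical, or numeric indices.
se[c("g1", "g3"), se$group == "trt"]

## subset(): 'subset' is evaluated in rowData(se), 'select' in colData(se).
sub <- subset(se, subset = biotype == "coding", select = group == "trt")
dim(sub)   # 3 2

## [i, j] <- value: modify a subset, then write it back.
top <- se[1:2, ]
assay(top) <- assay(top) * 10L
se[1:2, ] <- top
```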
Additional subsetting accessors provide convenient access to
colData columns:
x$name, x$name <- value:
Access or replace the column name of colData(x).
x[[i, ...]], x[[i, ...]] <- value:
Access or replace the column i of colData(x).
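A short sketch of the colData shortcuts, assuming the SummarizedExperiment package is installed; the column names "group", "batch" and "pair" are made up.

```r
library(SummarizedExperiment)

se <- SummarizedExperiment(
    assays=list(counts=matrix(0L, nrow=2, ncol=3)),
    colData=DataFrame(group=c("ctrl", "trt", "trt")))

## Both forms read a column of colData(se).
se$group
se[["group"]]

## Both forms add or replace a colData column.
se$batch <- c(1L, 1L, 2L)
se[["pair"]] <- c("a", "a", "b")
colnames(colData(se))   # "group" "batch" "pair"
```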
Combining
In the code snippets below, x, y and ... are
SummarizedExperiment objects to be combined.
rbind(...):
rbind combines objects with the same samples but different features of interest (rows in assays). The colnames in rowData(SummarizedExperiment) must match or an error is thrown. Duplicate columns of colData(SummarizedExperiment) must contain the same data.
Data in assays are combined by name matching; if all assay names are NULL, matching is by position. A mixture of names and NULL throws an error.
metadata from all objects are combined into a list with no name checking.
cbind(...):
cbind combines objects with the same features of interest but different samples (columns in assays). The colnames in colData(SummarizedExperiment) must match or an error is thrown. Duplicate columns of rowData(SummarizedExperiment) must contain the same data.
Data in assays are combined by name matching; if all assay names are NULL, matching is by position. A mixture of names and NULL throws an error.
metadata from all objects are combined into a list with no name checking.
combineRows(x, ..., use.names=TRUE, delayed=TRUE, fill=NA):
combineRows acts like a more flexible rbind, returning a SummarizedExperiment with features equal to the concatenation of features across all input objects. Unlike rbind, it permits differences in the number and identity of the columns, differences in the available rowData fields, and even differences in the available assays among the objects being combined.
If use.names=TRUE, each input object must have non-NULL, non-duplicated column names. These names do not have to be the same, or even shared, across the input objects. The column names of the returned SummarizedExperiment will be a union of the column names across all input objects. If a column is not present in an input, the corresponding assay and colData entries will be filled with fill and NAs, respectively, in the combined SummarizedExperiment.
If use.names=FALSE, all objects must have the same number of columns. The column names of the returned object are set to colnames(x). Any differences in the column names between input objects are ignored.
Data in assays are combined by matching the names of the assays. If one input object does not contain a named assay present in other input objects, the corresponding assay entries in the returned object will be set to fill. If all assay names are NULL, matching is done by position. A mixture of named and unnamed assays will throw an error.
If delayed=TRUE, assay matrices are wrapped in DelayedArray objects to avoid any extra memory allocation during the matrix rbinding. Otherwise, the matrices are combined as-is; note that this may still return DelayedMatrix objects if the inputs were also DelayedMatrix objects.
If any input is a RangedSummarizedExperiment, the returned object will also be a RangedSummarizedExperiment. The rowRanges of the returned object is set to the concatenation of the rowRanges of all inputs. If any input is a plain SummarizedExperiment, the returned rowRanges is converted into a GRangesList and the entries corresponding to the rows of the SummarizedExperiment are set to zero-length GRanges. If all inputs are SummarizedExperiment objects, a SummarizedExperiment is also returned.
rowData are combined using combineRows for DataFrame objects. It is not necessary for all input objects to have the same fields in their rowData; missing fields are filled with NAs for the corresponding rows in the returned object.
metadata from all objects are combined into a list with no name checking.
combineCols(x, ..., use.names=TRUE, delayed=TRUE, fill=NA):
combineCols acts like a more flexible cbind, returning a SummarizedExperiment with columns equal to the concatenation of columns across all input objects. Unlike cbind, it permits differences in the number and identity of the rows, differences in the available colData fields, and even differences in the available assays among the objects being combined.
If use.names=TRUE, each input object must have non-NULL, non-duplicated row names. These names do not have to be the same, or even shared, across the input objects. The row names of the returned SummarizedExperiment will be a union of the row names across all input objects. If a row is not present in an input, the corresponding assay and rowData entries will be filled with fill and NAs, respectively, in the combined SummarizedExperiment.
If use.names=FALSE, all objects must have the same number of rows. The row names of the returned object are set to rownames(x). Any differences in the row names between input objects are ignored.
Data in assays are combined by matching the names of the assays. If one input object does not contain a named assay present in other input objects, the corresponding assay entries in the returned object will be set to fill. If all assay names are NULL, matching is done by position. A mixture of named and unnamed assays will throw an error.
If delayed=TRUE, assay matrices are wrapped in DelayedArray objects to avoid any extra memory allocation during the matrix cbinding. Otherwise, the matrices are combined as-is; note that this may still return DelayedMatrix objects if the inputs were also DelayedMatrix objects.
If any input is a RangedSummarizedExperiment, the returned object will also be a RangedSummarizedExperiment. The rowRanges of the returned object is set to a merge of the rowRanges of all inputs, where the coordinates for each row are taken from the input object that contains that row. Any conflicting ranges for shared rows will raise a warning and all rowRanges information from the offending RangedSummarizedExperiment will be ignored. If any input is a plain SummarizedExperiment, the returned rowRanges is converted into a GRangesList and the entries corresponding to the unique rows of the SummarizedExperiment are set to zero-length GRanges. If all inputs are SummarizedExperiment objects, a SummarizedExperiment is also returned.
colData are combined using combineRows for DataFrame objects. It is not necessary for all input objects to have the same fields in their colData; missing fields are filled with NAs for the corresponding columns in the returned object.
metadata from all objects are combined into a list with no name checking.
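The difference between combineRows and combineCols can be seen on two small objects with partially overlapping samples. This sketch assumes the SummarizedExperiment package is installed; the object names and values are illustrative, and delayed=FALSE is used only so that the combined assays print as ordinary matrices.

```r
library(SummarizedExperiment)

## Two toy objects; sample "s2" is shared, features are disjoint.
se_a <- SummarizedExperiment(assays=list(counts=matrix(1:4, nrow=2,
    dimnames=list(c("g1", "g2"), c("s1", "s2")))))
se_b <- SummarizedExperiment(assays=list(counts=matrix(5:10, nrow=3,
    dimnames=list(c("g3", "g4", "g5"), c("s2", "s3")))))

## combineRows(): features are concatenated and samples are the union of
## the column names; entries for absent samples are set to 'fill' (NA).
rows <- combineRows(se_a, se_b, delayed=FALSE)
dim(rows)                  # 5 3
as.matrix(assay(rows))

## combineCols(): samples are concatenated (so "s2" appears twice) and
## features are the union of the row names.
cols <- combineCols(se_a, se_b, delayed=FALSE)
dim(cols)                  # 5 4
```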
Implementation and Extension
This section contains advanced material meant for package developers.
SummarizedExperiment is implemented as an S4 class, and can be extended in
the usual way, using contains="SummarizedExperiment" in the new
class definition.
In addition, the representation of the assays slot of
SummarizedExperiment is as a virtual class Assays. This
allows derived classes (contains="Assays") to implement
alternative requirements for the assays, e.g., backed by file-based
storage like NetCDF or the ff package, while re-using the existing
SummarizedExperiment class without modification.
See Assays for more information.
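A minimal sketch of extending the class with contains="SummarizedExperiment", assuming the SummarizedExperiment package is installed. The class name "ProtocolSE" and its protocol slot are hypothetical, chosen only to show that methods such as dim() are inherited.

```r
library(SummarizedExperiment)

## Hypothetical subclass: the class name "ProtocolSE" and its 'protocol'
## slot are illustrative only.
setClass("ProtocolSE", contains="SummarizedExperiment",
         slots=c(protocol="character"))

se <- SummarizedExperiment(assays=list(counts=matrix(0L, nrow=2, ncol=2)))
## Initialize the subclass from an existing SummarizedExperiment instance.
pse <- new("ProtocolSE", se, protocol="RNA-seq")

is(pse, "SummarizedExperiment")   # TRUE
dim(pse)                          # inherited method: 2 2
pse@protocol                      # "RNA-seq"
```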
Author(s)
Martin Morgan; combineRows and combineCols by Aaron Lun
See Also
- RangedSummarizedExperiment objects.
- DataFrame, SimpleList, and Annotated objects in the S4Vectors package.
- saveHDF5SummarizedExperiment and loadHDF5SummarizedExperiment in the HDF5Array package for saving/loading an HDF5-based SummarizedExperiment object to/from disk.
- The realize generic function in the DelayedArray package for more information about on-disk realization of objects carrying delayed operations.
Examples
library(SummarizedExperiment)
nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
se0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                            colData=colData)
se0
dim(se0)
dimnames(se0)
assayNames(se0)
head(assay(se0))
assays(se0) <- endoapply(assays(se0), asinh)
head(assay(se0))
rowData(se0)
colData(se0)
se0[, se0$Treatment == "ChIP"]
subset(se0, select = Treatment == "ChIP")
## rbind() combines objects with the same samples but different
## features of interest:
se1 <- se0
se2 <- se1[1:50,]
rownames(se2) <- letters[seq_len(nrow(se2))]
cmb2 <- rbind(se1, se2)
dim(cmb2)
dimnames(cmb2)
## cbind() combines objects with the same features of interest
## but different samples:
se1 <- se0
se2 <- se1[,1:3]
colnames(se2) <- letters[seq_len(ncol(se2))]
cmb1 <- cbind(se1, se2)
dim(cmb1)
dimnames(cmb1)
## ---------------------------------------------------------------------
## ON-DISK REALIZATION
## ---------------------------------------------------------------------
library(DelayedArray)
setAutoRealizationBackend("HDF5Array")
cmb3 <- realize(cmb2)
assay(cmb3, withDimnames=FALSE) # an HDF5Matrix object