| SummarizedExperiment-class {SummarizedExperiment} | R Documentation |
The SummarizedExperiment class is a matrix-like container where rows represent features of interest (e.g. genes, transcripts, exons) and columns represent samples (with sample data summarized as a DataFrame). A SummarizedExperiment object contains one or more assays, each represented by a matrix-like object of numeric or other mode.
Note that SummarizedExperiment is the parent of the RangedSummarizedExperiment class, which means that all the methods documented below also work on a RangedSummarizedExperiment object.
## Constructor

# See ?RangedSummarizedExperiment for the constructor function.

## Accessors

assayNames(x, ...)
assayNames(x, ...) <- value
assays(x, withDimnames=TRUE, ...)
assays(x, withDimnames=TRUE, ...) <- value
assay(x, i, withDimnames=TRUE, ...)
assay(x, i, withDimnames=TRUE, ...) <- value
rowData(x, use.names=TRUE, ...)
rowData(x, ...) <- value
colData(x, ...)
colData(x, ...) <- value
#dim(x)
#dimnames(x)
#dimnames(x) <- value

## Quick colData access

## S4 method for signature 'SummarizedExperiment'
x$name
## S4 replacement method for signature 'SummarizedExperiment'
x$name <- value
## S4 method for signature 'SummarizedExperiment,ANY,missing'
x[[i, j, ...]]
## S4 replacement method for signature 'SummarizedExperiment,ANY,missing'
x[[i, j, ...]] <- value

## Subsetting

## S4 method for signature 'SummarizedExperiment'
x[i, j, ..., drop=TRUE]
## S4 replacement method for signature 'SummarizedExperiment,ANY,ANY,SummarizedExperiment'
x[i, j] <- value
## S4 method for signature 'SummarizedExperiment'
subset(x, subset, select, ...)

## Combining

## S4 method for signature 'SummarizedExperiment'
cbind(..., deparse.level=1)
## S4 method for signature 'SummarizedExperiment'
rbind(..., deparse.level=1)

## On-disk realization

## S4 method for signature 'SummarizedExperiment'
realize(x, BACKEND=getAutoRealizationBackend())
x |
A SummarizedExperiment object. |
... |
For For For other accessors, ignored. |
value |
An object of a class specified in the S4 method signature or as outlined in ‘Details’. |
i, j |
For For For |
name |
A symbol representing the name of a column of colData(x). |
withDimnames |
A Setting Note that assays(x, withDimnames=FALSE) <- assays(x, withDimnames=FALSE) is guaranteed to always work and be a no-op. This is not the case
if |
use.names |
Like |
drop |
A |
deparse.level |
See |
subset |
An expression which, when evaluated in the context of rowData(x), indicates the rows to keep. |
select |
An expression which, when evaluated in the context of colData(x), indicates the columns to keep. |
BACKEND |
|
The SummarizedExperiment class is meant for numeric and other
data types derived from a sequencing experiment. The structure is
rectangular like a matrix, but with additional annotations on
the rows and columns, and with the possibility to manage several
assays simultaneously.
The rows of a SummarizedExperiment object represent features
of interest. Information about these features is stored in a
DataFrame object, accessible using the function
rowData. The DataFrame must have as many rows
as there are rows in the SummarizedExperiment object, with each row
of the DataFrame providing information on the feature in the
corresponding row of the SummarizedExperiment object. Columns of the
DataFrame represent different attributes of the features
of interest, e.g., gene or transcript IDs, etc.
Each column of a SummarizedExperiment object represents a sample.
Information about the samples is stored in a DataFrame,
accessible using the function colData, described below.
The DataFrame must have as many rows as there are
columns in the SummarizedExperiment object, with each row of the
DataFrame providing information on the sample in the
corresponding column of the SummarizedExperiment object.
Columns of the DataFrame represent different sample
attributes, e.g., tissue of origin, etc. Columns of the
DataFrame can themselves be annotated (via the
mcols function). Column names typically
provide a short identifier unique to each sample.
A SummarizedExperiment object can also contain information about
the overall experiment, for instance the lab in which it was conducted,
the publications with which it is associated, etc. This information is
stored as a list object, accessible using the metadata
function. The form of the data associated with the experiment is left to
the discretion of the user.
The SummarizedExperiment container is appropriate for matrix-like
data. The data are accessed using the assays function,
described below. This returns a SimpleList object. Each
element of the list must itself be a matrix (of any mode) and must
have the same dimensions as the SummarizedExperiment in which it is
stored. Row and column names of each matrix must either be NULL or
match those of the SummarizedExperiment during construction. It is
convenient for the elements of the SimpleList of assays to be named.
SummarizedExperiment instances are constructed using the
SummarizedExperiment function documented in
?RangedSummarizedExperiment.
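The pieces described above (assays, rowData, colData, metadata) can be put together as in the following minimal sketch. The variable names, the toy values, and the seed are illustrative assumptions, not part of the package.

```r
library(SummarizedExperiment)

set.seed(1)  # arbitrary seed, only so the toy data are reproducible
n_features <- 4; n_samples <- 3

# Two assays with identical dimensions (features x samples)
counts <- matrix(rpois(n_features * n_samples, lambda = 10), nrow = n_features)
logcounts <- log2(counts + 1)

# One rowData row per feature, one colData row per sample
row_info <- DataFrame(gene_id = paste0("gene", seq_len(n_features)))
col_info <- DataFrame(tissue = c("liver", "liver", "brain"),
                      row.names = c("s1", "s2", "s3"))

se <- SummarizedExperiment(
    assays   = SimpleList(counts = counts, logcounts = logcounts),
    rowData  = row_info,
    colData  = col_info,
    metadata = list(lab = "example lab")
)

dim(se)           # 4 features, 3 samples
assayNames(se)    # "counts" "logcounts"
colnames(se)      # taken from the row names of colData
metadata(se)$lab
```

Naming the assays, as done here, makes later access by name (for example assay(se, "logcounts")) possible.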
In the following code snippets, x is a SummarizedExperiment
object.
assays(x), assays(x) <- value:Get or set the
assays. value is a list or SimpleList, each
element of which is a matrix with the same dimensions as
x.
assay(x, i), assay(x, i) <- value:A convenient
alternative (to assays(x)[[i]], assays(x)[[i]] <-
value) to get or set the ith (default first) assay
element. value must be a matrix of the same dimension as
x, and with dimension names NULL or consistent with
those of x.
assayNames(x), assayNames(x) <- value:Get or
set the names of assay() elements.
rowData(x, use.names=TRUE), rowData(x) <- value:Get or set the row data. value is a DataFrame object.
colData(x), colData(x) <- value:Get or set the
column data. value is a DataFrame object. Row
names of value must be NULL or consistent with the existing
column names of x.
metadata(x), metadata(x) <- value:Get or set
the experiment data. value is a list with arbitrary
content.
dim(x):Get the dimensions (features of interest x samples) of the SummarizedExperiment.
dimnames(x), dimnames(x) <- value:Get or set
the dimension names. value is usually a list of length 2,
containing elements that are either NULL or vectors of
appropriate length for the corresponding dimension. value
can be NULL, which removes dimension names. This method
implies that rownames, rownames<-, colnames,
and colnames<- are all available.
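As a rough illustration of the accessors listed above, the sketch below builds a small object and then reads and replaces its components. The object x and all values in it are made up for the example.

```r
library(SummarizedExperiment)

m <- matrix(1:6, nrow = 3)
x <- SummarizedExperiment(assays = SimpleList(counts = m))

# Set both dimension names at once; rownames<- and colnames<- also work
dimnames(x) <- list(c("f1", "f2", "f3"), c("s1", "s2"))

# assay() with no index returns the first assay, carrying x's dimnames
rownames(assay(x))

# withDimnames=FALSE returns the assay as it is stored
stored <- assay(x, withDimnames = FALSE)

# Add a second assay by name, then rename all assays
assay(x, "doubled") <- assay(x) * 2
assayNames(x) <- c("raw", "doubled")

# Attach per-feature, per-sample, and experiment-level annotations
rowData(x) <- DataFrame(biotype = c("coding", "coding", "lncRNA"))
colData(x) <- DataFrame(batch = c("a", "b"), row.names = colnames(x))
metadata(x) <- list(note = "toy example")
```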
In the code snippets below, x is a SummarizedExperiment object.
x[i,j], x[i,j] <- value:Create or replace a
subset of x. i, j can be numeric,
logical, character, or missing. value
must be a SummarizedExperiment object with dimensions,
dimension names, and assay elements consistent with the subset
x[i,j] being replaced.
subset(x, subset, select):Create a subset of x
using an expression subset referring to columns of
rowData(x) and / or select referring to column names
of colData(x).
Additional subsetting accessors provide convenient access to
colData columns
x$name, x$name <- value:Access or replace
column name in colData(x).
x[[i, ...]], x[[i, ...]] <- value:Access or
replace column i in colData(x).
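The subsetting operations above can be sketched on a toy object as follows. The gene and sample names, the biotype and tissue columns, and the patch object are assumptions made for illustration only.

```r
library(SummarizedExperiment)

x <- SummarizedExperiment(
    assays  = SimpleList(counts = matrix(1:12, nrow = 4)),
    rowData = DataFrame(biotype = c("coding", "lncRNA", "coding", "coding")),
    colData = DataFrame(tissue = c("liver", "brain", "liver"),
                        row.names = c("s1", "s2", "s3"))
)
rownames(x) <- paste0("g", 1:4)

# Matrix-style subsetting with numeric, character, or logical subscripts
dim(x[1:2, ])               # 2 3
dim(x[, c("s1", "s3")])     # 4 2

# x$name and x[[i]] reach into colData(x)
x$tissue
x[["batch"]] <- c("a", "a", "b")   # adds a colData column

# subset(): `subset` is evaluated in rowData(x), `select` in colData(x)
y <- subset(x, subset = biotype == "coding", select = tissue == "liver")
dim(y)                      # 3 2

# Replace a block with a SummarizedExperiment of matching shape and names
patch <- x[1:2, ]
assay(patch) <- assay(patch) * 0L
x[1:2, ] <- patch
```

Building the replacement value from a subset of x itself is one simple way to guarantee that its dimension names and assay elements are consistent with the block being replaced.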
In the code snippets below, x, y and ... are
SummarizedExperiment objects to be combined.
cbind(...):cbind combines objects with the same features of interest
but different samples (columns in assays).
The colnames in colData(SummarizedExperiment) must match or
an error is thrown.
Duplicate columns of rowData(SummarizedExperiment) must
contain the same data.
Data in assays are combined by name matching; if all assay
names are NULL matching is by position. A mixture of names and NULL
throws an error.
metadata from all objects are combined into a list
with no name checking.
rbind(...):rbind combines objects with the same samples
but different features of interest (rows in assays).
The colnames in rowData(SummarizedExperiment) must match or
an error is thrown.
Duplicate columns of colData(SummarizedExperiment) must
contain the same data.
Data in assays are combined by name matching; if all assay
names are NULL matching is by position. A mixture of names and NULL
throws an error.
metadata from all objects are combined into a list
with no name checking.
combineRows(x, ..., use.names=TRUE, delayed=TRUE, fill=NA):combineRows acts like a more flexible rbind, returning a
SummarizedExperiment with features equal to the concatenation of features
across all input objects. Unlike rbind, it permits differences in
the number and identity of the columns, differences in the available
rowData fields, and even differences in the available
assays among the objects being combined.
If use.names=TRUE, each input object must have non-NULL,
non-duplicated column names. These names do not have to be the same, or
even shared, across the input objects. The column names of the returned
SummarizedExperiment will be a union of the column names across
all input objects. If a column is not present in an input, the
corresponding assay and colData entries will be filled with
fill and NAs, respectively, in the combined
SummarizedExperiment.
If use.names=FALSE, all objects must have the same number of
columns. The column names of the returned object are set to
colnames(x). Any differences in the column names between input
objects are ignored.
Data in assays are combined by matching the names of the assays.
If one input object does not contain a named assay present in other input
objects, the corresponding assay entries in the returned object will be
set to fill. If all assay names are NULL, matching is done by
position. A mixture of named and unnamed assays will throw an error.
If delayed=TRUE, assay matrices are wrapped in
DelayedArrays to avoid any extra memory allocation during
the matrix rbinding. Otherwise, the matrices are combined as-is;
note that this may still return DelayedMatrix objects if the
inputs were also DelayedMatrix objects.
If any input is a RangedSummarizedExperiment, the returned object
will also be a RangedSummarizedExperiment. The rowRanges of
the returned object is set to the concatenation of the rowRanges
of all inputs. If any input is a SummarizedExperiment, the
returned rowRanges is converted into a GRangesList and the
entries corresponding to the rows of the SummarizedExperiment are
set to zero-length GRanges. If all inputs are
SummarizedExperiment objects, a SummarizedExperiment is
also returned.
rowData are combined using combineRows for
DataFrame objects. It is not necessary for all input objects to
have the same fields in their rowData; missing fields are filled
with NAs for the corresponding rows in the returned object.
metadata from all objects are combined into a list
with no name checking.
combineCols(x, ..., use.names=TRUE, delayed=TRUE, fill=NA):combineCols acts like a more flexible cbind, returning a
SummarizedExperiment with columns equal to the concatenation of columns
across all input objects. Unlike cbind, it permits differences in
the number and identity of the rows, differences in the available
colData fields, and even differences in the available
assays among the objects being combined.
If use.names=TRUE, each input object must have non-NULL,
non-duplicated row names. These names do not have to be the same, or
even shared, across the input objects. The row names of the returned
SummarizedExperiment will be a union of the row names across
all input objects. If a row is not present in an input, the
corresponding assay and rowData entries will be filled with
fill and NAs, respectively, in the combined
SummarizedExperiment.
If use.names=FALSE, all objects must have the same number of rows.
The row names of the returned object are set to rownames(x). Any
differences in the row names between input objects are ignored.
Data in assays are combined by matching the names of the assays.
If one input object does not contain a named assay present in other input
objects, the corresponding assay entries in the returned object will be
set to fill. If all assay names are NULL, matching is done by
position. A mixture of named and unnamed assays will throw an error.
If delayed=TRUE, assay matrices are wrapped in
DelayedArrays to avoid any extra memory allocation during
the matrix cbinding. Otherwise, the matrices are combined as-is;
note that this may still return DelayedMatrix objects if the
inputs were also DelayedMatrix objects.
If any input is a RangedSummarizedExperiment, the returned object
will also be a RangedSummarizedExperiment. The rowRanges of
the returned object is set to a merge of the rowRanges of all
inputs, where the coordinates for each row are taken from the input
object that contains that row. Any conflicting ranges for shared rows
will raise a warning and all rowRanges information from the
offending RangedSummarizedExperiment will be ignored. If any
input is a SummarizedExperiment, the returned rowRanges is
converted into a GRangesList and the entries corresponding to the
unique rows of the SummarizedExperiment are set to zero-length
GRanges. If all inputs are SummarizedExperiment objects, a
SummarizedExperiment is also returned.
colData are combined using combineRows for
DataFrame objects. It is not necessary for all input objects to
have the same fields in their colData; missing fields are filled
with NAs for the corresponding columns in the returned object.
metadata from all objects are combined into a list
with no name checking.
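To make the behaviour of combineRows and combineCols with use.names=TRUE more concrete, here is a small sketch on toy objects whose rows and columns only partly overlap. The helper make_se and all names in it are hypothetical; delayed=FALSE is used only so that the resulting assays are ordinary matrices that are easy to inspect.

```r
library(SummarizedExperiment)

# Hypothetical helper: a tiny SummarizedExperiment with named rows/columns
make_se <- function(rows, cols) {
    m <- matrix(seq_len(length(rows) * length(cols)), nrow = length(rows),
                dimnames = list(rows, cols))
    SummarizedExperiment(
        assays  = SimpleList(counts = m),
        rowData = DataFrame(gene = rows, row.names = rows),
        colData = DataFrame(sample = cols, row.names = cols)
    )
}

a <- make_se(c("g1", "g2"), c("s1", "s2"))
b <- make_se(c("g3", "g4"), c("s2", "s3"))

# combineRows: features are concatenated, samples are the union s1, s2, s3
r <- combineRows(a, b, delayed = FALSE)
dim(r)
assay(r)["g3", "s1"]   # s1 is absent from b, so this entry is `fill` (NA)

# combineCols: samples are concatenated, features are the union g1, g2, g3
d <- make_se(c("g2", "g3"), c("s3", "s4"))
k <- combineCols(a, d, delayed = FALSE)
dim(k)
assay(k)["g1", "s3"]   # g1 is absent from d, so this entry is `fill` (NA)
```

With plain rbind or cbind these inputs would be rejected, because the samples (respectively features) differ between the objects.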
This section contains advanced material meant for package developers.
SummarizedExperiment is implemented as an S4 class, and can be extended in
the usual way, using contains="SummarizedExperiment" in the new
class definition.
In addition, the representation of the assays slot of
SummarizedExperiment is as a virtual class Assays. This
allows derived classes (contains="Assays") to implement
alternative requirements for the assays, e.g., backed by file-based
storage like NetCDF or the ff package, while re-using the existing
SummarizedExperiment class without modification.
See Assays for more information.
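A minimal sketch of extending the class with contains="SummarizedExperiment" might look like the following. The class name LabeledSE, its label slot, and the constructor function are hypothetical and exist only for this illustration.

```r
library(SummarizedExperiment)

# Hypothetical subclass that records a processing label
setClass("LabeledSE",
    contains = "SummarizedExperiment",
    slots = c(label = "character")
)

# Hypothetical constructor: build the parent object, then wrap it
LabeledSE <- function(..., label = NA_character_) {
    se <- SummarizedExperiment(...)
    new("LabeledSE", se, label = label)
}

lse <- LabeledSE(assays = SimpleList(counts = matrix(1:6, nrow = 2)),
                 label = "raw")

is(lse, "SummarizedExperiment")   # the subclass is still a SummarizedExperiment
dim(lse)                          # inherited accessors work unchanged

# Inherited subsetting keeps the subclass and its extra slot
sub <- lse[, 1:2]
is(sub, "LabeledSE")
sub@label
```

In a real package one would normally add an accessor for the new slot instead of using @ directly; that is omitted here to keep the sketch short.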
Martin Morgan; combineRows and combineCols by Aaron Lun
RangedSummarizedExperiment objects.
DataFrame, SimpleList, and Annotated objects in the S4Vectors package.
saveHDF5SummarizedExperiment and
loadHDF5SummarizedExperiment in the
HDF5Array package for saving/loading an HDF5-based
SummarizedExperiment object to/from disk.
The realize generic function in the
DelayedArray package for more information about on-disk
realization of objects carrying delayed operations.
library(SummarizedExperiment)

nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
se0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                            colData=colData)
se0
dim(se0)
dimnames(se0)
assayNames(se0)
head(assay(se0))
assays(se0) <- endoapply(assays(se0), asinh)
head(assay(se0))
rowData(se0)
colData(se0)
se0[, se0$Treatment == "ChIP"]
subset(se0, select = Treatment == "ChIP")
## cbind() combines objects with the same features of interest
## but different samples:
se1 <- se0
se2 <- se1[,1:3]
colnames(se2) <- letters[seq_len(ncol(se2))]
cmb1 <- cbind(se1, se2)
dim(cmb1)
dimnames(cmb1)
## rbind() combines objects with the same samples but different
## features of interest:
se1 <- se0
se2 <- se1[1:50,]
rownames(se2) <- letters[seq_len(nrow(se2))]
cmb2 <- rbind(se1, se2)
dim(cmb2)
dimnames(cmb2)
## ---------------------------------------------------------------------
## ON-DISK REALIZATION
## ---------------------------------------------------------------------
library(DelayedArray)
setAutoRealizationBackend("HDF5Array")
cmb3 <- realize(cmb2)
assay(cmb3, withDimnames=FALSE) # an HDF5Matrix object