| DelayedArray-utils {DelayedArray} | R Documentation |
Common operations on DelayedArray objects.
The operations currently supported on DelayedArray objects are:
Delayed operations:
rbind and cbind
sweep
!
is.na, is.finite, is.infinite, is.nan
lengths
nchar, tolower, toupper,
grepl, sub, gsub
pmax2 and pmin2
t
statistical functions like dnorm, dbinom, dpois,
and dlogis (for the Normal, Binomial, Poisson, and Logistic
distribution, respectively) and related functions (documented in
DelayedArray-stats)
Block-processed operations:
anyNA, which
unique, table
all the members of the Summary group
mean
apply
matrix multiplication (%*%) of an ordinary matrix by a DelayedMatrix object
matrix row/col summarization (see
?`DelayedMatrix-stats`)
cbind in the base package for
rbind/cbind'ing ordinary arrays.
arbind and acbind in this package
(DelayedArray) for binding ordinary arrays of arbitrary
dimensions along their rows or columns.
is.na, !,
table, mean,
apply, and %*% in the
base package for the corresponding operations on ordinary
arrays or matrices.
DelayedArray-stats for statistical functions on DelayedArray objects.
DelayedMatrix-stats for DelayedMatrix row/col summarization.
setRealizationBackend for how to set a
realization backend.
writeHDF5Array in the HDF5Array
package for writing an array-like object to an HDF5 file and other
low-level utilities to control the location of automatically created
HDF5 datasets.
DelayedArray objects.
HDF5Array objects in the HDF5Array package.
S4groupGeneric in the methods package
for the members of the Ops,
Math, and Math2 groups.
array objects in base R.
## ---------------------------------------------------------------------
## BIND DelayedArray OBJECTS
## ---------------------------------------------------------------------
## DelayedArray objects can be bound along their 1st (rows) or 2nd
## (columns) dimension with rbind() or cbind(). These operations are
## equivalent to arbind() and acbind(), respectively, and are all
## delayed.
## On 2D objects:
library(HDF5Array)
toy_h5 <- system.file("extdata", "toy.h5", package="HDF5Array")
h5ls(toy_h5)
M1 <- HDF5Array(toy_h5, "M1")
M2 <- HDF5Array(toy_h5, "M2")
M12 <- rbind(M1, t(M2)) # delayed
M12
colMeans(M12) # block-processed
## On objects with more than 2 dimensions:
example(arbind) # to create arrays a1, a2, a3
A1 <- DelayedArray(a1)
A2 <- DelayedArray(a2)
A3 <- DelayedArray(a3)
A123 <- rbind(A1, A2, A3) # delayed
A123
## On 1D objects:
v1 <- array(11:15, 5, dimnames=list(LETTERS[1:5]))
v2 <- array(letters[1:3])
V1 <- DelayedArray(v1)
V2 <- DelayedArray(v2)
V12 <- rbind(V1, V2)
V12
## Not run: cbind(V1, V2) # Error! (the objects to cbind() must have at least 2
# dimensions)
## End(Not run)
## Note that base::rbind() and base::cbind() do something completely
## different on ordinary arrays that are not matrices. They treat them
## as if they were vectors:
rbind(a1, a2, a3)
cbind(a1, a2, a3)
rbind(v1, v2)
cbind(v1, v2)
## Also note that DelayedArray objects of arbitrary dimensions can be
## stored inside a DataFrame object as long as they all have the same
## first dimension (nrow()):
DF <- DataFrame(M=I(tail(M1, n=5)), A=I(A3), V=I(V1))
DF[-3, ]
DF2 <- rbind(DF, DF)
DF2$V
## Sanity checks:
m1 <- as.matrix(M1)
m2 <- as.matrix(M2)
stopifnot(identical(rbind(m1, t(m2)), as.matrix(M12)))
stopifnot(identical(arbind(a1, a2, a3), as.array(A123)))
stopifnot(identical(arbind(v1, v2), as.array(V12)))
stopifnot(identical(rbind(DF$M, DF$M), DF2$M))
stopifnot(identical(rbind(DF$A, DF$A), DF2$A))
stopifnot(identical(rbind(DF$V, DF$V), DF2$V))
## ---------------------------------------------------------------------
## MORE OPERATIONS
## ---------------------------------------------------------------------
M1 >= 0.5 & M1 < 0.75 # delayed
log(M1) # delayed
pmax2(M2, 0) # delayed
## table() is block-processed:
a4 <- array(sample(50L, 2000000L, replace=TRUE), c(200, 4, 2500))
A4 <- as(a4, "HDF5Array")
table(A4)
a5 <- array(sample(20L, 2000000L, replace=TRUE), c(200, 4, 2500))
A5 <- as(a5, "HDF5Array")
table(A5)
A4 - 2 * A5 # delayed
table(A4 - 2 * A5) # block-processed
## range() is block-processed:
range(A4 - 2 * A5)
range(M1)
cmeans <- colMeans(M2) # block-processed
sweep(M2, 2, cmeans) # delayed
## ---------------------------------------------------------------------
## MATRIX MULTIPLICATION
## ---------------------------------------------------------------------
## Matrix multiplication is not delayed: the output matrix is realized
## block by block. The current "realization backend" controls where
## realization happens e.g. in memory if set to NULL or in an HDF5 file
## if set to "HDF5Array". See '?realize' for more information about
## "realization backends".
## The output matrix is returned as a DelayedMatrix object with no delayed
## operations on it. The exact class of the object depends on the backend
## e.g. it will be HDF5Matrix with "HDF5Array" backend.
m <- matrix(runif(50000), ncol=nrow(M1))
## Set backend to NULL for in-memory realization:
setRealizationBackend()
P1 <- m %*% M1
P1
## Set backend to HDF5Array for realization in HDF5 file:
setRealizationBackend("HDF5Array")
## With the HDF5Array backend, the output matrix will be written to an
## automatic location on disk:
getHDF5DumpFile() # HDF5 file where the output matrix will be written
lsHDF5DumpFile()
P2 <- m %*% M1
P2
lsHDF5DumpFile()
## Use setHDF5DumpFile() and setHDF5DumpName() from the HDF5Array package
## to control the location of automatically created HDF5 datasets.
stopifnot(identical(as.array(P1), as.array(P2)))