| Streamer-package {Streamer} | R Documentation |
Large data files can be difficult to work with in R, where data
generally resides in memory. This package encourages a style of
programming where data is 'streamed' from disk into R through a series
of components that, typically, reduce the original data to a
manageable size. The package provides Producer and Consumer
components for operations such as data input, sampling, indexing, and
transformation.
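Sampling is one way a stream can shrink data to a manageable size. The sketch below is plain base R and does not use Streamer. It assumes a hypothetical `next_chunk()` function that returns successive chunks and a zero-length vector at the end. It also swaps in reservoir sampling (Algorithm R), a standard single-pass technique that may differ from what Streamer's own sampling components do. At any moment, only the current chunk and a k-element reservoir are held in memory.

```r
# Reservoir sampling (Algorithm R): a uniform random sample of k records,
# drawn in one pass over a stream of chunks. Only the current chunk and the
# k-element reservoir are held in memory. Not part of Streamer.
reservoir_sample <- function(next_chunk, k) {
    reservoir <- NULL
    seen <- 0
    while (length(chunk <- next_chunk()) != 0L) {
        for (record in chunk) {
            seen <- seen + 1
            if (seen <= k) {
                reservoir <- c(reservoir, record)   # fill the reservoir first
            } else {
                j <- sample.int(seen, 1L)           # uniform index in 1..seen
                if (j <= k) reservoir[j] <- record  # keep with probability k/seen
            }
        }
    }
    reservoir
}

# Hypothetical producer: ten chunks of 100 records each, then integer(0).
records <- 1:1000
chunks <- split(records, ceiling(seq_along(records) / 100))
pos <- 0L
next_chunk <- function() {
    if (pos >= length(chunks)) return(integer(0))
    pos <<- pos + 1L
    chunks[[pos]]
}

set.seed(1)
smp <- reservoir_sample(next_chunk, k = 5L)
print(smp)
```

Each of the 1000 records ends up in the sample with equal probability, even though the sampler never sees more than one chunk of the input at a time.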
The central paradigm in this package is a Stream composed of a
Producer and zero or more Consumer components. The Producer is
responsible for input of data, e.g., from the file system. A Consumer
accepts data from a Producer (or from an upstream Consumer) and
transforms it. The Stream function assembles a Producer and zero or
more Consumer components into a single stream.
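The following base-R sketch uses closures to make the Producer/Consumer split concrete. The helpers `make_producer()`, `make_consumer()` and `make_stream()` are hypothetical and are not Streamer's API. In particular, this sketch applies consumers left to right after the producer. That ordering is an assumption and does not describe how `Stream()` orders its arguments.

```r
# Hypothetical helpers illustrating the Producer/Consumer idea; not Streamer.

# A producer returns the next chunk of `x` on each call, and a zero-length
# vector once all elements have been delivered.
make_producer <- function(x, chunk_size) {
    force(x); force(chunk_size)
    pos <- 0L
    function() {
        if (pos >= length(x)) return(x[0])
        idx <- seq.int(pos + 1L, min(pos + chunk_size, length(x)))
        pos <<- max(idx)
        x[idx]
    }
}

# A consumer pulls a chunk from its upstream component and transforms it.
make_consumer <- function(upstream, transform) {
    force(upstream); force(transform)  # avoid lazy-evaluation surprises in loops
    function() transform(upstream())
}

# Assemble a stream: the producer, then each transform wrapped in turn.
make_stream <- function(producer, ...) {
    stream <- producer
    for (f in list(...)) stream <- make_consumer(stream, f)
    stream
}

s <- make_stream(make_producer(1:10, chunk_size = 4L),
                 function(chunk) chunk * 2L,  # double each value
                 rev)                          # then reverse the chunk
first <- s()   # 8 6 4 2
second <- s()  # 16 14 12 10
```

The `force()` calls are needed because R evaluates function arguments lazily. Without them, each consumer created inside the loop would capture the final values of `stream` and `f` instead of the values current when it was created.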
The yield function can be applied to a stream to generate one 'chunk'
of data. What constitutes a chunk depends on the stream and its
components. A common pattern repeatedly invokes yield on a stream and
processes each chunk in turn, stopping when a zero-length chunk
indicates that the stream is exhausted.
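The loop below mirrors the chunk-processing pattern from the examples using only base R. A text connection over 23 hypothetical records stands in for a stream, and `readLines(con, n = 5L)` stands in for `yield()`. Each call returns up to five records, and it returns `character(0)` once the input is exhausted. The loop stops at the first zero-length chunk or after `max_chunks` chunks, whichever comes first.

```r
# Hypothetical stand-in for a stream: a text connection over 23 records.
# readLines(con, n = 5L) plays the role of yield().
con <- textConnection(sprintf("read_%02d", 1:23))

max_chunks <- 10L
i <- 1L
chunk_lengths <- integer()
while (i <= max_chunks && length(chunk <- readLines(con, n = 5L)) != 0L) {
    chunk_lengths <- c(chunk_lengths, length(chunk))  # process the chunk
    i <- i + 1L
}
close(con)
print(chunk_lengths)  # [1] 5 5 5 5 3
```

Because `&&` short-circuits, no further chunk is requested once the chunk limit is reached.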
Author: Martin Morgan <mtmorgan@fhcrc.org>
Producer and Consumer are the main types of stream components. Use
Stream to connect components, and yield to iterate over a stream.
```r
library("Streamer")

## About this package
packageDescription("Streamer")

## Existing stream components
getClass("Producer") # Producer classes
getClass("Consumer") # Consumer classes

## An example
fl <- system.file("extdata", "s_1_sequence.txt", package="Streamer")
b <- RawInput(fl, 100L, reader=rawReaderFactory(1e4))
s <- Stream(RawToChar(), Rev(), b)
s
head(yield(s)) # First chunk
close(b)

b <- RawInput(fl, 5000L, verbose=TRUE)
d <- Downsample(sampledSize=50)
s <- Stream(RawToChar(), d, b)
s
s[[2]]

## Processing the first ten chunks of the file
i <- 1
while (10 >= i && 0L != length(chunk <- yield(s))) {
    cat("chunk", i, "length", length(chunk), "\n")
    i <- i + 1
}
close(b)
```