Streamer (version 1.18.0)

Streamer-package: Package to enable stream (iterative) processing of large data

Description

Large data files can be difficult to work with in R, where data generally resides in memory. This package encourages a style of programming where data is 'streamed' from disk into R through a series of components that, typically, reduce the original data to a manageable size. The package provides useful Producer and Consumer components for operations such as data input, sampling, indexing, and transformation.

Arguments

Details

The central paradigm in this package is a Stream composed of a Producer and zero or more Consumer components. The Producer is responsible for input of data, e.g., from the file system. A Consumer accepts data from a Producer and performs transformations on it. The Stream function is used to assemble a Producer and zero or more Consumer components into a single string.

The yield function can be applied to a stream to generate one `chunk' of data. The definition of chunk depends on the stream and its components. A common paradigm repeatedly invokes yield on a stream, retrieving chunks of the stream for further processing.

See Also

Producer, Consumer are the main types of stream components. Use Stream to connect components, and yield to iterate a stream.

Examples

Run this code
## About this package
packageDescription("Streamer")

## Existing stream components
getClass("Producer")		# Producer classes
getClass("Consumer")            # Consumer classes

## An example
fl <- system.file("extdata", "s_1_sequence.txt", package="Streamer")
b <- RawInput(fl, 100L, reader=rawReaderFactory(1e4))
s <- Stream(RawToChar(), Rev(), b)
s
head(yield(s)) 			# First chunk
close(b)

b <- RawInput(fl, 5000L, verbose=TRUE)
d <- Downsample(sampledSize=50)
s <- Stream(RawToChar(), d, b)
s
s[[2]]

## Processing the first ten chunks of the file
i <- 1
while (10 >= i && 0L != length(chunk <- yield(s)))
{
   cat("chunk", i, "length", length(chunk), "\n")
   i <- i + 1
}
close(b)

Run the code above in your browser using DataLab