I/O Tools for Streaming

Basic I/O tools for streaming and data parsing.


High-performance I/O tools for R

Anyone dealing with large data knows that the stock tools in R perform poorly when loading (non-binary) data into R. This package started as an attempt to provide high-performance parsing tools that minimize copying and avoid the use of strings when possible (see mstrsplit, for example).
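As a minimal sketch of the parsing approach described above, the following splits a raw vector directly into a numeric matrix with mstrsplit. The sample data and the choice of "|" as separator are assumptions made for illustration only.

```r
library(iotools)

# Three pipe-separated records held as raw bytes, as they might arrive
# from a file or a pipe (illustrative data, not from the package)
input <- charToRaw("1|2|3\n4|5|6\n7|8|9\n")

# Parse the raw bytes into a numeric matrix: one row per input line,
# one column per field
m <- mstrsplit(input, sep = "|", type = "numeric")

print(dim(m))
```

Passing a raw vector rather than a character vector is what lets the parser work on the bytes as read, without first building one R string per line.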

To allow processing of arbitrarily large files, we have added a way to process input chunk-wise, making it possible to compute on streaming input as well as on very large files (see chunk.reader and chunk.apply).
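The chunk-wise idea can be sketched as follows, assuming a small temporary file stands in for a very large one. The file contents, the separator, and the column-sum computation are illustrative assumptions; with the default chunk size a file this small is likely read as a single chunk, but the same code applies unchanged to inputs that span many chunks.

```r
library(iotools)

# Write a small pipe-separated file to stand in for a very large one
path <- tempfile(fileext = ".txt")
writeLines(sprintf("%d|%d", 1:1000, 1001:2000), path)

# FUN receives each chunk as a raw vector; parse it and reduce it to
# per-column sums, then stack the per-chunk results with rbind
partial <- chunk.apply(
  path,
  function(chunk) colSums(mstrsplit(chunk, sep = "|", type = "numeric")),
  CH.MERGE = rbind
)

# Combine the per-chunk partial sums into overall column totals
total <- colSums(partial)
print(total)

unlink(path)
```

Because each chunk is reduced before the next one is read, memory use depends on the chunk size and the size of the partial results rather than on the size of the file.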

The next natural step was to add support for Hadoop streaming. The major goal was to make it possible to compute using Hadoop MapReduce by writing code that is very natural, very much like using lapply on data chunks, without the need to know anything about Hadoop. See the wiki page for the idea and the hmr function for the documentation.

Functions in iotools

Name                Description
chunk.apply         Process input by applying a function to each chunk; map a function over a file by chunks
dstrfw              Split fixed-width input into a data frame
dstrsplit           Split binary or character input into a data frame
as.output           Character output
chunk               Functions for very fast chunk-wise processing
fdrbind             Fast row-binding of lists and data frames
idstrsplit          Create an iterator for splitting binary or character input into a data frame
ctapply             Fast tapply() replacement functions
.default.formatter  Default formatter, corresponding to the as.output functions
write.csv.raw       Fast data output to disk
output.file         Write an R object to a file as a character string
read.csv.raw        Fast data frame input
mstrsplit           Split binary or character input into a matrix
line.merge          Merge multiple sources
imstrsplit          Create an iterator for splitting binary or character input into a matrix
input.file          Load a file on the disk
readAsRaw           Read binary data in as raw
which.min.key       Determine the next key in bytewise order

License: GPL-2 | GPL-3
NeedsCompilation: yes
Packaged: 2018-01-24 18:37:48 UTC; svnuser
Repository: CRAN
Date/Publication: 2018-01-25 15:09:59 UTC

