⚠️ There is a newer version (0.3-5) of this package.

High-performance I/O tools for R

Anyone dealing with large data knows that the stock tools in R perform poorly when loading non-binary (text) data into R. This package started as an attempt to provide high-performance parsing tools that minimize copying and avoid the use of strings where possible (see mstrsplit, for example).
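As a minimal sketch of the parsing tools mentioned above, the following splits a small raw vector with mstrsplit and dstrsplit. The sample records and the column types are made up for illustration; only the function names come from the package.

```r
library(iotools)

# Hypothetical sample: three pipe-separated records held as raw bytes,
# with LF characters separating the rows.
input <- charToRaw("1|alpha|0.5\n2|beta|1.5\n3|gamma|2.5\n")

# Parse the raw bytes into a character matrix (one row per record).
m <- mstrsplit(input, sep = "|")

# Parse the same bytes into a data frame with one declared type per column.
d <- dstrsplit(input, col_types = c("integer", "character", "numeric"), sep = "|")

print(dim(m))
print(d)
```

Here mstrsplit suits input whose columns all share one type, while dstrsplit is the choice when columns differ in type.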

To allow processing of arbitrarily large files, we have added a way to process input chunk-wise, making it possible to compute on streaming input as well as on very large files (see chunk.reader and chunk.apply).
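The chunk-wise approach can be sketched as follows, assuming a small temporary file stands in for a large input. The file contents, the 1024-byte chunk size, and the variable names are illustrative choices, not recommendations from the package. The first part drives a chunk reader by hand; the second lets chunk.apply run the same per-chunk function and merge the results.

```r
library(iotools)

# Hypothetical input: 1000 "key|value" lines written to a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(sprintf("%d|%d", 1:1000, rep(1:10, 100)), path)

# Per-chunk work: parse the raw chunk and sum the second column.
sum_values <- function(chunk) {
  m <- mstrsplit(chunk, sep = "|")
  sum(as.numeric(m[, 2]))
}

# Manual loop: read raw chunks until an empty chunk signals end of input.
con <- file(path, "rb")
reader <- chunk.reader(con, max.size = 1024L)
total <- 0
while (length(chunk <- read.chunk(reader, max.size = 1024L))) {
  total <- total + sum_values(chunk)
}
close(con)

# The same computation with chunk.apply; CH.MERGE = c collects the
# per-chunk sums into one numeric vector.
con <- file(path, "rb")
sums <- chunk.apply(con, sum_values, CH.MERGE = c, CH.MAX.SIZE = 1024)
close(con)
total_apply <- sum(sums)

unlink(path)
```

Both totals should agree, since each chunk ends on a line boundary and every line is therefore counted exactly once.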

The next natural step was to add support for Hadoop streaming. The main goal was to make it possible to compute with Hadoop MapReduce by writing code that feels natural, much like using lapply on data chunks, without needing to know anything about Hadoop. See the wiki page for the idea and the hmr function for the documentation.

Install

install.packages('iotools')

Monthly Downloads

3,129

Version

0.1-12

License

GPL-2 | GPL-3

Maintainer

Simon Urbanek

Last Published

July 31st, 2015

Functions in iotools (0.1-12)

input.file

Load a file on the disk

dstrsplit

Split binary or character input into a dataframe

line.merge

Merge multiple sources

chunk.apply

Process input by applying a function to each chunk

which.min.key

Determine the next key in bytewise order

write.csv.raw

Fast data output to disk

.default.formatter

Default formatter, corresponding to the as.output functions

imstrsplit

Create an iterator for splitting binary or character input into a matrix

mstrsplit

Split binary or character input into a matrix

output.file

Write an R object to a file as a character string

dstrfw

Split fixed width input into a dataframe

chunk

Functions for very fast chunk-wise processing

as.output

Character Output

readAsRaw

Read binary data in as raw

idstrsplit

Create an iterator for splitting binary or character input into a dataframe

ctapply

Fast tapply() replacement functions

read.csv.raw

Fast data frame input
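
Of the functions listed above, ctapply is perhaps the least self-explanatory. The sketch below assumes the index is already contiguous, meaning that equal keys sit next to each other, which is the condition under which ctapply can stand in for tapply. The values and keys are made up for illustration.

```r
library(iotools)

# Hypothetical data: values with a contiguous (already grouped) key vector.
values <- 1:6
keys <- c("a", "a", "b", "b", "b", "c")

# Apply sum to each run of identical keys.
res <- ctapply(values, keys, sum)

print(res)
```

If the keys are not grouped, sorting the data by key first restores the contiguity that ctapply relies on.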