make.readchunk: Fast and friendly chunk file finagler

Description

Read a file chunk by chunk

Usage

make.readchunk(input, FUN = identity, chunksize = 5000L)

Arguments

input

a length 1 character string. See Details.

FUN

a function applied to each chunk

chunksize

number of lines for each chunk

Value

A function with a logical argument, reset. If reset is TRUE, subsequent calls reread the data from the beginning. Otherwise, the function reads the next chunk and returns the value of FUN applied to it. Because FUN defaults to identity, the chunk itself is returned by default, as a tbl_df object. Once all the data has been read, the file is reset automatically; the loops in the Examples rely on the call returning NULL at that point.

Details

It creates a function that reads successive chunks of the data referenced by input using the fread function. The accepted forms of input are described in the help page of fread. The data referenced by input should not have a header row.

This function is inspired by the bigglm example.
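
The closure pattern described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: make_chunk_reader is a hypothetical name, it reads with readLines and read.csv instead of fread, it returns a plain data.frame rather than a tbl_df, and it assumes comma-separated input without a header.

```r
# Hypothetical helper, not the package's code: a base R sketch of a
# chunk reader built as a closure over an open file connection.
make_chunk_reader <- function(path, FUN = identity, chunksize = 5000L) {
  con <- NULL  # connection shared across calls through the closure

  function(reset = FALSE) {
    if (reset) {
      # Drop the connection so the next call starts from the first line.
      if (!is.null(con)) close(con)
      con <<- NULL
      return(invisible(NULL))
    }
    if (is.null(con)) con <<- file(path, open = "r")

    lines <- readLines(con, n = chunksize)
    if (length(lines) == 0L) {
      # End of data: close the connection (automatic reset), signal with NULL.
      close(con)
      con <<- NULL
      return(NULL)
    }

    # Assumption: the chunk is comma-separated and has no header.
    chunk <- utils::read.csv(text = lines, header = FALSE)
    FUN(chunk)
  }
}

# Usage: 12 rows read in chunks of 5 give chunk sizes 5, 5 and 2.
tmp <- tempfile(fileext = ".csv")
write.table(data.frame(x = 1:12, y = letters[1:12]), file = tmp, sep = ",",
            row.names = FALSE, col.names = FALSE)

reader <- make_chunk_reader(tmp, FUN = nrow, chunksize = 5L)
sizes <- integer(0)
while (!is.null(n <- reader())) sizes <- c(sizes, n)
sizes
```

Keeping the connection open between calls means each chunk is read once, in order, without rescanning the file from the start. The NULL return at the end of the data is what lets a while loop like the ones in the Examples terminate.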

Examples

library(hflights)
nrow(hflights) # Number of rows

## We create a file with no header
input <- "hflights.csv"
write.table(hflights,file=input,sep=",",
            row.names=FALSE,col.names=FALSE)

## Get the number of rows of each chunk
readchunk <- make.readchunk(input,FUN=function(x){NROW(x)})

a <- NULL
while(!is.null(b <- readchunk())) {
  if(is.null(a)) {
    a <- b
  } else {
    a <- a+b
  }
}
all.equal(a, nrow(hflights))

## The file is reset automatically, so a second pass gives the same result
a <- NULL
while(!is.null(b <- readchunk())) {
  if(is.null(a)) {
    a <- b
  } else {
    a <- a+b
  }
}
all.equal(a, nrow(hflights))
