
Function write.table.ffdf writes an ffdf object to a separated flat file, very much like (and using) write.table.
It can also work with any convenience wrapper such as write.csv, and it provides its own convenience wrappers (e.g. write.csv.ffdf) corresponding to R's usual wrappers.
write.table.ffdf(x = NULL
, file, append = FALSE
, nrows = -1, first.rows = NULL, next.rows = NULL
, FUN = "write.table", ...
, transFUN = NULL
, BATCHBYTES = getOption("ffbatchbytes")
, VERBOSE = FALSE
)
write.csv.ffdf(...)
write.csv2.ffdf(...)
write.csv(...)
write.csv2(...)
x: an ffdf object to export to the separated file
file: either a character string naming a file or a connection open for writing. "" indicates output to the console.
append: logical. Only relevant if file is a character string. If TRUE, the output is appended to the file. If FALSE, any existing file of the name is destroyed.
nrows: integer: the maximum number of rows to write (includes first.rows if a 'first' chunk is written). Negative and other invalid values are ignored.
first.rows: the number of rows to write with the first chunk (default: next.rows)
next.rows: integer: number of rows to write in further chunks, see details. By default calculated as BATCHBYTES %/% sum(.rambytes[vmode(x)])
FUN: character: name of a function that is called for writing each chunk, see write.table, write.csv, etc.
...: further arguments, passed to FUN in write.table.ffdf, or passed to write.table.ffdf in the convenience wrappers
transFUN: NULL or a function that is called on each data.frame chunk before writing with FUN (for filtering, transformations etc.)
BATCHBYTES: integer: bytes allowed for the size of the data.frame holding one chunk. Default getOption("ffbatchbytes").
VERBOSE: logical: TRUE to print timings for each processed chunk (default FALSE)
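The default for next.rows is an integer division of a byte budget by the RAM cost of one row. The arithmetic can be sketched in base R as follows. The per-column byte sizes and the 16 MB budget are assumptions chosen for illustration; they are not read from ff's .rambytes or from getOption("ffbatchbytes").

```r
# Assumed RAM bytes per column for one row (hypothetical values,
# standing in for .rambytes[vmode(x)])
bytes_per_column <- c(log = 4, int = 4, dbl = 8, fac = 4, ord = 4, dct = 8, dat = 8)

# Assumed byte budget (stands in for BATCHBYTES)
batch_bytes <- 16 * 1024^2

# Rows per chunk: budget divided by the bytes needed for one row
next_rows <- batch_bytes %/% sum(bytes_per_column)
next_rows
```

A larger budget or narrower columns give more rows per chunk, and therefore fewer calls to FUN.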
Jens Oehlschlägel, Christophe Dutang
write.table.ffdf has been designed to export very large ffdf objects to separated flat files in chunks.
The first chunk is potentially written with col.names. Further chunks are appended.
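The chunking idea can be illustrated with a minimal base R sketch that works on an ordinary data.frame. The helper write_in_chunks is hypothetical and is not part of ff; it only mimics the pattern described here (header with the first chunk, later chunks appended, an optional transFUN-style function applied to each chunk) and is not the package's actual implementation.

```r
# Hypothetical helper (not part of ff): write a data.frame in row chunks.
write_in_chunks <- function(df, file, chunk_rows, transFUN = NULL) {
  starts <- seq(1L, nrow(df), by = chunk_rows)
  for (i in seq_along(starts)) {
    rows <- starts[i]:min(starts[i] + chunk_rows - 1L, nrow(df))
    chunk <- df[rows, , drop = FALSE]
    # Optional per-chunk filter or transformation before writing
    if (!is.null(transFUN)) chunk <- transFUN(chunk)
    first <- i == 1L
    # Only the first chunk gets column names; later chunks are appended
    write.table(chunk, file = file, sep = ",", row.names = FALSE,
                col.names = first, append = !first)
  }
  invisible(file)
}

df <- data.frame(id = 1:10, value = (1:10) / 2)
out <- tempfile(fileext = ".csv")

# Write in chunks of 4 rows, keeping only rows with an even id
write_in_chunks(df, out, chunk_rows = 4,
                transFUN = function(d) d[d$id %% 2 == 0, , drop = FALSE])

back <- read.csv(out)
unlink(out)
```

Because each chunk is filtered independently, a function passed this way should not depend on rows outside the current chunk.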
write.table.ffdf has been designed to behave as much like write.table as possible. However, note the following differences:
by default row.names are only written if the ffdf has row.names.
read.table.ffdf, write.table, ffdf
x <- data.frame(log=rep(c(FALSE, TRUE), length.out=26), int=1:26, dbl=1:26 + 0.1
, fac=factor(letters), ord=ordered(LETTERS), dct=Sys.time()+1:26
, dat=seq(as.Date("1910/1/1"), length.out=26, by=1), stringsAsFactors = TRUE)
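# convert to ffdf and write it to a temporary csv file
ffx <- as.ffdf(x)
csvfile <- tempPathFile(path=getOption("fftempdir"), extension="csv")
write.csv.ffdf(ffx, file=csvfile)
# append the same rows a second time
write.csv.ffdf(ffx, file=csvfile, append=TRUE)
# read the file back, restoring the non-default column classes
ffy <- read.csv.ffdf(file=csvfile, header=TRUE
, colClasses=c(ord="ordered", dct="POSIXct", dat="Date"))
# clean up
rm(ffx, ffy); gc()
unlink(csvfile)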
if (FALSE) {
  # Attention, this takes very long
  vmodes <- c(log="boolean", int="byte", dbl="single"
  , fac="short", ord="short", dct="single", dat="single")
  message("create a ffdf with 7 columns and 78 million rows")
  system.time({
    x <- data.frame(log=rep(c(FALSE, TRUE), length.out=26), int=1:26, dbl=1:26 + 0.1
    , fac=factor(letters), ord=ordered(LETTERS), dct=Sys.time()+1:26
    , dat=seq(as.Date("1910/1/1"), length.out=26, by=1), stringsAsFactors = TRUE)
    # grow the 26-row data.frame to 260000 rows
    x <- do.call("rbind", rep(list(x), 10))
    x <- do.call("rbind", rep(list(x), 10))
    x <- do.call("rbind", rep(list(x), 10))
    x <- do.call("rbind", rep(list(x), 10))
    ffx <- as.ffdf(x, vmode = vmodes)
    # extend the ffdf block by block until it holds 300 copies of x
    for (i in 2:300){
      message(i, "\n")
      last <- nrow(ffx) + nrow(x)
      first <- last - nrow(x) + 1L
      nrow(ffx) <- last
      ffx[first:last,] <- x
    }
  })
  csvfile <- tempPathFile(path=getOption("fftempdir"), extension="csv")
  write.csv.ffdf(ffx, file=csvfile, VERBOSE=TRUE)
  ffy <- read.csv.ffdf(file=csvfile, header=TRUE
  , colClasses=c(ord="ordered", dct="POSIXct", dat="Date")
  , asffdf_args=list(vmode = vmodes), VERBOSE=TRUE)
  rm(ffx, ffy); gc()
  unlink(csvfile)
}