ff (version 4.0.12)

ffdf: ff class for data.frames

Description

Function 'ffdf' creates ff data.frames that are stored on disk and behave very similarly to 'data.frame'.

Usage

ffdf(...
, row.names = NULL
, ff_split = NULL
, ff_join = NULL
, ff_args = NULL
, update = TRUE
, BATCHSIZE = .Machine$integer.max
, BATCHBYTES = getOption("ffbatchbytes")
, VERBOSE = FALSE)

Value

A list with components

physical

the underlying ff vectors and matrices, to be accessed via physical

virtual

the virtual features of the ffdf including the virtual-to-physical mapping, to be accessed via virtual

row.names

the optional row.names, see argument row.names

and class 'ffdf' (NOTE that ffdf does not inherit from ff)

Arguments

...

ff vectors or matrices (optionally wrapped in I()) that shall be bound together to an ffdf object

row.names

A character vector. Not recommended for large objects with many rows.

ff_split

A vector of character names or integer positions identifying input components to physically split into single ff vectors. If vector elements have names, these are used as root names for the new ff files.

ff_join

A list of vectors with character names or integer positions identifying input components to physically join in the same ff matrix. If list elements have names, these are used to name the new ff files.

update

By default (TRUE) new ff files are updated with content of input ff objects. Setting to FALSE prevents this update.

ff_args

A list with further arguments passed to ff when new ff objects are created via 'ff_split' or 'ff_join'.

BATCHSIZE

passed to update.ff

BATCHBYTES

passed to update.ff

VERBOSE

passed to update.ff

Methods

The following methods and functions are available for ffdf objects:

Type      Name           Assign  Comment

Basic functions
function  ffdf                   constructor for ffdf objects
generic   update                 updates one ffdf object with the content of another
generic   clone                  clones an ffdf object
method    print                  print ffdf
method    str                    ffdf object structure

Class test and coercion
function  is.ffdf                check if inherits from ffdf
generic   as.ffdf                coerce to ffdf, if not yet
generic   as.data.frame          coerce to ram data.frame

Virtual storage mode
generic   vmode                  get virtual modes for all (virtual) columns

Physical attributes
function  physical               get physical attributes

Virtual attributes
function  virtual                get virtual attributes
method    length                 get length
method    dim            <-      get dim and set nrow
generic   dimorder               get the dimorder (non-standard if any component is non-standard)
method    names          <-      set and get names
method    row.names      <-      set and get row.names
method    dimnames       <-      set and get dimnames
method    pattern        <-      set pattern (rename/move files)

Access functions
method    [              <-      set and get data.frame content ([,]) or get ffdf with fewer columns ([])
method    [[             <-      set and get single column ff object
method    $              <-      set and get single column ff object

Opening/Closing/Deleting
generic   is.open                tri-bool is.open status of the physical ff components
method    open                   open all physical ff objects (is done automatically on access)
method    close                  close all physical ff objects
method    delete                 deletes all physical ff files
method    finalize               call finalizer

Processing
method    chunk                  create chunked index
method    sortLevels             sort and recode levels
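
The following is a minimal sketch of how a few of the methods listed above might be combined. The data and the variable names (df, ffd, total) are made up for illustration, and the chunk-wise loop is just one possible way to process an ffdf piece by piece, not necessarily the recommended one.

```r
library(ff)

# Hypothetical toy data; any data.frame with atomic columns would do
df  <- data.frame(x = as.double(1:10), y = 11:20)
ffd <- as.ffdf(df)          # coerce a ram data.frame to ffdf

is.ffdf(ffd)                # class test
dim(ffd)                    # virtual dimensions
names(ffd)                  # column names
vmode(ffd)                  # virtual storage mode of each column

# Process one column chunk by chunk instead of loading it all at once
total <- 0
for (i in chunk(ffd)) {
  total <- total + sum(ffd$x[i])
}

ram <- as.data.frame(ffd)   # coerce back to a ram data.frame

delete(ffd); rm(ffd)        # remove the physical ff files
```

For a data set this small a single chunk is enough; the loop only pays off once the columns no longer fit comfortably in RAM.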

Author

Jens Oehlschlägel

Details

By default, creating an 'ffdf' object does NOT create new ff files; instead, existing files are referenced. This differs from data.frame, which always creates copies of the input objects, most notably in data.frame(matrix()), where an input matrix is converted to single columns. ffdf, by contrast, stores an input matrix physically as the same matrix and virtually maps it to columns. Physically copying a large ff matrix to single ff vectors can be expensive.

More generally, ffdf objects have a physical and a virtual component, which allows very flexible data frame designs: a physically stored matrix can be virtually mapped to single columns, and several physically stored vectors can be virtually mapped to a single matrix. The means to configure these are I() for the virtual representation and the 'ff_split' and 'ff_join' arguments for the physical representation. An ff matrix wrapped in 'I()' is returned as a single object; 'ff_split' stores the matrix as single vectors and thus creates new ff files. 'ff_join' copies several input vectors into a new unified ff matrix with dimorder=c(2,1), but virtually they remain single columns.

The returned ffdf object also has a dimorder attribute, which indicates whether the ffdf object contains a matrix with non-standard dimorder c(2,1), see dimorderStandard.
Currently, virtual windows are not supported for ffdf.
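
As a rough illustration of the physical side of this design, the sketch below compares the files behind a default ffdf with those behind one built with 'ff_split'. It assumes that filename() can be applied to each element of physical(); the variable names (files_ref, files_split) are made up for this example.

```r
library(ff)

m   <- matrix(1:12, 3, 4, dimnames = list(NULL, c("m1", "m2", "m3", "m4")))
ffm <- as.ff(m)
ffv <- as.ff(1:3)

# Default: the existing ff files are referenced, nothing is copied
ffd_ref   <- ffdf(ffm, v = ffv)
files_ref <- unname(sapply(physical(ffd_ref), filename))

# ff_split = 1: the matrix is stored as single ff vectors in new files
ffd_split   <- ffdf(ffm, v = ffv, ff_split = 1)
files_split <- unname(sapply(physical(ffd_split), filename))

filename(ffm) %in% files_ref     # is the original matrix file reused?
filename(ffm) %in% files_split   # is it still used after splitting?
length(files_split)              # number of physical components
```

Both objects present the same five virtual columns; only the physical layout on disk differs.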

See Also

data.frame, ff; for more examples see physical

Examples

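 # Build a small ram matrix and vector, then convert both to ff objects
 m <- matrix(1:12, 3, 4, dimnames=list(c("r1","r2","r3"), c("m1","m2","m3","m4")))
 v <- 1:3
 ffm <- as.ff(m)
 ffv <- as.ff(v)

 # Default: reference the existing ff files, map the matrix to virtual columns
 d <- data.frame(m, v)
 ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm))
 all.equal(d, ffd[,])
 ffd
 physical(ffd)

 # ff_split: physically split the first component (the matrix) into single ff vectors
 d <- data.frame(m, v)
 ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm), ff_split=1)
 all.equal(d, ffd[,])
 ffd
 physical(ffd)

 # ff_join: physically join both components into one new ff matrix
 d <- data.frame(m, v)
 ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm), ff_join=list(newff=c(1,2)))
 all.equal(d, ffd[,])
 ffd
 physical(ffd)

 # I(): keep each input as a single virtual object
 d <- data.frame(I(m), I(v))
 ffd <- ffdf(m=I(ffm), v=I(ffv), row.names=row.names(ffm))
 all.equal(d, ffd[,])
 ffd
 physical(ffd)

 # Clean up
 rm(ffm,ffv,ffd); gc()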
