ff (version 4.0.9)

Memory-Efficient Storage of Large Data on Disk and Fast Access Functions

Description

The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) into main memory, which is the effective virtual memory consumption per ff object. ff supports R's standard atomic data types 'double', 'logical', 'raw' and 'integer' and the non-standard atomic types boolean (1 bit), quad (2 bit unsigned), nibble (4 bit unsigned), byte (1 byte signed with NAs), ubyte (1 byte unsigned), short (2 byte signed with NAs), ushort (2 byte unsigned) and single (4 byte float with NAs). For example, 'quad' allows efficient storage of genomic data as an 'A','T','G','C' factor. The unsigned types support 'circular' arithmetic. There is also support for the close-to-atomic types 'factor', 'ordered', 'POSIXct', 'Date' and custom close-to-atomic types. ff has native C support for vectors, matrices and arrays with flexible dimorder (column-major order, row-major order and generalizations for arrays). There is also an ffdf class not unlike data.frames, along with import/export filters for csv files. ff objects store raw data in binary flat files in native encoding, and complement this with metadata stored in R as physical and virtual attributes. ff objects have well-defined hybrid copying semantics, which gives rise to certain performance improvements through virtualization. ff objects can be stored and reopened across R sessions. ff files can be shared by multiple ff R objects (using different data en/de-coding schemes) in the same process or from multiple R processes to exploit parallelism. A wide choice of finalizer options allows working with 'permanent' files as well as creating and removing 'temporary' ff files completely transparently to the user. On certain OS/filesystem combinations, creating the ff files works without notable delay thanks to sparse file allocation.
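The storage modes described above can be tried out with a short sketch. It assumes the ff package is installed and that the R session may write to its temporary directory; the object names and values are made up for illustration.

```r
library(ff)

# One million doubles backed by a temporary flat file rather than RAM.
x <- ff(vmode = "double", length = 1e6)
x[1:5] <- c(1.5, 2.5, 3.5, 4.5, 5.5)   # written through to the file
head_vals <- x[1:5]                    # read back as an ordinary R vector

# A 2-bit unsigned vector: each element holds a value in 0..3.
q <- ff(vmode = "quad", length = 8)
q[1:8] <- c(0L, 1L, 2L, 3L, 3L, 2L, 1L, 0L)
quad_vals <- q[1:8]

vmode(x)      # virtual storage mode of x
vmode(q)      # virtual storage mode of q
filename(x)   # path of the backing flat file

# Remove the backing files once they are no longer needed.
delete(x); rm(x)
delete(q); rm(q)
```

Only the subscripted elements are pulled into RAM on each read, so the same pattern applies to vectors much larger than available memory.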
Several access optimization techniques such as Hybrid Index Preprocessing and Virtualization are implemented to achieve good performance even with large datasets, for example a virtual matrix transpose that does not touch a single byte on disk. Further, to reduce disk I/O, 'logicals' and non-standard data types are stored natively and compactly in binary flat files, i.e. logicals take up exactly 2 bits to represent TRUE, FALSE and NA. Beyond basic access functions, the ff package also provides compatibility functions that facilitate writing code for both ff and ram objects, and support for batch processing on ff objects (e.g. as.ram, as.ff, ffapply). ff interfaces closely with functionality from package 'bit': chunked looping, fast bit operations and coercions between different objects that can store subscript information ('bit', 'bitwhich', ff 'boolean', ri range index, hi hybrid index). This allows working interactively with selections of large datasets and quickly modifying selection criteria. Further high-performance enhancements can be made available upon request.
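As a rough illustration of the virtual transpose and chunked looping mentioned above, the following sketch uses vt() and chunk() on small objects. It assumes ff (and its dependency bit) is installed; the sizes are tiny and purely illustrative, so the chunked loop may well run over a single chunk here.

```r
library(ff)

# A 3 x 4 integer matrix stored on disk.
m <- ff(1:12, dim = c(3, 4))

# Virtual transpose: swaps dim and dimorder in the metadata,
# the data in the backing file is left where it is.
tm <- vt(m)
t_dim <- dim(tm)
same_as_t <- all(tm[, ] == t(m[, ]))

# Chunked processing: chunk() returns a list of range indices (ri),
# each of which selects one batch of elements.
x <- as.ff(1:1000)
total <- 0
for (i in chunk(x)) {
  total <- total + sum(x[i])   # only the current chunk is held in RAM
}

# tm shares the file of m, so deleting m removes the data for both.
delete(m); rm(m, tm)
delete(x); rm(x)
```

The batch size used by chunk() is derived from the object's record size and the session's ff batch settings, so on real data the loop reads one RAM-sized batch at a time.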

Install

install.packages('ff')

Monthly Downloads

32,949

Version

4.0.9

License

GPL-2 | GPL-3 | file LICENSE

Maintainer

Jens Oehlschlägel

Last Published

January 25th, 2023

Functions in ff (4.0.9)

CFUN

Collapsing functions for batch processing
add

Incrementing an ff or ram object
LimWarn

ff Limitations and Warnings
Extract.ff

Reading and writing vectors and arrays (high-level)
as.ff.bit

Conversion between bit and ff boolean
as.ffdf

Coercing to ffdf and data.frame
clone.ffdf

Cloning ffdf objects
array2vector

Array: make vector from array
Extract.ffdf

Reading and writing data.frames (ffdf)
chunk.ffdf

Chunk ff_vector and ffdf
arrayIndex2vectorIndex

Array: make vector positions from array index
close.ff

Closing ff files
clone.ff

Cloning ff and ram objects
as.ff

Coercing ram to ff and ff to ram objects
as.vmode

Coercing to virtual mode
dim.ff

Getting and setting dim and dimorder
ffdf

ff class for data.frames
delete

Deleting the file behind an ff object
Forbidden_ffdf

Forbidden ffdf functions
dimorderCompatible

Test for dimorder compatibility
bigsample

Sampling from large pools
hiparse

Hybrid Index, parsing
ffapply

Apply for ff objects
is.ff

Test for class ff
ffconform

Get most conforming argument
dimnames.ff

Getting and setting dimnames
ffinfo

Inspect content of ff saves
ffindexorder

Sorting: chunked ordering of integer subscript positions
ffload

Reload ffSaved Datasets
ffindexget

Reading and writing ff vectors using ff subscripts
ffxtensions

Test for availability of ff extensions
ff

ff classes for representing (large) atomic data
filename

Get or set filename
finalize

Call finalizer
length.hi

Hybrid Index, querying
file.resize

Change size of or move an existing file
levels.ff

Getting and setting factor levels
length.ff

Getting and setting length
length.ffdf

Getting length of a ffdf dataframe
repnam

Replicate with names
maxffmode

Lossless vmode coercability
getset.ff

Reading and writing vectors of values (low-level)
hi

Hybrid index class
sortLevels

Factor level manipulation
ffdfindexget

Reading and writing ffdf data.frame using ff subscripts
geterror.ff

Get error and error string
names.ff

Getting and setting names
nrowAssign

Assigning the number of rows or columns
physical.ff

Getting and setting physical and virtual attributes of ff objects
getpagesize

Get page size information
physical.ffdf

Getting physical and virtual attributes of ffdf objects
maxlength

Get physical length of an ff or ram object
vt

Virtual transpose
fforder

Sorting: order from ff vectors
mismatch

Test for recycle mismatch
unclass<-

Unclassed assignment
as.hi

Hybrid Index, coercion to
ffdfsort

Sorting: convenience wrappers for data.frames
as.integer.hi

Hybrid Index, coercing from
dimnames.ffdf

Getting and setting dimnames of ffdf
ffdrop

Delete an ffarchive
ffsort

Sorting of ff vectors
ram2ffcode

Factor codings
ffsuitable

Test ff object for suitability
undim

Undim
na.count

Getting and setting 'na.count' physical attribute
vw

Getting and setting virtual windows
write.table.ffdf

Exporting csv files from ff data.frames
ramattribs

Get ramclass and ramattribs
vector2array

Array: make array from vector
splitPathFile

Analyze pathfile-strings
is.ffdf

Test for class ffdf
is.open

Test if object is opened
dummy.dimnames

Array: make dimnames
vectorIndex2arrayIndex

Array: make array index from vector positions
swap

Reading and writing in one operation (high-level)
print.ff

Print and str methods
ramorder.default

Sorting: order R vector in-RAM and in-place
ffreturn

Return suitable ff object
ramsort.default

Sorting: Sort R vector in-RAM and in-place
read.table.ffdf

Importing csv files into ff data.frames
readwrite.ff

Reading and writing vectors (low-level)
vmode

Virtual storage mode
vecprint

Print beginning and end of big vector
regtest.fforder

Sorting: regression tests
ffsave

Save R and ff objects
finalizer

Get and set finalizer (name)
vmode.ffdf

Virtual storage mode of ffdf
vector.vmode

Create vector of virtual mode
fixdiag

Test for fixed diagonal
is.readonly

Get readonly status
is.sorted

Getting and setting 'is.sorted' physical attribute
matcomb

Array: make matrix indices from row and columns positions
matprint

Print beginning and end of big matrix
open.ff

Opening an ff file
pagesize

Pagesize of ff object
symmIndex2vectorIndex

Array: make vector positions from symmetric array index
symmetric

Test for symmetric structure
unsort

Hybrid Index, internal utilities
update.ff

Update ff content from another object
Internal_ffdf

Internal ffdf functions
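
Several of the functions listed above (as.ffdf, is.ffdf, Extract.ffdf, is.ff) can be combined as in the following minimal sketch. It assumes ff is installed; the data frame and its column names are invented for illustration. Character columns are avoided because ffdf columns need atomic or factor data.

```r
library(ff)

# A small in-RAM data.frame with integer, double and factor columns.
df <- data.frame(
  id    = 1:6,
  score = c(2.5, 3.0, 4.5, 1.0, 5.5, 3.5),
  group = factor(c("a", "b", "a", "c", "b", "c"))
)

# Coerce to an ffdf: each column becomes an ff vector on disk.
d <- as.ffdf(df)

is_ffdf    <- is.ffdf(d)       # TRUE for ffdf objects
d_dim      <- dim(d)           # rows and columns, as for a data.frame
col_is_ff  <- is.ff(d$score)   # a single column is an ff vector
first_rows <- d[1:3, ]         # row subscripting returns a data.frame
scores     <- d$score[]        # [] reads the whole column into RAM
```

Subscripting with a row range keeps memory use bounded, which is the usual way to work through an ffdf that does not fit in RAM.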