ff v2.2-14

Memory-Efficient Storage of Large Data on Disk and Fast Access Functions

The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM, by transparently mapping only a section (pagesize) into main memory; this section is the effective virtual memory consumption per ff object. ff supports R's standard atomic data types 'double', 'logical', 'raw' and 'integer', and the non-standard atomic types boolean (1 bit), quad (2 bit unsigned), nibble (4 bit unsigned), byte (1 byte signed with NAs), ubyte (1 byte unsigned), short (2 byte signed with NAs), ushort (2 byte unsigned) and single (4 byte float with NAs). For example, 'quad' allows efficient storage of genomic data as an 'A','T','G','C' factor. The unsigned types support 'circular' arithmetic. There is also support for the close-to-atomic types 'factor', 'ordered', 'POSIXct' and 'Date', and for custom close-to-atomic types.

ff has native C support for vectors, matrices and arrays with flexible dimorder (major column-order, major row-order and generalizations for arrays). There is also an ffdf class, not unlike data.frames, and import/export filters for csv files. ff objects store raw data in binary flat files in native encoding, and complement this with metadata stored in R as physical and virtual attributes. ff objects have well-defined hybrid copying semantics, which gives rise to certain performance improvements through virtualization.

ff objects can be stored and reopened across R sessions. ff files can be shared by multiple ff R objects (using different data en/de-coding schemes) in the same process or from multiple R processes to exploit parallelism. A wide choice of finalizer options makes it possible to work with 'permanent' files as well as to create and remove 'temporary' ff files completely transparently to the user. On certain OS/filesystem combinations, creating the ff files works without notable delay thanks to sparse file allocation.
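
The storage modes described above can be tried out with a minimal sketch like the following. It assumes the ff package is installed; the variable names are illustrative only, and the backing files are created wherever ff places temporary files by default.

```r
library(ff)

# A disk-backed double vector of length 1000.
x <- ff(vmode = "double", length = 1000)
x[1:3] <- c(1.5, 2.5, 3.5)   # assignment writes through to the backing file
head_sum <- sum(x[1:3])      # x[1:3] reads the values back into RAM

# A 2-bit unsigned 'quad' vector: each element holds a value in 0..3.
q <- ff(vmode = "quad", length = 8)
q[1:4] <- 0:3
q_vals <- q[1:4]

vm_x <- vmode(x)     # virtual storage mode of x
vm_q <- vmode(q)     # virtual storage mode of q
print(filename(x))   # path of the flat file behind x

# Remove the backing files when done.
delete(x); rm(x)
delete(q); rm(q)
```

Calling delete() removes the flat file immediately; without it, what happens to the file depends on the finalizer attached to the object.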
Several access optimization techniques, such as Hybrid Index Preprocessing and Virtualization, are implemented to achieve good performance even with large datasets; for example, a virtual matrix transpose works without touching a single byte on disk. Further, to reduce disk I/O, 'logicals' and non-standard data types are stored natively and compactly in binary flat files, i.e. logicals take up exactly 2 bits to represent TRUE, FALSE and NA. Beyond basic access functions, the ff package also provides compatibility functions that facilitate writing code for both ff and ram objects, and support for batch processing on ff objects (e.g. as.ram, as.ff, ffapply). ff interfaces closely with functionality from package 'bit': chunked looping, fast bit operations and coercions between different objects that can store subscript information ('bit', 'bitwhich', ff 'boolean', ri range index, hi hybrid index). This makes it possible to work interactively with selections of large datasets and to quickly modify selection criteria. Further high-performance enhancements can be made available upon request.
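
The virtual transpose and chunked looping mentioned above might be used roughly as follows. This is a sketch under the assumption that ff (and with it bit) is installed; object names and sizes are made up for illustration, and the number of chunks returned depends on the session's batch size settings.

```r
library(ff)

# A 4 x 5 integer matrix stored on disk in column-major order.
m <- ff(1:20, dim = c(4, 5))

# Virtual transpose: vt() changes virtual attributes
# instead of physically rearranging the file contents.
tm <- vt(m)
tdim <- dim(tm)
same_cell <- tm[2, 1] == m[1, 2]

# Chunked processing: chunk() returns range indices that cover the vector,
# so only one chunk at a time has to be held in RAM.
v <- ff(0, vmode = "double", length = 100000)
v[1:10] <- 1
total <- 0
for (i in chunk(v)) {
  total <- total + sum(v[i])
}
print(total)

# tm shares the file of m, so deleting m is enough.
delete(m); rm(m, tm)
delete(v); rm(v)
```

Because tm is only a different view of the same file, writes through m are visible through tm as well; use t() or clone() when an independent copy is needed.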

Functions in ff

Name Description
dimnames.ffdf Getting and setting dimnames of ffdf
ff ff classes for representing (large) atomic data
clone.ffdf Cloning ffdf objects
ramattribs Get ramclass and ramattribs
ffsort Sorting of ff vectors
levels.ff Getting and setting factor levels
geterror.ff Get error and error string
ffapply Apply for ff objects
ffsuitable Test ff object for suitability
maxlength Get physical length of an ff or ram object
ramsort.default Sorting: Sort R vector in-RAM and in-place
getpagesize Get page size information
fixdiag Test for fixed diagonal
pagesize Pagesize of ff object
ffdfindexget Reading and writing ffdf data.frame using ff subscripts
ffdfsort Sorting: convenience wrappers for data.frames
is.open Test if object is opened
close.ff Closing ff files
dummy.dimnames Array: make dimnames
unclass_- Unclassed assignment
delete Deleting the file behind an ff object
dim.ff Getting and setting dim and dimorder
read.table.ffdf Importing csv files into ff data.frames
is.ffdf Test for class ffdf
finalizer Get and set finalizer (name)
length.hi Hybrid Index, querying
open.ff Opening an ff file
ffinfo Inspect content of ff saves
maxffmode Lossless vmode coercability
ffconform Get most conforming argument
physical.ffdf Getting physical and virtual attributes of ffdf objects
ffdrop Delete an ffarchive
write.table.ffdf Exporting csv files from ff data.frames
physical.ff Getting and setting physical and virtual attributes of ff objects
ffdf ff class for data.frames
vector.vmode Create vector of virtual mode
unsort Hybrid Index, internal utilities
print.ff Print and str methods
update.ff Update ff content from another object
dimnames.ff_array Getting and setting dimnames
ffindexget Reading and writing ff vectors using ff subscripts
vecprint Print beginning and end of big vector
ffreturn Return suitable ff object
ffload Reload ffSaved Datasets
ffxtensions Test for availability of ff extensions
ffsave Save R and ff objects
matcomb Array: make matrix indices from row and columns positions
file.resize Change size of or move an existing file
is.readonly Get readonly status
is.sorted Getting and setting 'is.sorted' physical attribute
filename Get or set filename
vmode Virtual storage mode
getset.ff Reading and writing vectors of values (low-level)
finalize Call finalizer
names.ff Getting and setting names
hi Hybrid index class
vt Virtual transpose
mismatch Test for recycle mismatch
length.ff Getting and setting length
length.ffdf Getting length of an ffdf dataframe
hiparse Hybrid Index, parsing
readwrite.ff Reading and writing vectors (low-level)
undim Undim
nrowAssign Assigning the number of rows or columns
is.ff Test for class ff
matprint Print beginning and end of big matrix
na.count Getting and setting 'na.count' physical attribute
repnam Replicate with names
vmode.ffdf Virtual storage mode of ffdf
ram2ffcode Factor codings
sortLevels Factor level manipulation
symmIndex2vectorIndex Array: make vector positions from symmetric array index
swap Reading and writing in one operation (high-level)
symmetric Test for symmetric structure
splitPathFile Analyze pathfile-strings
vector2array Array: make array from vector
vw Getting and setting virtual windows
vectorIndex2arrayIndex Array: make array index from vector positions
Forbidden_ffdf Forbidden ffdf functions
Extract.ffdf Reading and writing data.frames (ffdf)
as.ff.bit Conversion between bit and ff boolean
CFUN Collapsing functions for batch processing
Extract.ff Reading and writing vectors and arrays (high-level)
bigsample Sampling from large pools
chunk.ffdf Chunk ff_vector and ffdf
chunk.bit Chunk bit vectors
as.ffdf Coercing to ffdf and data.frame
as.vmode Coercing to virtual mode
clone Cloning ff and ram objects
add Incrementing an ff or ram object
arrayIndex2vectorIndex Array: make vector positions from array index
as.ff Coercing ram to ff and ff to ram objects
as.hi Hybrid Index, coercion to
array2vector Array: make vector from array
Internal_ffdf Internal ffdf functions
LimWarn ff Limitations and Warnings
as.integer.hi Hybrid Index, coercing from
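
A few of the functions listed above can be combined into a small CSV import example. This is only a sketch: it assumes ff is installed, uses a throwaway file with numeric columns created by base R, and the column names id and value are invented for the example.

```r
library(ff)

# Write a small CSV file with base R (numeric columns only).
csv <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:5, value = c(0.5, 1.5, 2.5, 3.5, 4.5)),
          file = csv, row.names = FALSE)

# Import it into an ffdf; each column becomes an ff vector on disk.
d <- read.csv.ffdf(file = csv)
is_ffdf <- is.ffdf(d)
n_rows <- nrow(d)
value_sum <- sum(d$value[])   # d$value is an ff vector; [] pulls it into RAM
first_rows <- d[1:2, ]        # row subsetting returns an ordinary data.frame

# Clean up the ff files and the temporary CSV.
delete(d); rm(d)
unlink(csv)
```

For files that do not fit into RAM, the first.rows and next.rows arguments of read.table.ffdf control how many rows are read per chunk.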

Details

Date 2018-04-15
License GPL-2 | file LICENSE
LazyLoad yes
ByteCompile yes
Encoding latin1
URL http://ff.r-forge.r-project.org/
Packaged 2018-04-15 15:30:57 UTC; jo
NeedsCompilation yes
Repository CRAN
Date/Publication 2018-05-15 21:19:05 UTC
