R.huge (version 0.10.1)

AbstractFileArray: Class representing a persistent array stored in a file

Description

Package: R.huge
Class AbstractFileArray

Object
  |
  +--AbstractFileArray

Directly known subclasses:
FileByteMatrix, FileByteVector, FileDoubleMatrix, FileDoubleVector, FileFloatMatrix, FileFloatVector, FileIntegerMatrix, FileIntegerVector, FileMatrix, FileShortMatrix, FileShortVector, FileVector

public static class AbstractFileArray
extends Object

Note that this is an abstract class, i.e. objects cannot be created from this class directly, only from one of its subclasses. For a vector data type, see FileVector. For a matrix data type, see FileMatrix.

Usage

AbstractFileArray(filename=NULL, path=NULL, storageMode=c("integer", "double"),
  bytesPerCell=1, dim=NULL, dimnames=NULL, dimOrder=NULL, comments=NULL,
  nbrOfFreeBytes=4096)

Arguments

filename

The name of the file storing the data.

path

An optional path where data should be stored.

storageMode

The storage mode() of the data elements.

bytesPerCell

The number of bytes each element (cell) takes up on the file system. If NULL, it is inferred from the storageMode argument.

dim

A numeric vector specifying the dimensions of the array.

dimnames

An optional list of dimension names.

dimOrder

The order of the dimensions.

comments

An optional character string of arbitrary length.

nbrOfFreeBytes

The number of "spare" bytes after the comments before the data section begins.

Fields and Methods

Methods:

as.character          Returns a short string describing the file array.
as.vector             Returns the elements of a file array as an R vector.
clone                 Clones a file array.
close                 Closes a connection to the data file of the file array.
delete                Deletes the file array from the file system.
dim                   Gets the dimension of the file array.
dimnames              Gets the dimension names of a file array.
finalize              Internal: Clean up when file array is deallocated from memory.
flush                 Internal: Flushes the write buffer.
getBasename           Gets the basename (filename) of the data file.
getBytesPerCell       Gets the number of bytes per element in a file array.
getCloneNumber        Gets the clone number of the file array.
getComments           Gets the comments for this file array.
getDataOffset         Gets file position of the data section in a file array.
getDimensionOrder     Gets the order of the dimensions.
getExtension          Gets the filename extension of the file array.
getFileSize           Gets the size of the file array.
getName               Gets the name of the file array.
getPath               Gets the path (directory) where the data file lives.
getPathname           Gets the full pathname to the data file.
getSizeOfComments     Gets the number of bytes the comments occupy.
getSizeOfData         Gets the size of the data section in bytes.
getStorageMode        Gets the storage mode of the file array.
isOpen                Checks whether the data file of the file array is open or not.
length                Gets the number of elements in a file array.
open                  Opens a connection to the data file of the file array.
readAllValues         Reads all values in the file array.
readContiguousValues  Reads sets of contiguous values in the file array.
readHeader            Reads the header of a file array data file.
readValues            Reads individual values in the file array.
setComments           Sets the comments for this file array.
writeAllValues        Writes all values to a file array.
writeEmptyData        Writes an empty data section to the data file of a file array.
writeHeader           Writes the header of a file array to file.
writeHeaderComments   -
writeValues           Writes values to a file array.

Methods inherited from Object:
$, $<-, [[, [[<-, as.character, attach, attachLocally, clearCache, clearLookupCache, clone, detach, equals, extend, finalize, getEnvironment, getFieldModifier, getFieldModifiers, getFields, getInstantiationTime, getStaticInstance, hasField, hashCode, ll, load, names, objectSize, print, save

Maximum number of elements

Only the header is kept in memory, not the data. The maximum length of an array that can be allocated is therefore limited by the amount of available space on the file system. Since the (optional) element names are stored in the header, these may also be a limiting factor.

Element names

The element names are stored in the header and are currently read from and written to file one by one. This may slow down performance substantially if the dimensions are large. For optimal opening performance, avoid names.

For now, do not change names after the file has been allocated.

File format

The file format consists of a header section and a data section. The header contains information about the file format, the length and element names of the array, the data type (storage mode()), and the size of each element. The data section, which follows immediately after the header section, consists of all data elements, with non-assigned elements pre-allocated as zeros.

For more details, see the source code.
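
As a rough illustration of the header-plus-data layout described above, the following base R sketch writes a toy file with a length-prefixed text header followed by a zero-filled data section. The layout, the function names writeToyArray() and readToyHeader(), and the header fields are assumptions made for illustration only. They are not the actual R.huge file format, for which the source code remains the reference.

```r
# Hypothetical toy layout (NOT the real R.huge format):
#   [4-byte header length][header text][zero-filled data section]
writeToyArray <- function(pathname, n, bytesPerCell = 4L, comments = "") {
  header <- charToRaw(sprintf(
    "mode=integer;bytesPerCell=%d;length=%d;comments=%s",
    bytesPerCell, n, comments
  ))
  con <- file(pathname, open = "wb")
  on.exit(close(con))
  writeBin(length(header), con, size = 4L, endian = "little")  # header size
  writeBin(header, con)                                        # header section
  writeBin(raw(n * bytesPerCell), con)                         # data section, all zeros
  invisible(4L + length(header))                               # offset of the data section
}

readToyHeader <- function(pathname) {
  con <- file(pathname, open = "rb")
  on.exit(close(con))
  nHeader <- readBin(con, what = "integer", size = 4L, endian = "little")
  header <- rawToChar(readBin(con, what = "raw", n = nHeader))
  list(header = header, dataOffset = 4L + nHeader)
}

pathname <- tempfile(fileext = ".bin")
dataOffset <- writeToyArray(pathname, n = 10L, comments = "toy example")
info <- readToyHeader(pathname)
```

Only the header needs to be parsed to locate the data section, which is why a design of this kind can keep the header in memory while leaving the data on disk.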

Limitations

The size of the array in bytes is limited by the maximum file size of the file system. For instance, the maximum file size on a Windows FAT32 system is 4GB (more precisely, 4 GiB minus one byte). On Windows NTFS the limit is in practice ~16TB [1].
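
To check whether a planned array would fit under such a limit, the size of the data section can be estimated as the number of elements times the bytes per element. The sketch below uses plain R arithmetic. The helper name dataSectionBytes() and the example dimensions are hypothetical, and the header size (which adds to the file size) is ignored.

```r
# Estimated size of the data section in bytes.
# as.numeric() avoids integer overflow for very large arrays.
dataSectionBytes <- function(dim, bytesPerCell) {
  prod(as.numeric(dim)) * bytesPerCell
}

fat32Limit <- 2^32 - 1   # FAT32 maximum file size: 4 GiB minus one byte

dims <- c(50000, 20000)                    # hypothetical 50,000 x 20,000 matrix
bytesDouble <- dataSectionBytes(dims, 8)   # stored with 8 bytes per element
bytesByte   <- dataSectionBytes(dims, 1)   # stored with 1 byte per element

tooBigAsDouble <- bytesDouble > fat32Limit   # TRUE: 8e9 bytes exceeds the limit
tooBigAsByte   <- bytesByte > fat32Limit     # FALSE: 1e9 bytes fits
```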

Author

Henrik Bengtsson

Details

The purpose of this class is to be able to work with large arrays in R without being limited by the amount of memory available. Data is kept on the file system and elements are read and written whenever queried.
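
The idea of reading and writing elements on demand can be sketched with base R connections: seek to the byte position of an element, then read or write just that element. This is a minimal illustration under stated assumptions, not the package's implementation. The helpers writeCell() and readCell(), the dataOffset argument, and the use of 8-byte doubles are all hypothetical choices for the example.

```r
# Byte position of element i (1-based) in a file whose data section
# starts at dataOffset and stores bytesPerCell bytes per element.
cellPosition <- function(i, bytesPerCell, dataOffset = 0) {
  dataOffset + (i - 1) * bytesPerCell
}

# Overwrite a single element in place; "r+b" opens without truncating the file.
writeCell <- function(pathname, i, value, bytesPerCell = 8L, dataOffset = 0) {
  con <- file(pathname, open = "r+b")
  on.exit(close(con))
  seek(con, where = cellPosition(i, bytesPerCell, dataOffset),
       origin = "start", rw = "write")
  writeBin(as.double(value), con, size = bytesPerCell)
  invisible(NULL)
}

# Read a single element without loading the rest of the file.
readCell <- function(pathname, i, bytesPerCell = 8L, dataOffset = 0) {
  con <- file(pathname, open = "rb")
  on.exit(close(con))
  seek(con, where = cellPosition(i, bytesPerCell, dataOffset), origin = "start")
  readBin(con, what = "double", n = 1L, size = bytesPerCell)
}

# Pre-allocate 1000 zero-valued doubles on disk, then touch one element.
pathname <- tempfile(fileext = ".bin")
con <- file(pathname, open = "wb")
writeBin(raw(1000L * 8L), con)
close(con)

writeCell(pathname, 500, 3.14)
value <- readCell(pathname, 500)
```

Memory use in this sketch depends only on how many elements are requested at once, not on the total size of the file.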

References

[1] New Technology File System (NTFS), Wikipedia, 2006. https://en.wikipedia.org/wiki/NTFS