
bigmemory (version 3.6)

big.matrix, shared.big.matrix, filebacked.big.matrix, is.big.matrix, as.big.matrix, is.shared, is.separated, is.filebacked, remove.backing: The core "big.matrix" operations.

Description

Create a big.matrix, create one from an existing matrix, or test whether an object is a big.matrix (and so on). A big.matrix may be held in memory (shared or not) or backed by a file (in which case it is always shareable).

Usage

big.matrix(nrow, ncol, type = "integer", init = NULL, dimnames = NULL,
           separated = FALSE, shared = FALSE, 
           backingfile = NULL, backingpath = NULL, descriptorfile = NULL, preserve = TRUE)
shared.big.matrix(nrow, ncol, type = "integer", init = NULL, dimnames = NULL,
           separated = FALSE, backingfile = NULL, backingpath = NULL, descriptorfile = NULL,
           preserve = TRUE)
filebacked.big.matrix(nrow, ncol, type = "integer", init = NULL, dimnames = NULL,
           separated = FALSE, backingfile = NULL, backingpath = NULL, descriptorfile = NULL,
           preserve = TRUE)
as.big.matrix(x, type = NULL, separated = FALSE, shared = FALSE,
              backingfile = NULL, backingpath = NULL, descriptorfile = NULL, preserve = TRUE)
is.big.matrix(x)
is.separated(x)
is.shared(x)
is.filebacked(x)

Arguments

x
a matrix or vector; if a vector, a one-column big.matrix is created by as.big.matrix.
nrow
number of rows.
ncol
number of columns.
type
the type of the atomic element: "double", "integer", "short", or "char" ("integer" by default).
init
a scalar value used to initialize every element (NULL by default, which skips initialization and saves time).
dimnames
a list of the row and column names.
separated
if TRUE, store each column as a separate vector; see Details.
shared
TRUE if the big.matrix should be allocated to shared memory.
backingfile
the root name of the backing file(s) used to cache the data.
backingpath
the path to the directory containing the file backing cache.
descriptorfile
the name of the file that holds the filebacked description, for subsequent use with attach.big.matrix; if NULL, the backingfile name is used as the root. The descriptor file is placed in the same directory as the backing file(s).
preserve
for a filebacked big.matrix, the backing file(s) are kept after the R session ends unless this option is set to FALSE.
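A minimal sketch of a few of the arguments above, assuming the bigmemory package is installed (the block does nothing otherwise). The type and shared arguments are passed explicitly because their defaults may differ between bigmemory versions; the object names v and w are illustrative only.

```r
if (requireNamespace("bigmemory", quietly = TRUE)) {
  suppressPackageStartupMessages(library(bigmemory))
  # A vector passed to as.big.matrix becomes a one-column big.matrix.
  v <- as.big.matrix(1:5, shared = FALSE)
  print(dim(v))
  # nrow, ncol, type, init, and dimnames together; NULL leaves row names unset.
  w <- big.matrix(2, 3, type = "integer", init = 0L,
                  dimnames = list(NULL, c("a", "b", "c")), shared = FALSE)
  print(colnames(w))
  print(w[, ])
}
```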

Value

  A big.matrix is returned by big.matrix, shared.big.matrix, filebacked.big.matrix, and as.big.matrix; is.big.matrix and the other is.* functions return TRUE or FALSE.

Details

A big.matrix consists of an object in R that does nothing more than point to the data structure implemented in C++. The object acts much like a traditional R matrix, but helps protect the user from many inadvertent memory-consuming pitfalls of traditional R matrices and data frames.

There are three big.matrix types, which manage data in different ways. The basic (default) big.matrix is not shared across processes and is limited to available RAM. A shared big.matrix has the same size constraints as the basic big.matrix, but may be shared across separate R processes. A file-backed big.matrix may exceed available RAM by using hard drive space, and may also be shared across processes. The atomic types of these matrices may be double, integer, short, or char (8, 4, 2, and 1 bytes, respectively).
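The byte sizes above make the raw data footprint of a big.matrix easy to estimate as nrow * ncol * bytes per element. The helper big_matrix_bytes below is hypothetical (not part of bigmemory) and counts only the data itself, not per-object or per-column overhead.

```r
# Bytes per element for each big.matrix atomic type, as listed in Details.
bytes_per_element <- c(double = 8, integer = 4, short = 2, char = 1)

# Hypothetical helper (not part of bigmemory): raw data size in bytes
# of an nrow x ncol matrix of the given type, ignoring object overhead.
big_matrix_bytes <- function(nrow, ncol, type = "integer") {
  type <- match.arg(type, names(bytes_per_element))
  nrow * ncol * bytes_per_element[[type]]
}

# A 1e6 x 100 double matrix: 800,000,000 bytes, about 763 MiB of data.
big_matrix_bytes(1e6, 100, "double")
big_matrix_bytes(1e6, 100, "double") / 2^20
```

Switching the same matrix to "short" would cut the estimate to a quarter, provided the values fit in 2 bytes.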

If x is a big.matrix, then x[1:5,] is returned as an R matrix containing the first five rows of x. If x is of type double, then the result will be numeric; otherwise, the result will be an integer R matrix. The expression x alone will display information about the R object (e.g. the external pointer) rather than evaluating the matrix itself (the user should try x[,] with extreme caution, recognizing that a huge R matrix will be created).
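A small sketch of the subsetting behavior described above, assuming bigmemory is installed: extracting rows yields an ordinary R matrix whose storage mode follows the big.matrix type. The object names are illustrative.

```r
if (requireNamespace("bigmemory", quietly = TRUE)) {
  suppressPackageStartupMessages(library(bigmemory))
  xi <- big.matrix(100, 3, type = "integer", init = 1L, shared = FALSE)
  xd <- big.matrix(100, 3, type = "double", init = 1.5, shared = FALSE)
  sub_i <- xi[1:5, ]  # an ordinary 5 x 3 R matrix, not a big.matrix
  sub_d <- xd[1:5, ]
  print(class(sub_i))
  print(storage.mode(sub_i))  # "integer" for an integer big.matrix
  print(storage.mode(sub_d))  # "double" (numeric) for a double big.matrix
}
```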

If x has a huge number of rows, then the use of rownames will be extremely memory-intensive and should be avoided. If x has a huge number of columns, the user might want to store the transpose, since there is the overhead of a pointer (and possibly mutexes) for each column in the matrix.
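To see why row names are costly, this base R sketch compares the size of 100,000 character row names with a single integer column of the same length. It relies only on object.size(), so the exact byte counts depend on the platform and R version.

```r
n <- 1e5
# Row names: one character string per row.
rn_bytes <- as.numeric(object.size(as.character(seq_len(n))))
# A single integer column with the same number of rows (4 bytes per element).
col_bytes <- as.numeric(object.size(integer(n)))
rn_bytes / col_bytes  # the row names are several times larger
```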

If separated is TRUE, then the memory is allocated into separate vectors for each column. If separated is FALSE, the matrix is stored in traditional column-major format. The function is.separated() reports whether the big.matrix is separated (TRUE) or not (FALSE).
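With separated = FALSE the data follow R's usual column-major layout, so element (i, j) of a matrix with nrow rows sits at linear position (j - 1) * nrow + i. The base R sketch below checks this on an ordinary matrix; cm_offset is a hypothetical helper, and the list of columns only mimics the idea of separated storage rather than bigmemory's internal layout.

```r
nr <- 4
nc <- 3
m <- matrix(seq_len(nr * nc), nrow = nr, ncol = nc)  # filled column by column

# Hypothetical helper: 1-based linear position of element (i, j) in column-major order.
cm_offset <- function(i, j, nrow) (j - 1) * nrow + i

m[3, 2] == m[cm_offset(3, 2, nr)]  # TRUE: both are element 7

# Separated-style analogue: one vector per column instead of one contiguous block.
cols <- lapply(seq_len(nc), function(j) m[, j])
cols[[2]][3] == m[3, 2]            # TRUE
```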

When a big.matrix, x, is passed as an argument to a function, it provides call-by-reference rather than call-by-value behavior. If the function modifies any of the values of x, the changes are not limited to a local copy within the function; they are visible to the caller.
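The sketch below contrasts the two behaviors, assuming bigmemory is installed for the second half: the same function modifies only a copy of an ordinary R matrix but writes through to a big.matrix. The helper set_first is hypothetical.

```r
# Hypothetical helper that overwrites element [1, 1] of its argument.
set_first <- function(mat) {
  mat[1, 1] <- 99L
  invisible(NULL)
}

# Ordinary R matrix: the function changes only its local copy.
r <- matrix(0L, nrow = 2, ncol = 2)
set_first(r)
r[1, 1]  # still 0

if (requireNamespace("bigmemory", quietly = TRUE)) {
  suppressPackageStartupMessages(library(bigmemory))
  # big.matrix: the R object only points to the C++ data, so the change is shared.
  b <- big.matrix(2, 2, type = "integer", init = 0L, shared = FALSE)
  set_first(b)
  print(b[1, 1])  # 99
}
```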

A shared big.matrix object is essentially the same as a non-shared big.matrix object except the memory being managed may be shared across R sessions.

A file-backed big.matrix may exceed available RAM in size by using a file cache (or possibly multiple file caches, if separated is TRUE). This can incur a substantial performance penalty for large matrices, but could be useful nonetheless. Creating a filebacked object produces not only the backing file(s) but also a descriptor file (in the same directory) that can be used for subsequent attachments (see attach.big.matrix).
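A minimal file-backed sketch, assuming bigmemory is installed and tempdir() is writable. The file names are generated with tempfile() and are illustrative only; the block checks that both the backing file and the descriptor file appear in backingpath.

```r
if (requireNamespace("bigmemory", quietly = TRUE)) {
  suppressPackageStartupMessages(library(bigmemory))
  bpath <- tempdir()
  bfile <- basename(tempfile("fbm_", fileext = ".bin"))  # illustrative name
  dfile <- sub("\\.bin$", ".desc", bfile)
  fb <- filebacked.big.matrix(3, 3, type = "double", init = 0,
                              backingfile = bfile, backingpath = bpath,
                              descriptorfile = dfile)
  fb[2, 2] <- 42
  print(is.filebacked(fb))
  # Both the backing file and the descriptor file are written to backingpath.
  print(file.exists(file.path(bpath, c(bfile, dfile))))
}
```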

See Also

bigmemory, and perhaps the class documentation of big.matrix; attach.big.matrix and describe.

Examples

x <- big.matrix(10, 2, type='integer', init=-5)
colnames(x) <- c("alpha", "beta")
is.big.matrix(x)
dim(x)
colnames(x)
rownames(x)
x[,]
x[1:8,1] <- 11:18
x[,]
colmin(x)
colmax(x)
colrange(x)
colsum(x)
colprod(x)
colmean(x)
colvar(x)
summary(x)

x <- as.big.matrix(matrix(-5, 10, 2))
colnames(x) <- c("alpha", "beta")
is.big.matrix(x)
dim(x)
colnames(x)
rownames(x)
x[1:8,1] <- 11:18
x[,]

# The following shared memory example is quite silly, as you wouldn't likely do
# this in a single R session.  But if zdescription were passed to another R session
# via SNOW, NetWorkSpaces, or even by a simple file read/write,
# then the attach.big.matrix() within the second R process would give access to the
# same object in memory.  Please see the package vignette for real examples.
z <- shared.big.matrix(3, 3, type='integer', init=3)
z[,]
dim(z)
z[1,1] <- 2
z[,]
zdescription <- describe(z)
zdescription
y <- attach.big.matrix(zdescription)
y[,]
y
z
y[1,1] <- -100
y[,]
z[,]

# A short filebacked example, showing the creation of associated files and mutexes:
files <- dir()
files[grep("example.bin", files)]
z <- filebacked.big.matrix(3, 3, type='integer', init=123, backingfile="example.bin",
                           dimnames=list(c('a','b','c'), c('d', 'e', 'f')))
z[,]
files <- dir()
files[grep("example.bin", files)]
