LongVectors: Long Vectors

Description

Vectors of $2^31$ or more elements were added in R 3.0.0.

Arguments

Matrix algebra

It is now possible to use $m x n$ matrices with more than 2 billion elements. Whether matrix algebra (including %*%, crossprod, svd, qr, solve and eigen will actually work is somewhat implementation dependent, including the Fortran compiler used and if an external BLAS or LAPACK is used. An efficient parallel BLAS implementation will often be important to obtain usable performance. For example on one particular platform chol on a 47,000 square matrix took about 5 hours with the internal BLAS, 21 minutes using an optimized BLAS on one core, and 2 minutes using an optimized BLAS on 16 cores.

Details

Prior to R 3.0.0, all vectors in R were restricted to at most $2^31 - 1$ elements and could be indexed by integer vectors.

Currently all atomic (raw, logical, integer, numeric, complex, character) vectors, lists and expressions can be much longer on 64-bit platforms: such vectors are referred to as ‘long vectors’ and have a slightly different internal structure. In theory up they can to $2^52$ elements, but address space limits of current CPUs and OSes will be much smaller. Such objects will have a length that is expressed as a double, and can be indexed by double vectors.

Arrays (including matrices) can be based on long vectors provided each of their dimensions is at most $2^31 - 1$: thus there are no 1-dimensional long arrays.

R code typically only needs minor changes to work with long vectors, maybe only checking that as.integer is not used unnecessarily for e.g. lengths. However, compiled code typically needs quite extensive changes. Note that the .C and .Fortran interfaces do not accept long vectors, so .Call (or similar) has to be used.

Because of the storage requirements (a minimum of 64 bytes per character string), character vectors are only going to be usable if they have a small number of distinct elements, and even then factors will be more efficient (4 bytes per element rather than 8). So it is expected that most of the usage of long vectors will be integer vectors (including factors) and numeric vectors.