read.matrix: Read a sparse matrix from a file and return a matrix

Description

Reading matrices from files can be time consuming, depending on the size of the matrix. 'read.matrix' implements a fairly efficient routine that reads sparse matrices and returns their dense counterparts.

Usage

read.matrix(file, header = FALSE, skip = 1, row.ids = NULL, col.ids = NULL, colClasses = c('character','character','numeric'), assign.fn = assignMatrixDense, filter.fn = NULL, ...)

Arguments

file

A file or connection to read from

header

Whether the file contains header lines that define all possible rows and columns. If FALSE, the complete set of rows and columns is derived from the triplet elements in the file.

skip

The number of lines to skip at the start of the file. The default of 1 assumes there is a single header line, which is skipped.

row.ids

If header is TRUE, the row number in the file that defines the row.ids. If header is FALSE, the row.ids to use for the matrix.

col.ids

If header is TRUE, the col number that defines the col.ids. If header is FALSE, the col.ids to use for the matrix.

colClasses

The classes to use for the columns in the triplet file

assign.fn

The function to use to construct the sparse representation that is then converted to a dense matrix

filter.fn

An optional function used to filter or clean the input data and/or the row and column ids. filter.fn must accept arguments for data, row.ids, and col.ids.

...

Additional arguments to pass to the construction portion of the implementation
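
As an illustration of the filter.fn argument described above, here is a minimal sketch of a function with the required data, row.ids, and col.ids arguments. The name drop_unknown is hypothetical, and the return shape (a list holding the filtered data and ids) is an assumption, since this page does not state what filter.fn should return. The sketch is exercised on its own rather than through read.matrix.

```r
# Hypothetical filter: drop triplets with a missing value or with a
# row/column id that is not in the supplied id vectors.
# ASSUMPTION: returning a list of data, row.ids, and col.ids is a guess;
# the documentation does not specify the expected return value.
drop_unknown <- function(data, row.ids, col.ids) {
  keep <- !is.na(data[[3]]) &
    data[[1]] %in% row.ids &
    data[[2]] %in% col.ids
  list(data = data[keep, , drop = FALSE], row.ids = row.ids, col.ids = col.ids)
}

# Made-up triplet data: row id, column id, value
data <- data.frame(
  row = c("a", "b", "q"),
  col = c("x", "y", "x"),
  value = c(1, NA, 3),
  stringsAsFactors = FALSE
)

filtered <- drop_unknown(data, row.ids = c("a", "b"), col.ids = c("x", "y"))
```

Only the first triplet survives here: the second has a missing value and the third refers to a row id that was not supplied.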

Value

A matrix object generated from sparse triplet data.

Details

Matrices with dimensions on the order of thousands can be slow to load into R. 'read.matrix' provides an efficient implementation for reading sparse matrices in triplet form from a file or other connection. This version removes dependencies on other packages and shows a speed improvement over the methods those packages provide.

The primary benefit of this function, compared to the slam package, is that named rows and columns can be used instead of integer indexes. The other main motivation is that, if the memory is available, dense matrix calculations can be faster than their sparse counterparts and support a wider range of operators.

When header is TRUE, the row names and/or column names are read from the file. The names are expected to be comma-separated on a single line.
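
The header format can be illustrated in base R without calling read.matrix. The sketch below assumes a particular file layout (row names on line 1, column names on line 2, triplets afterwards); that layout is an assumption made for illustration, not something this page confirms.

```r
# Write a small made-up file. ASSUMED layout: line 1 holds the row names,
# line 2 holds the column names, and the remaining lines are triplets.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "a,b,c",
  "x,y,z",
  "a,x,1.5",
  "c,y,-3"
), path)

# Each header line is a single comma-separated list of names.
header_lines <- readLines(path, n = 2)
row.ids <- strsplit(header_lines[1], ",", fixed = TRUE)[[1]]
col.ids <- strsplit(header_lines[2], ",", fixed = TRUE)[[1]]

# Read the triplets that follow, skipping the two header lines.
triplets <- read.csv(path, header = FALSE, skip = 2,
                     colClasses = c("character", "character", "numeric"),
                     col.names = c("row", "col", "value"))
```

The colClasses used here mirror the default shown in Usage: two character id columns followed by a numeric value.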

Various methods can be used to construct a sparse matrix representation that is used as the basis for constructing the dense matrix. Currently only the assignMatrixDense function is available, which works well for matrices in triplet form.
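
To make the triplet-to-dense idea concrete, here is a minimal base R sketch. It is not the implementation of assignMatrixDense; the function triplet_to_dense and the sample data are hypothetical, and filling unspecified cells with 0 is an assumption.

```r
# Hypothetical helper: build a dense, named matrix from (row, col, value)
# triplets. ASSUMPTION: cells without a triplet are filled with 0.
triplet_to_dense <- function(triplets, row.ids = NULL, col.ids = NULL) {
  # Without explicit ids, the triplets define the complete set of names.
  if (is.null(row.ids)) row.ids <- unique(triplets$row)
  if (is.null(col.ids)) col.ids <- unique(triplets$col)
  m <- matrix(0, nrow = length(row.ids), ncol = length(col.ids),
              dimnames = list(row.ids, col.ids))
  # A two-column character matrix indexes cells by row and column name.
  m[cbind(triplets$row, triplets$col)] <- triplets$value
  m
}

# Made-up triplet data
triplets <- data.frame(
  row = c("a", "a", "c"),
  col = c("x", "z", "y"),
  value = c(1.5, 2.0, -3.0),
  stringsAsFactors = FALSE
)

m <- triplet_to_dense(triplets)
n <- triplet_to_dense(triplets, row.ids = c("a", "b", "c"),
                      col.ids = c("x", "y", "z"))
```

Here m has only the rows and columns that appear in the triplets, while n also contains the row "b" even though no triplet mentions it, which parallels supplying row.ids and col.ids explicitly.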

Examples


## Not run: 
#   path <- system.file('sample-data/triplet.csv', package='futile.matrix')
#   m <- read.matrix(path)
# 
#   rows <- paste('row', 1:10000, sep='.')
#   cols <- paste('col', 1:10000, sep='.')
#   n <- read.matrix(path, row.ids=rows, col.ids=cols)
# ## End(Not run)
