slam (version 0.1-50)

foreign: Read and Write Sparse Matrix Format Files

Description

Read and write CLUTO sparse matrix format files, or the CCS format variant employed by the MC toolkit.

Usage

read_stm_CLUTO(file)
write_stm_CLUTO(x, file)
read_stm_MC(file, scalingtype = NULL)
write_stm_MC(x, file)

Arguments

file

a character string with the name of the file to read or write.

x

a matrix object that can be coerced to a simple triplet matrix via as.simple_triplet_matrix.

scalingtype

a character string specifying the type of scaling to be used, or NULL (default), in which case the scaling is inferred from the name of the file found that holds the non-zero entries (see Details).

Details

Documentation for CLUTO, including its sparse matrix format, is available from https://www-users.cs.umn.edu/~karypis/cluto/.

read_stm_CLUTO reads CLUTO sparse matrices, returning a simple triplet matrix.

write_stm_CLUTO writes CLUTO sparse matrices. Argument x must be coercible to a simple triplet matrix via as.simple_triplet_matrix.
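
A round trip through a temporary file illustrates the two CLUTO functions. This is a minimal sketch that assumes the slam package is installed; the matrix x and the temporary file f are made up for illustration and are not part of the package.

```r
library(slam)

# Small dense matrix with three non-zero entries (illustrative data).
x <- matrix(c(1, 0, 0, 2, 0, 3), nrow = 2)

# Write it in CLUTO format; x is coerced via as.simple_triplet_matrix.
f <- tempfile()
write_stm_CLUTO(x, f)

# Inspect the raw text that was written.
readLines(f)

# Read it back as a simple triplet matrix and compare with the input.
y <- read_stm_CLUTO(f)
stopifnot(all(dim(y) == dim(x)), all(as.matrix(y) == x))
```

Passing a dense matrix works here only because it is coercible; for large data it is usually preferable to build a simple triplet matrix directly and pass that.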

MC is a toolkit for creating vector models from text documents (see https://www.cs.utexas.edu/users/dml/software/mc/). It employs a variant of the Compressed Column Storage (CCS) sparse matrix format and writes the data into several files that share a base file name and differ in their suffix: e.g., the file with _dim appended to the base file name stores the matrix dimensions. The non-zero entries are stored in a file whose name indicates the scaling type used: e.g., _tfx_nz indicates scaling by term frequency (t), inverse document frequency (f) and no normalization (x). See README in the MC sources for more information.
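
The idea behind CCS can be sketched in base R without any package. The names values, row_ind and col_ptr are hypothetical and chosen for illustration; zero-based indices are assumed here because that is the usual convention in C-family tools, and the exact layout of the MC files should be checked against the MC README.

```r
# Dense example matrix: 2 rows, 3 columns, third column empty.
x <- matrix(c(1, 4, 0, 2, 0, 0), nrow = 2)

# Positions of the non-zero entries. which() scans column by column,
# which is the order in which CCS stores the entries.
idx <- which(x != 0, arr.ind = TRUE)

# The three arrays of a CCS representation (assumed zero-based).
values  <- x[idx]                         # non-zero values in column order
row_ind <- unname(idx[, "row"]) - 1L      # row index of each value
col_ptr <- c(0L, cumsum(tabulate(idx[, "col"], nbins = ncol(x))))

# Column j (one-based) owns the entries at positions
# (col_ptr[j] + 1):col_ptr[j + 1] of values and row_ind.
values
row_ind
col_ptr
```

Column pointers that repeat, as for the empty third column in this example, mark a column without non-zero entries.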

read_stm_MC reads a matrix stored in this format and returns a simple triplet matrix. Argument file gives the path including the base file name, i.e., without suffixes such as _dim.

write_stm_MC writes matrices in MC CCS sparse matrix format. Argument x must be coercible to a simple triplet matrix via as.simple_triplet_matrix.
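
The MC functions can be tried in the same way as the CLUTO ones. The following is a minimal sketch assuming the slam package is installed; the directory d and the base file name stem are hypothetical, and the scaling type is left at its default so that it is inferred from the files written.

```r
library(slam)

# Small dense matrix without empty columns (illustrative data).
x <- matrix(c(1, 4, 0, 2, 0, 3), nrow = 2)

# Use a fresh directory so the generated files are easy to list.
d <- tempfile()
dir.create(d)
stem <- file.path(d, "example")

# Write the matrix; several files sharing the base name are created.
write_stm_MC(x, stem)
list.files(d)

# Read it back via the base file name and compare with the input.
y <- read_stm_MC(stem)
stopifnot(all(dim(y) == dim(x)), all(as.matrix(y) == x))
```

Listing the directory shows which suffixes were produced, including the one that encodes the scaling type.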