stm (version 1.1.3)

convertCorpus: Convert stm formatted documents to another format

Description

Takes documents and vocab objects in stm format and returns them in a format usable by other packages.

Usage

convertCorpus(documents, vocab, type=c("slam", "lda", "Matrix"))

Arguments

documents
the documents object in stm format
vocab
the vocab object in stm format
type
the output type desired. See Details.

Details

The various type conversions are described below:
type="slam"
Converts to the simple triplet matrix representation used by the slam package. This is the format used internally by tm.

type="lda"
Converts to the format used by the lda package. This is a very minor change, because the stm format is based on lda's data representation. The difference lies in how the word indices are numbered: stm indexes the vocabulary from one, while lda indexes it from zero. Accordingly, this type returns a list containing the new documents object and the unchanged vocab object.
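The index shift can be illustrated in base R without stm or lda installed. This is a minimal sketch, not the package's own code: the toy `vocab` and `documents` objects and the helper `to_zero_based` are hypothetical, and it assumes the usual stm layout of one two-row integer matrix per document (row 1 holding one-based vocab indices, row 2 holding counts).

```r
# Toy corpus in stm-style format (hypothetical data, for illustration only).
# Each document is a 2-row integer matrix: row 1 = vocab index, row 2 = count.
vocab <- c("apple", "banana", "cherry")
documents <- list(
  matrix(c(1L, 2L,    # "apple" appears twice
           3L, 1L),   # "cherry" appears once
         nrow = 2),
  matrix(c(2L, 4L),   # "banana" appears four times
         nrow = 2)
)

# Hypothetical helper: shift the word indices from one-based to zero-based,
# leaving the counts in row 2 untouched.
to_zero_based <- function(doc) {
  doc[1, ] <- doc[1, ] - 1L
  doc
}

# Mirror the documented return shape: new documents plus unchanged vocab.
lda_style <- list(documents = lapply(documents, to_zero_based),
                  vocab = vocab)
lda_style$documents[[1]]
```

Only the first row of each matrix changes, so the counts and the vocab vector carry over as they are.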

type="Matrix"
Converts to the sparse matrix representation used by Matrix. This is the format used internally by numerous other text analysis packages.
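To make the target layout concrete, the following sketch builds a documents-by-terms sparse matrix by hand with `Matrix::sparseMatrix`. It is an illustration under assumptions rather than what convertCorpus does internally: the toy `vocab` and `documents` objects are hypothetical, and placing documents in rows and vocabulary terms in columns is a choice made here for the example.

```r
library(Matrix)

# Toy corpus in stm-style format (hypothetical data, for illustration only).
vocab <- c("apple", "banana", "cherry")
documents <- list(
  matrix(c(1L, 2L, 3L, 1L), nrow = 2),  # apple x2, cherry x1
  matrix(c(2L, 4L), nrow = 2)           # banana x4
)

# Row index: the document number, repeated once per distinct word in it.
i <- rep(seq_along(documents), times = vapply(documents, ncol, integer(1)))
# Column index: the one-based vocab index from row 1 of each document.
j <- unlist(lapply(documents, function(d) d[1, ]))
# Values: the word counts from row 2 of each document.
x <- as.numeric(unlist(lapply(documents, function(d) d[2, ])))

dtm <- sparseMatrix(i = i, j = j, x = x,
                    dims = c(length(documents), length(vocab)),
                    dimnames = list(NULL, vocab))
dtm
```

Because the stm indices are already one-based, they can be passed to `sparseMatrix` directly as column positions.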

If you want to write out a file containing the sparse matrix representation popularized by David Blei's C code ldac, see the function writeLdac.

See Also

writeLdac, readCorpus, poliblog5k

Examples

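# load stm, which provides convertCorpus and the poliblog5k data
library(stm)
# convert the poliblog5k data to slam package format
poliSlam <- convertCorpus(poliblog5k.docs, poliblog5k.voc, type="slam")
# inspect the class of the converted object
class(poliSlam)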
