ldaformat2dtm
From topicmodels v0.2-4
by Bettina Gruen
Transform data from and for use with the lda package
Data from the lda package is transformed to a document-term matrix. This data format can be used to fit topic models using package topicmodels.
Conversely, data in the form of a document-term matrix is transformed to the LDA format used by package lda.
- Keywords
- utilities
Usage
ldaformat2dtm(documents, vocab, omit_empty = TRUE)
dtm2ldaformat(x, omit_empty = TRUE)
Arguments
- documents
- A list where each entry corresponds to a document; for each document, the terms occurring in it are stored in a matrix with two rows such that in each column the first entry is the vocabulary id of the term and the second entry is the number of times this term occurred in the document.
- vocab
- A "character" vector of the terms in the vocabulary.
- x
- An object of class "DocumentTermMatrix" as defined in package tm.
- omit_empty
- A logical indicating whether empty documents should be removed when converting the objects. By default, empty documents are removed.
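To make the layout of the documents argument concrete, the following is a minimal sketch that builds two toy documents by hand and converts them. The vocabulary and counts are made up for illustration, and the term ids are assumed to be zero-based, which is the convention of the lda package rather than something stated on this page.

```r
library(topicmodels)

# Hypothetical three-term vocabulary (illustrative only).
vocab <- c("apple", "banana", "cherry")

# Each document is a two-row integer matrix, filled column by column:
# row 1 = vocabulary id of the term (assumed zero-based, as in package lda),
# row 2 = number of times the term occurs in the document.
documents <- list(
  doc1 = matrix(c(0L, 2L,   # "apple" occurs twice
                  2L, 1L),  # "cherry" occurs once
                nrow = 2),
  doc2 = matrix(c(1L, 3L),  # "banana" occurs three times
                nrow = 2)
)

dtm <- ldaformat2dtm(documents, vocab)

# Dense view of the counts: one row per document, one column per term.
counts <- as.matrix(dtm)
print(counts)
```

Under these assumptions each column of a document matrix becomes one non-zero cell of the resulting document-term matrix.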
Value
- An object of class "DocumentTermMatrix" is returned by ldaformat2dtm(), and a list with components "documents" and "vocab" is returned by dtm2ldaformat().
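The return values and the effect of omit_empty can be inspected with a small sketch like the one below. The toy data is hypothetical, the term ids are assumed to be zero-based as in package lda, and the empty document is represented as a two-row matrix with no columns.

```r
library(topicmodels)

# Hypothetical vocabulary and documents (illustrative only).
vocab <- c("apple", "banana", "cherry")
documents <- list(
  doc1 = matrix(c(0L, 2L, 2L, 1L), nrow = 2),  # two distinct terms
  doc2 = matrix(c(1L, 3L), nrow = 2),          # one term
  doc3 = matrix(integer(0), nrow = 2)          # empty document, no terms
)

# With the default omit_empty = TRUE the empty document is dropped.
dtm_kept <- ldaformat2dtm(documents, vocab)
# With omit_empty = FALSE it is retained as an all-zero row.
dtm_all <- ldaformat2dtm(documents, vocab, omit_empty = FALSE)

# Converting back yields a list with components "documents" and "vocab".
res <- dtm2ldaformat(dtm_kept)

print(class(dtm_kept))
print(c(kept = nrow(dtm_kept), all = nrow(dtm_all)))
print(names(res))
```

Dropping empty documents changes the number of rows, so any per-document metadata kept alongside the corpus would need to be subset in the same way.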
Examples
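if (require("lda")) {
  # Load the Cora corpus shipped with package lda
  data("cora.documents", package = "lda")
  data("cora.vocab", package = "lda")
  # Convert to a document-term matrix and back again
  dtm <- ldaformat2dtm(cora.documents, cora.vocab)
  cora <- dtm2ldaformat(dtm)
  # Compare the round-trip result with the original data
  all.equal(cora, list(documents = cora.documents,
                       vocab = cora.vocab))
}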