Convert a quanteda dfm object to a format usable by other text analysis packages. The general function convert provides easy conversion from a dfm to the document-term representations used in the other text analysis packages for which conversions are defined.
convert(x, to = c("lda", "tm", "stm", "austin", "topicmodels", "lsa",
"matrix", "data.frame", "tripletlist"), docvars = NULL)
x
a dfm to be converted
to
target conversion format, consisting of the name of the package into whose document-term matrix representation the dfm will be converted:
"lda"
a list with components "documents" and "vocab" as needed by the function lda.collapsed.gibbs.sampler from the lda package
"tm"
a DocumentTermMatrix from the tm package
"stm"
the format for the stm package
"austin"
the wfm format from the austin package
"topicmodels"
the "dtm" format as used by the topicmodels package
"lsa"
the "textmatrix" format as used by the lsa package
"data.frame"
a data.frame where each feature is a variable
"tripletlist"
a named "triplet" format list consisting of document, feature, and frequency
docvars
optional data.frame of document variables used as the meta information in conversion to the stm package format. This aids in selecting only the document variables that correspond to documents with non-zero counts.
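The row selection described for docvars can be sketched in base R. This is only an illustration of the idea under stated assumptions, not the code convert uses internally: the count matrix, the year variable, and the names counts, meta_all, and keep are all hypothetical.

```r
# Toy document-feature counts; the document named "empty" has no features
# (hypothetical data, not produced by quanteda)
counts <- matrix(c(0, 0,
                   1, 2,
                   3, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("empty", "docA", "docB"),
                                 c("tax", "peace")))

# Document variables: one row per document, in the same order as counts
meta_all <- data.frame(year = c(1973L, 1977L, 1981L),
                       row.names = rownames(counts))

# Flag documents that have at least one non-zero count
keep <- rowSums(counts) > 0

# Drop the empty documents from the counts and from the metadata together,
# so the two stay aligned row for row
counts_kept <- counts[keep, , drop = FALSE]
meta_kept   <- meta_all[keep, , drop = FALSE]
```

Subsetting both objects with the same logical vector is what keeps the metadata rows matched to the remaining documents.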
A converted object determined by the value of to (see above).
See conversion target package documentation for more detailed descriptions
of the return formats.
# NOT RUN {
mycorpus <- corpus_subset(data_corpus_inaugural, Year > 1970)
quantdfm <- dfm(mycorpus, verbose = FALSE)
# austin's wfm format
identical(dim(quantdfm), dim(convert(quantdfm, to = "austin")))
# stm package format
stmdfm <- convert(quantdfm, to = "stm")
str(stmdfm)
# triplet list format
triplet <- convert(quantdfm, to = "tripletlist")
str(triplet)
# illustrate what happens with zero-length documents
quantdfm2 <- dfm(c(punctOnly = "!!!", mycorpus[-1]), verbose = FALSE)
rowSums(quantdfm2)
stmdfm2 <- convert(quantdfm2, to = "stm", docvars = docvars(mycorpus))
str(stmdfm2)
# }
# NOT RUN {
# tm's DocumentTermMatrix format
tmdfm <- convert(quantdfm, to = "tm")
str(tmdfm)
# topicmodels package format
str(convert(quantdfm, to = "topicmodels"))
# lda package format
ldadfm <- convert(quantdfm, to = "lda")
str(ldadfm)
# }
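To make the "tripletlist" description more concrete, here is a minimal base R sketch of a list with document, feature, and frequency components built from a small count matrix. It assumes that only non-zero cells are recorded, and it does not call quanteda; the data and the names counts, nz, and triplet_sketch are hypothetical, so the exact ordering and types returned by convert may differ.

```r
# Toy document-feature counts (hypothetical data, not produced by quanteda)
counts <- matrix(c(2, 0, 1,
                   0, 3, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2"),
                                 c("apple", "pear", "plum")))

# (row, column) index pairs of the non-zero cells
nz <- which(counts > 0, arr.ind = TRUE)

# One entry per non-zero cell: which document, which feature, how often
triplet_sketch <- list(
  document  = rownames(counts)[nz[, "row"]],
  feature   = colnames(counts)[nz[, "col"]],
  frequency = counts[nz]
)

str(triplet_sketch)
```

The three vectors have equal length, and position i in each one describes the same cell of the original matrix.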