To make the usage as consistent as possible with other packages, quanteda
also provides shortcut wrappers to convert(), designed to be similar in
syntax to analogous commands in the packages to whose format they are
converting.
as.wfm(x)
# S3 method for dfm
as.wfm(x)
as.DocumentTermMatrix(x)
# S3 method for dfm
as.DocumentTermMatrix(x)
dfm2austin(x)
dfm2tm(x, weighting = tm::weightTf)
dfm2lda(x, omit_empty = TRUE)
dtm2lda(x, omit_empty = TRUE)
dfm2dtm(x, omit_empty = TRUE)
dfm2stm(x, docvars = NULL, omit_empty = TRUE)
x: the dfm to be converted
weighting: a tm weight, see weightTf
omit_empty: logical; if TRUE, omit empty documents and features from the
converted dfm. This is required for some formats (such as STM) that do not
accept empty documents. Only used when to = "lda" or to = "topicmodels".
For the to = "stm" format, omit_empty is always TRUE.
docvars: optional data.frame of document variables used as the meta
information in conversion to the stm package format. This aids in selecting
the document variables only corresponding to the documents with non-zero
counts. Only affects the "stm" format.
...: additional arguments used only by as.DocumentTermMatrix
A converted object determined by the value of to (see above). See conversion
target package documentation for more detailed descriptions of the return
formats.
as.wfm converts a quanteda dfm into the wfm format used by the austin
package.
as.DocumentTermMatrix converts a quanteda dfm into the tm package's
DocumentTermMatrix format. Note: The tm package version of
as.TermDocumentMatrix allows a weighting argument, which supplies a
weighting function for TermDocumentMatrix. Here the default is term
frequency weighting. If you want a different weighting, apply the weights
after converting using one of the tm functions. For other available
weighting functions from the tm package, see TermDocumentMatrix.
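As a rough sketch of the advice to apply weights after converting, the snippet below converts a small dfm with convert(to = "tm") and then reweights the result with tm::weightTfIdf(). The toy texts and object names are illustrative assumptions, and the sketch assumes the tm package is installed.

```r
library(quanteda)

# Toy documents; the texts and object names are illustrative only
txt <- c(d1 = "apple banana banana",
         d2 = "banana cherry",
         d3 = "cherry cherry date")
dfmat_toy <- dfm(tokens(txt))

# Convert to a tm DocumentTermMatrix (term frequency weighting by default)
dtm_tf <- convert(dfmat_toy, to = "tm")

# Apply a different tm weighting after the conversion
dtm_tfidf <- tm::weightTfIdf(dtm_tf)
```

Any other tm weighting function that accepts a DocumentTermMatrix could be substituted for weightTfIdf() in the last step.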
dfm2lda converts a dfm into the list representation of terms in documents
used by the lda package (a list with components "documents" and "vocab" as
needed by lda.collapsed.gibbs.sampler).
dfm2ldaformat converts a dfm into the list representation of terms in
documents used by the lda package (a list with components "documents" and
"vocab" as needed by lda.collapsed.gibbs.sampler).
# NOT RUN {
corp <- corpus_subset(data_corpus_inaugural, Year > 1970)
dfmat <- dfm(corp)
# shortcut conversion to austin package's wfm format
identical(as.wfm(dfmat), convert(dfmat, to = "austin"))
# }
# NOT RUN {
# shortcut conversion to tm package's DocumentTermMatrix format
identical(as.DocumentTermMatrix(dfmat), convert(dfmat, to = "tm"))
# }
# NOT RUN {
# shortcut conversion to lda package list format
identical(quanteda:::dfm2lda(dfmat), convert(dfmat, to = "lda"))
# }
# NOT RUN {
# shortcut conversion to lda package list format
identical(dfm2ldaformat(dfmat), convert(dfmat, to = "lda"))
# }
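The omit_empty and docvars behaviour described for the "stm" format can be illustrated with a minimal sketch. It assumes that convert(to = "stm") returns a list with documents, vocab, and meta components and that empty documents are dropped (possibly with a warning). The toy texts, the year variable, and the object names are hypothetical.

```r
library(quanteda)

# Three toy documents; d2 is deliberately empty
txt <- c(d1 = "apple banana", d2 = "", d3 = "banana cherry")
dfmat_toy <- dfm(tokens(txt))

# Hypothetical document variables, one row per document in the dfm
meta_toy <- data.frame(year = c(2001, 2002, 2003))

# Convert to the stm input format, passing the document variables along
stm_input <- convert(dfmat_toy, to = "stm", docvars = meta_toy)

# Inspect the result: the empty document should be gone from both parts
names(stm_input)
length(stm_input$documents)
nrow(stm_input$meta)
```

Passing docvars at conversion time keeps the metadata rows aligned with the documents that survive, which avoids subsetting the data.frame by hand afterwards.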