gofastr (version 0.3.0)

q_tdm: Quick TermDocumentMatrix

Description

Make a TermDocumentMatrix from a vector of text and an optional vector of documents. To stem a document as well, use q_tdm_stem, the stemming version of q_tdm, which uses SnowballC's wordStem.

Usage

q_tdm(text, docs = seq_along(text), to = "tm", keep.hyphen = FALSE,
  ngrams = NULL, ...)

q_tdm_stem(text, docs = seq_along(text), to = "tm", keep.hyphen = FALSE, ngrams = NULL, ...)

Arguments

text

A vector of strings.

docs

A vector of document names.

to

target conversion format, consisting of the name of the package into whose document-term matrix representation the dfm will be converted:

"lda"

a list with components "documents" and "vocab" as needed by lda.collapsed.gibbs.sampler from the lda package

"tm"

a DocumentTermMatrix from the tm package

"stm"

the format for the stm package

"austin"

the wfm format from the austin package

"topicmodels"

the "dtm" format as used by the topicmodels package

keep.hyphen

logical. If TRUE, hyphens are retained in the terms (e.g., "math-like" is kept as "math-like"); otherwise they split terms (e.g., "math-like" is converted to "math" and "like").

ngrams

A vector of ngrams (multiple words separated by spaces). When supplied, these ngrams are retained as terms in the matrix.

...

Additional arguments passed to dfm

Examples

# NOT RUN {
(x <- with(presidential_debates_2012, q_tdm(dialogue, paste(time, tot, sep = "_"))))
tm::weightTfIdf(x)

(x2 <- with(presidential_debates_2012, q_tdm_stem(dialogue, paste(time, tot, sep = "_"))))
remove_stopwords(x2, stem=TRUE)
# }
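
The effect of keep.hyphen described under Arguments can be illustrated without the package. The following is a minimal base R sketch, not gofastr's implementation: split_terms is a hypothetical helper introduced only for this example, and it assumes lowercase alphabetic terms.

```r
# Illustrative only: a base R sketch of the keep.hyphen idea.
# split_terms() is a hypothetical helper, not part of gofastr.
split_terms <- function(text, keep.hyphen = FALSE) {
  # Hyphens count as part of a term only when keep.hyphen is TRUE;
  # any other non-letter run is treated as a separator.
  pattern <- if (keep.hyphen) "[^a-z-]+" else "[^a-z]+"
  terms <- unlist(strsplit(tolower(text), pattern))
  # Drop empty strings left by leading separators
  terms[nzchar(terms)]
}

split_terms("A math-like notation")
split_terms("A math-like notation", keep.hyphen = TRUE)
```

The first call splits "math-like" into two terms and the second keeps it as one. With gofastr itself, the same choice is made through the keep.hyphen argument of q_tdm and q_tdm_stem.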
