gofastr (version 0.3.0)

remove_stopwords: Remove Stopwords from a TermDocumentMatrix/DocumentTermMatrix

Description

remove_stopwords - Remove stopwords and words shorter than min.char (or longer than max.char) from a TermDocumentMatrix or DocumentTermMatrix.

prep_stopwords - Join multiple vectors of words, convert to lower case, and return sorted unique words.

Usage

remove_stopwords(x, stopwords = tm::stopwords("english"), min.char = 3,
  max.char = NULL, stem = FALSE, denumber = TRUE)

prep_stopwords(...)

Arguments

x

A TermDocumentMatrix or DocumentTermMatrix.

stopwords

A vector of stopwords to remove.

min.char

The minimum number of characters a word must have to be retained.

max.char

The maximum number of characters a word may have to be retained.

stem

Logical. If TRUE, the stopwords will be stemmed.

denumber

Logical. If TRUE, numbers will be excluded.

...

Vectors of words.
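
The way the stopwords, min.char, max.char, and denumber arguments combine can be illustrated on a plain character vector of terms. This is only a base R sketch under stated assumptions, not the package's implementation: filter_terms_sketch is a hypothetical helper, it treats "numbers" as terms made up entirely of digits, it treats max.char = NULL as "no upper limit", and it ignores stemming.

```r
# Hypothetical helper: mimics the documented filtering on a character vector.
filter_terms_sketch <- function(terms, stopwords = character(), min.char = 3,
                                max.char = NULL, denumber = TRUE) {
  # Drop stopwords and terms shorter than min.char
  keep <- !(terms %in% stopwords) & nchar(terms) >= min.char
  # Assumption: NULL means no upper length limit
  if (!is.null(max.char)) keep <- keep & nchar(terms) <= max.char
  # Assumption: a "number" is a term consisting only of digits
  if (denumber) keep <- keep & !grepl("^[0-9]+$", terms)
  terms[keep]
}

terms <- c("the", "of", "economy", "2012", "tax", "a1")
filter_terms_sketch(terms, stopwords = c("the", "of"))
# [1] "economy" "tax"
filter_terms_sketch(terms, stopwords = c("the", "of"), max.char = 5)
# [1] "tax"
```

In remove_stopwords itself, the same kind of filter is applied to the terms of the matrix rather than to a bare vector.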

Value

Returns a TermDocumentMatrix or DocumentTermMatrix.

Examples

# NOT RUN {
(x <- with(presidential_debates_2012, q_dtm(dialogue, paste(time, tot, sep = "_"))))
remove_stopwords(x)
(y <- with(presidential_debates_2012, q_tdm(dialogue, paste(time, tot, sep = "_"))))
remove_stopwords(y)

prep_stopwords("the", "ChIcken", "Hello", tm::stopwords("english"), c("John", "Josh"))
# }
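
The behavior described for prep_stopwords (join several vectors, convert to lower case, return sorted unique words) can be approximated in base R. The function below is a hypothetical stand-in written for illustration, not the package's source code.

```r
# Hypothetical stand-in for the documented behavior of prep_stopwords()
prep_stopwords_sketch <- function(...) {
  # Join all supplied vectors into one character vector
  words <- unlist(list(...), use.names = FALSE)
  # Lower-case, de-duplicate, then sort
  sort(unique(tolower(words)))
}

prep_stopwords_sketch("the", "ChIcken", "Hello", c("John", "Josh"), "THE")
# [1] "chicken" "hello"   "john"    "josh"    "the"
```

Lower-casing before de-duplication is what lets "the" and "THE" collapse into a single entry.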
