Create a term-document matrix or a document-term matrix from Chinese or English text.
createDTM(string, language = c("zh", "en"), tokenize = NULL,
          removePunctuation = TRUE, removeNumbers = TRUE,
          removeStopwords = TRUE)
createTDM(string, language = c("zh", "en"), tokenize = NULL,
          removePunctuation = TRUE, removeNumbers = TRUE,
          removeStopwords = TRUE)
string: A character vector.
language: The language of the text: "zh" for Chinese, "en" for English.
tokenize: A tokenizer function.
removePunctuation: Whether to remove punctuation.
removeNumbers: Whether to remove numbers.
removeStopwords: Whether to remove stop words.
An object of class TermDocumentMatrix (createTDM) or class DocumentTermMatrix (createDTM).
Package "tm" is required.
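A minimal usage sketch follows, assuming that the package providing createDTM and createTDM is tmcn (an assumption, since this page does not name it) and that both it and "tm" are installed. The sample documents are made up for illustration, and the calls are skipped when either package is missing.

```r
# Two short, made-up English documents used only for illustration.
docs <- c("Text mining is fun", "Mining text with R is fun too")

# Assumption: createDTM()/createTDM() are exported by the 'tmcn' package.
# Guard the calls so the script still runs when the packages are absent.
if (requireNamespace("tmcn", quietly = TRUE) &&
    requireNamespace("tm", quietly = TRUE)) {
  # Document-term matrix: one row per document, one column per term.
  dtm <- tmcn::createDTM(docs, language = "en")
  # Term-document matrix: one row per term, one column per document.
  tdm <- tmcn::createTDM(docs, language = "en")

  print(class(dtm))
  print(dim(dtm))
  print(dim(tdm))
}
```

The two results hold the same counts in transposed layouts, so the choice between them mostly depends on whether downstream code expects documents in rows or in columns.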