termFreq(doc, control = list())
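doc: an object inheriting from TextDocument.
control: a list of control options that override the default settings. First, the following two options are processed:
tokenize: the tokenizer used to split doc into single tokens, either a function or a string naming one of the predefined tokenizers. Defaults to words.
tolower: either a logical value indicating whether tokens are converted to lower case, or a custom function performing the conversion. Defaults to lower-casing with tolower.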
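A minimal sketch of these two options, assuming the tm package is installed. The sample sentence and the helper name space_tokenizer are hypothetical. It passes a custom whitespace tokenizer and compares the default lower-casing with tolower = FALSE, which keeps "Oil" and "oil" as separate terms.

```r
library(tm)

# Hypothetical helper: split the document text on runs of whitespace
space_tokenizer <- function(x) unlist(strsplit(as.character(x), "[[:space:]]+"))

doc <- PlainTextDocument("Oil prices rose while oil demand fell")

# Default lower-casing: "Oil" and "oil" are counted as one term
tf_lower <- termFreq(doc, control = list(tokenize = space_tokenizer))

# tolower = FALSE keeps the original case, so the two forms are counted separately
tf_case <- termFreq(doc, control = list(tokenize = space_tokenizer,
                                        tolower = FALSE))

tf_lower[["oil"]]
tf_case[c("Oil", "oil")]
```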
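Next, a set of options that are sensitive to their order of occurrence in the control list is processed: removePunctuation, removeNumbers, stopwords, and stemming. They are applied in the order in which they appear in control. Finally, wordLengths, an integer vector of length 2, discards terms shorter than wordLengths[1] or longer than wordLengths[2]; it defaults to c(3, Inf).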
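To illustrate why the order matters, here is a sketch under the same assumptions (tm installed; the sentence and the helper name space_tokenizer are hypothetical). Listing removePunctuation before stopwords turns "don't" into "dont" before the stopword list is consulted, so the token survives. Listing stopwords first removes it.

```r
library(tm)

# Hypothetical helper: whitespace tokenizer
space_tokenizer <- function(x) unlist(strsplit(as.character(x), "[[:space:]]+"))

doc <- PlainTextDocument("we don't stop")

# stopwords listed first: "don't" matches the stopword and is dropped,
# then punctuation removal runs on the remaining tokens
stop_first <- termFreq(doc, control = list(tokenize = space_tokenizer,
                                           stopwords = "don't",
                                           removePunctuation = TRUE))

# removePunctuation listed first: "don't" becomes "dont", which no longer
# matches the stopword "don't" and therefore survives
punct_first <- termFreq(doc, control = list(tokenize = space_tokenizer,
                                            removePunctuation = TRUE,
                                            stopwords = "don't"))

names(stop_first)
names(punct_first)
```

In both calls "we" is dropped by the default wordLengths of c(3, Inf), not by the stopword list.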
Value: a named integer vector of class term_frequency, with term frequencies as values and tokens as names.
See also: getTokenizers.
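Because the result is a named vector, ordinary R indexing and sorting apply to it. A short sketch, assuming tm is installed; the sample sentence and the helper name space_tokenizer are hypothetical.

```r
library(tm)

# Hypothetical helper: whitespace tokenizer
space_tokenizer <- function(x) unlist(strsplit(as.character(x), "[[:space:]]+"))

doc <- PlainTextDocument("crude oil and more crude oil")
tf <- termFreq(doc, control = list(tokenize = space_tokenizer))

inherits(tf, "term_frequency")           # the documented result class
tf[["crude"]]                            # look up one term's frequency by name
sorted <- sort(tf, decreasing = TRUE)    # most frequent terms first
sorted
```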
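library(tm)  # provides termFreq() and the "crude" corpus
data("crude")
# Term frequencies of one document with the default control settings
termFreq(crude[[14]])
# Tokenizer splitting the document text on runs of whitespace;
# as.character() extracts the text, since strsplit() needs a character vector
strsplit_space_tokenizer <- function(x)
    unlist(strsplit(as.character(x), "[[:space:]]+"))
# Custom control list; stemming = TRUE requires the SnowballC package
ctrl <- list(tokenize = strsplit_space_tokenizer,
             removePunctuation = list(preserve_intra_word_dashes = TRUE),
             stopwords = c("reuter", "that"),
             stemming = TRUE,
             wordLengths = c(4, Inf))
termFreq(crude[[14]], control = ctrl)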
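A quick check of what the custom control list guarantees, assuming tm is installed. Stemming is left out here, so this sketch does not need SnowballC. Lower-casing runs before the order-sensitive options, and removePunctuation is listed before stopwords, so no remaining term equals "reuter" or "that". wordLengths = c(4, Inf) leaves no term shorter than four characters.

```r
library(tm)
data("crude")

strsplit_space_tokenizer <- function(x)
    unlist(strsplit(as.character(x), "[[:space:]]+"))

# Same options as the example above, minus stemming (no SnowballC needed)
ctrl_nostem <- list(tokenize = strsplit_space_tokenizer,
                    removePunctuation = list(preserve_intra_word_dashes = TRUE),
                    stopwords = c("reuter", "that"),
                    wordLengths = c(4, Inf))
tf <- termFreq(crude[[14]], control = ctrl_nostem)

# Inspect the ten most frequent terms
head(sort(tf, decreasing = TRUE), 10)
```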