tm (version 0.5-10)

termFreq: Term Frequency Vector

Description

Generate a term frequency vector from a text document.

Usage

termFreq(doc, control = list())

Arguments

doc
An object inheriting from TextDocument.
control
A list of control options which override default settings.

Two options are processed first. Next comes a set of options that are sensitive to their order of occurrence in the control list.
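To see why the order of occurrence can matter, here is a minimal base-R sketch. It does not use tm or reproduce its internals; the helper names `strip_punct` and `drop_stop` are hypothetical and exist only for this illustration. It applies punctuation removal and stop word removal to the same tokens in both orders.

```r
# Illustration only: base R, not tm's implementation.
tokens <- c("that,", "oil", "that")
stop_words <- "that"

# Hypothetical helpers for this sketch.
strip_punct <- function(x) gsub("[[:punct:]]+", "", x)
drop_stop <- function(x) x[!x %in% stop_words]

# Punctuation removed first: "that," becomes "that" and is then dropped.
punct_first <- drop_stop(strip_punct(tokens))

# Stop words removed first: "that," does not match and survives as "that".
stop_first <- strip_punct(drop_stop(tokens))

print(punct_first)
print(stop_first)
```

In this sketch the two orders give different token sets, which is the kind of difference the ordering of entries in a control list may produce.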

Value

A named integer vector of class term_frequency with term frequencies as values and tokens as names.
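The shape of such a result can be sketched in base R. This is an illustration under simplifying assumptions (whitespace tokenization of a made-up string, no tm classes), not the code termFreq itself runs.

```r
# Illustration only: build a named integer vector of token counts in base R.
text <- "oil prices rose as oil demand rose"
tokens <- unlist(strsplit(text, "[[:space:]]+"))

# table() counts each distinct token.
tab <- table(tokens)

# Convert to a plain named integer vector: names are tokens, values are counts.
tf <- structure(as.integer(tab), names = names(tab))

tf[["oil"]]   # 2
names(tf)     # the distinct tokens
```

Values are looked up by token name, just as with any named vector.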

See Also

getTokenizers

Examples

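library("tm")
data("crude")
# Term frequencies with the default control settings
termFreq(crude[[14]])
# Custom tokenizer that splits on runs of whitespace
strsplit_space_tokenizer <- function(x) unlist(strsplit(x, "[[:space:]]+"))
ctrl <- list(tokenize = strsplit_space_tokenizer,
             removePunctuation = list(preserve_intra_word_dashes = TRUE),
             stopwords = c("reuter", "that"),
             stemming = TRUE,
             wordLengths = c(4, Inf))
# Term frequencies with the custom control list
termFreq(crude[[14]], control = ctrl)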
