mallet (version 1.3.0)

mallet.word.freqs: Descriptive statistics of word frequencies

Description

This function returns a data frame with one row for each unique vocabulary word and three columns: the word as a character value, the total number of tokens of that word type, and the number of documents that contain that word at least once. This information can be useful for identifying candidate stopwords, for example words that appear in nearly every document or in only one.

Usage

mallet.word.freqs(topic.model)

Value

A data.frame with one row per word type and three columns: the word type (word), the word frequency (word.freq), and the document frequency (doc.freq).

Arguments

topic.model

A cc.mallet.topics.RTopicModel object created by MalletLDA.

See Also

MalletLDA

Examples

if (FALSE) {
# Read in sotu example data
data(sotu)
sotu.instances <-
   mallet.import(id.array = row.names(sotu),
                 text.array = sotu[["text"]],
                 stoplist = mallet_stoplist_file_path("en"),
                 token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")

# Create topic model
topic.model <- MalletLDA(num.topics=10, alpha.sum = 1, beta = 0.1)
topic.model$loadDocuments(sotu.instances)

# Get word frequencies
word_freqs <- mallet.word.freqs(topic.model)

}
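
The Description notes that these frequencies can help identify candidate stopwords. The following is a minimal sketch of one way to do that, not a method prescribed by the package. So that it runs without Java or MALLET, it uses a small hand-made data frame in place of the real result. Only the column names (word, word.freq, doc.freq) come from the documentation above; the values, the corpus size n_docs, and both thresholds are illustrative assumptions.

```r
# Hypothetical stand-in for the result of mallet.word.freqs(topic.model).
# The values are invented; only the column names follow the documentation.
word_freqs <- data.frame(
  word      = c("government", "congress", "tariff", "submarine", "united"),
  word.freq = c(120, 80, 15, 3, 200),
  doc.freq  = c(10, 9, 4, 1, 10),
  stringsAsFactors = FALSE
)

# Assumed number of documents in the corpus (for real data this could be
# taken from the input, e.g. nrow(sotu)).
n_docs <- 10

# Words found in at least 80% of documents (threshold is an assumption):
# they appear almost everywhere and so distinguish topics poorly.
too_common <- word_freqs$word[word_freqs$doc.freq / n_docs >= 0.8]

# Words found in only one document (threshold is an assumption):
# often too rare to contribute to any topic.
too_rare <- word_freqs$word[word_freqs$doc.freq <= 1]

candidate_stopwords <- c(too_common, too_rare)

# Inspect the most widespread words first before deciding what to drop.
by_doc_freq <- word_freqs[order(-word_freqs$doc.freq, -word_freqs$word.freq), ]
```

The resulting candidate_stopwords vector is only a starting point for manual review; whether a word is actually removed should depend on the corpus and the analysis.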
