mallet (version 1.3.0)

mallet.doc.topics: Retrieve a matrix of topic weights for every document

Description

This function returns a matrix with one row for every document and one column for every topic.

Usage

mallet.doc.topics(topic.model, normalized = FALSE, smoothed = FALSE)

Value

a number of documents by number of topics matrix.

Arguments

topic.model

A cc.mallet.topics.RTopicModel object created by MalletLDA.

normalized

If TRUE, normalize the rows so that each document sums to one. If FALSE, values will be integers (possibly plus the smoothing constant) representing the actual number of words of each topic in the documents.

smoothed

If TRUE, add the smoothing parameter for the model (initial value specified as alpha.sum in MalletLDA). If FALSE, many values will be zero.

Examples

Run this code
if (FALSE) {
# Read in sotu example data
data(sotu)
sotu.instances <-
   mallet.import(id.array = row.names(sotu),
                 text.array = sotu[["text"]],
                 stoplist = mallet_stoplist_file_path("en"),
                 token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")

# Create topic model
topic.model <- MalletLDA(num.topics=10, alpha.sum = 1, beta = 0.1)
topic.model$loadDocuments(sotu.instances)

# Train topic model
topic.model$train(200)

# Extract results
doc_topics <- mallet.doc.topics(topic.model, smoothed=TRUE, normalized=TRUE)
topic_words <- mallet.topic.words(topic.model, smoothed=TRUE, normalized=TRUE)
top_words <- mallet.top.words(topic.model, word.weights = topic_words[2,], num.top.words = 5)
}


Run the code above in your browser using DataLab