mallet (version 1.3.0)

mallet.topic.hclust: Return a hierarchical clustering of topics

Description

Returns a hierarchical clustering of topics that can be plotted as a dendrogram. There are two ways of measuring topic similarity: topics may contain some of the same words, or they may appear in some of the same documents. The balance parameter allows you to interpolate between the similarities determined by these two methods.

Usage

mallet.topic.hclust(
  doc.topics,
  topic.words,
  balance = 0.3,
  method = "euclidean",
  ...
)

Value

An object of class hclust which describes the tree produced by the clustering process.
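
Because the return value is an ordinary hclust object, the usual base R tools for cluster trees should apply to it. The sketch below uses a toy tree built directly with hclust as a stand-in for the value returned by mallet.topic.hclust; the matrix values and the topic names are hypothetical.

```r
# Stand-in for the returned tree: cluster 5 hypothetical topics from toy data.
set.seed(42)
toy <- matrix(runif(5 * 4), nrow = 5,
              dimnames = list(paste0("topic", 1:5), NULL))
tree <- hclust(dist(toy))

# Cut the tree into 2 groups; the result maps each topic to a cluster id.
groups <- cutree(tree, k = 2)

# Convert to a dendrogram object for dendrogram-specific tools.
dend <- as.dendrogram(tree)

# Leaf order used when the tree is plotted (a permutation of 1:5).
leaf.order <- tree$order
```

Here cutree gives a flat grouping of topics, which can be easier to report than the full tree.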

Arguments

doc.topics

A documents by topics matrix of topic probabilities (see mallet.doc.topics).

topic.words

A topics by words matrix of word probabilities (see mallet.topic.words).

balance

A value between 0.0 (use only document-level similarity) and 1.0 (use only word-level similarity).

method

The method passed to dist to compute the distance between topics. Defaults to "euclidean".

...

Further arguments for hclust.
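
To make the role of balance concrete, here is a minimal sketch that assumes the interpolation is a weighted average of two topic-by-topic distance objects, one computed from word probabilities and one from document probabilities. The helper blend_topic_dist and the toy matrices are hypothetical and written for illustration only; the package's own implementation may differ in details such as normalization.

```r
# Toy inputs (hypothetical values): 4 documents, 3 topics, 5 words.
set.seed(1)
doc.topics <- matrix(runif(4 * 3), nrow = 4)
doc.topics <- doc.topics / rowSums(doc.topics)      # each document sums to 1
topic.words <- matrix(runif(3 * 5), nrow = 3)
topic.words <- topic.words / rowSums(topic.words)   # each topic sums to 1

# Hypothetical helper: assumed linear blend of the two distance objects.
blend_topic_dist <- function(doc.topics, topic.words, balance = 0.3,
                             method = "euclidean") {
  word.dist <- dist(topic.words, method = method)    # topics compared by words
  doc.dist  <- dist(t(doc.topics), method = method)  # topics compared by documents
  balance * word.dist + (1 - balance) * doc.dist
}

# balance = 0.3 weights document-level distance more heavily than word-level.
blended <- blend_topic_dist(doc.topics, topic.words, balance = 0.3)
tree <- hclust(blended)
```

Under this assumption, balance = 1 reduces to the word-level distances alone and balance = 0 to the document-level distances alone, which matches the description of the balance argument.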

See Also

This function takes the matrices returned by mallet.doc.topics and mallet.topic.words and clusters the topics with the hclust function.

Examples

if (FALSE) {
# Read in sotu example data
data(sotu)
sotu.instances <-
   mallet.import(id.array = row.names(sotu),
                 text.array = sotu[["text"]],
                 stoplist = mallet_stoplist_file_path("en"),
                 token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")

# Create topic model
topic.model <- MalletLDA(num.topics=10, alpha.sum = 1, beta = 0.1)
topic.model$loadDocuments(sotu.instances)

# Train topic model
topic.model$train(200)

# Create hierarchical clusters of topics
doc_topics <- mallet.doc.topics(topic.model, smoothed=TRUE, normalized=TRUE)
topic_words <- mallet.topic.words(topic.model, smoothed=TRUE, normalized=TRUE)
topic_labels <- mallet.topic.labels(topic.model)
plot(mallet.topic.hclust(doc_topics, topic_words, balance = 0.3), labels=topic_labels)
}
