tosca (version 0.3-2)

clusterTopics: Cluster Analysis

Description

This function performs a hierarchical cluster analysis of LDA topics, using the Hellinger distance between topics as the dissimilarity measure.

Usage

clusterTopics(
  ldaresult,
  file,
  tnames = NULL,
  method = "average",
  width = 30,
  height = 15,
  ...
)

Value

A dendrogram as a PDF file and a list containing

dist

A distance matrix

clust

The result from hclust
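As a hedged illustration of how the two list components might be used, the sketch below builds a stand-in list with base R on made-up data, so it runs without tosca installed. It assumes that dist behaves like a "dist" object and clust like an "hclust" object; the names res, distmat, groups and leaf_order are hypothetical.

```r
# Stand-in for the returned list (assumption: same structure as described above).
# The data are random and purely illustrative.
set.seed(1)
m <- matrix(runif(20), nrow = 5, dimnames = list(paste0("T", 1:5), NULL))
d <- dist(m)
res <- list(dist = d, clust = hclust(d, method = "average"))

# Full symmetric matrix view of the pairwise topic distances.
distmat <- as.matrix(res$dist)

# Cut the tree into two groups of topics.
groups <- cutree(res$clust, k = 2)

# Topic labels in the order they appear along the dendrogram leaves.
leaf_order <- res$clust$labels[res$clust$order]
```

Choosing k in cutree is a judgment call; inspecting the dendrogram first usually helps.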

Arguments

ldaresult

The result of a call to LDAgen, or alternatively the corresponding matrix result$topics.

file

File name for the dendrogram PDF.

tnames

Character vector of labels for the topics.

method

Agglomeration method, as in the method argument of hclust.

width

Graphical parameter for the PDF output. See pdf.

height

Graphical parameter for the PDF output. See pdf.

...

Additional parameters passed to plot.

Details

This function is useful for analyzing similarities between topics and for evaluating the right number of topics for an LDA.
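The idea of clustering topics by Hellinger distance can be sketched in base R as follows. This is a minimal illustration on a made-up topic-word count matrix, not the package's actual implementation; the object names topics, p and hellinger are hypothetical.

```r
# Toy topic-word count matrix: 3 topics (rows) x 4 words (columns).
# The counts are invented for illustration.
topics <- matrix(c(8, 1, 1, 0,
                   7, 2, 1, 0,
                   0, 1, 2, 7),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("T", 1:3),
                                 c("fish", "day", "long", "integral")))

# Normalise each row to a probability distribution over words.
p <- topics / rowSums(topics)

# Hellinger distance: H(p, q) = sqrt(sum((sqrt(p) - sqrt(q))^2)) / sqrt(2),
# i.e. the Euclidean distance between square-rooted rows, scaled by 1/sqrt(2).
hellinger <- dist(sqrt(p)) / sqrt(2)

# Agglomerative clustering with average linkage (the default method above).
clust <- hclust(hellinger, method = "average")

print(round(as.matrix(hellinger), 3))
```

Calling plot(clust) on the result draws the dendrogram. Topics with similar word distributions, such as T1 and T2 here, are merged first.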

Examples

texts <- list(A="Give a Man a Fish, and You Feed Him for a Day.
Teach a Man To Fish, and You Feed Him for a Lifetime",
B="So Long, and Thanks for All the Fish",
C="A very able manipulative mathematician, Fisher enjoys a real mastery
in evaluating complicated multiple integrals.")

# Combine the texts with their metadata in a textmeta object
corpus <- textmeta(
  meta = data.frame(
    id = c("A", "B", "C", "D"),
    title = c("Fishing", "Don't panic!", "Sir Ronald", "Berlin"),
    date = c("1885-01-02", "1979-03-04", "1951-05-06", "1967-06-02"),
    additionalVariable = 1:4, stringsAsFactors = FALSE),
  text = texts)

# Clean the texts, build the vocabulary and prepare the LDA input
corpus <- cleanTexts(corpus)
wordlist <- makeWordlist(corpus$text)
ldaPrep <- LDAprep(text = corpus$text, vocab = wordlist$words)

# Fit an LDA with K = 3 topics, then cluster the topics (no file given here)
LDA <- LDAgen(documents = ldaPrep, K = 3L, vocab = wordlist$words, num.words = 3)
clusterTopics(ldaresult = LDA)