Technically, Ward's minimum variance method is used with a Chi-squared distance: see hclust for details about the clustering process.
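As a rough illustration of what combining a Chi-squared distance with Ward's method can look like, here is a minimal base R sketch. The toy matrix and the choice of the "ward.D2" variant are assumptions made for this example, and the dialog may well compute the distance differently (for instance from correspondence analysis axes).

```r
# Toy document-term counts (hypothetical data): 4 documents, 3 terms
dtm <- matrix(c(4, 0, 1,
                3, 1, 0,
                0, 5, 2,
                1, 4, 3),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("doc", 1:4),
                              c("apple", "bank", "cloud")))

# Row profiles: each document's term counts divided by its total
profiles <- dtm / rowSums(dtm)

# Column masses: share of each term in the whole table
col_mass <- colSums(dtm) / sum(dtm)

# Chi-squared distance between documents: Euclidean distance between
# row profiles after dividing each column by the square root of its mass
chisq_dist <- dist(sweep(profiles, 2, sqrt(col_mass), "/"))

# Ward's minimum variance criterion ("ward.D2" is an assumption here)
tree <- hclust(chisq_dist, method = "ward.D2")
```

Calling plot(tree) on the result draws the dendrogram for the four toy documents.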
The first slider allows skipping less significant terms in order to use less memory with large corpora. The second allows choosing which dimensions of the correspondence analysis should be used, which helps remove noise and concentrate on identified characteristics of the corpus.
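The idea behind skipping sparse terms can be sketched in base R as follows. This only mimics the general principle of removeSparseTerms on a plain matrix; the toy data, the max_sparsity threshold and the exact comparison rule are assumptions made for this example, not the package's implementation.

```r
# Toy document-term counts (hypothetical data): 4 documents, 4 terms
dtm <- matrix(c(2, 0, 1, 0,
                1, 0, 0, 3,
                3, 1, 0, 0,
                1, 0, 0, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("doc", 1:4),
                              c("the", "rare", "odd", "few")))

# Share of documents in which each term is absent
sparsity <- colMeans(dtm == 0)

# Hypothetical threshold: keep terms absent from at most half the documents
max_sparsity <- 0.5
reduced <- dtm[, sparsity <= max_sparsity, drop = FALSE]
```

With real corpora, dropping such terms shrinks the matrix considerably, which is where the memory saving comes from.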
Since the clustering itself only returns a tree, the tree must be cut at a given size to create classes of documents: this step is offered automatically once the dendrogram has been computed, and can be repeated as many times as needed using the Text Mining->Hierarchical clustering->Create clusters... dialog.
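Cutting a tree into classes can be illustrated with the base R function cutree. The sketch below uses hypothetical two-dimensional coordinates rather than a real corpus, and it is not meant to reproduce what the Create clusters... dialog does internally.

```r
# Hypothetical coordinates for six documents forming two obvious groups
coords <- rbind(c(0, 0), c(0.1, 0), c(0, 0.1),
                c(5, 5), c(5.1, 5), c(5, 5.1))
rownames(coords) <- paste0("doc", 1:6)

# Build the tree once
tree <- hclust(dist(coords), method = "ward.D2")

# Cut it into two classes; the same tree can be cut again with another k
clusters <- cutree(tree, k = 2)
```

The result is a named integer vector giving the class of each document, so table(clusters) shows the class sizes.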
hclust, dist, corpusCaDlg, removeSparseTerms, DocumentTermMatrix, createClustersDlg