coherence: Coherence of estimated topics

Description

Computes various coherence-based metrics for topic models. It assesses the quality of estimated topics based on co-occurrences of words. For best results, consider cleaning the initial tokens object with padding = TRUE.

Usage

coherence(
  x,
  nWords = 10,
  method = c("C_NPMI", "C_V"),
  window = NULL,
  NPMIs = NULL
)

Value

A vector or matrix containing the coherence score of each topic.

Arguments

x: a model created with the LDA(), JST() or rJST() function and estimated with fit().
nWords: the number of words in each topic used for evaluation.
method: the coherence method used.
window: optional. If NULL, use the default window for each coherence metric (10 for C_NPMI and 110 for C_V). These defaults can be overridden by providing an integer or "boolean" to this argument, which sets a new window size for all measures. Has no effect if the NPMIs argument is also provided.
NPMIs: optional NPMI matrix. If provided, skip the computation of NPMI between words, substantially decreasing computing time.
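
Examples

The arguments above can be illustrated with a short sketch. It is not taken from the package documentation: the dataset name ECB_press_conferences_tokens, the subset size, the default number of topics in LDA() and the meaning of the second argument of fit() (number of iterations) are all assumptions. The code is guarded so that it only runs when the sentopics package is installed.

```r
# Minimal sketch, assuming the sentopics package and a bundled tokens
# dataset named ECB_press_conferences_tokens (assumption).
if (requireNamespace("sentopics", quietly = TRUE)) {
  library(sentopics)

  # Small subset to keep estimation fast (arbitrary choice).
  toks <- ECB_press_conferences_tokens[1:200]

  # Create the model, then estimate it with fit(); the second argument
  # is assumed to be the number of iterations.
  model <- LDA(toks)
  model <- fit(model, 10)

  # C_NPMI coherence of each topic, using the 10 top words per topic
  # and the default window for that metric.
  scores <- coherence(model, nWords = 10, method = "C_NPMI")
  print(scores)
}
```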

Author

Olivier Delmarcelle

Details

Currently, only C_NPMI and C_V are documented. The implementation follows Röder et al. (2015). For C_NPMI, the sliding window is 10 whereas it is 110 for C_V.
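
To show what a sliding-window NPMI computation looks like, here is a base R sketch on a toy corpus. It is not the package's implementation: the helper names sliding_windows() and npmi_matrix(), the toy documents, the window size of 3 and the smoothing constant eps are all hypothetical. It assumes every word of interest occurs in at least one window, but not in all of them. Each window is treated as a boolean "virtual document", and NPMI(w_i, w_j) = log((P(w_i, w_j) + eps) / (P(w_i) P(w_j))) / -log(P(w_i, w_j) + eps).

```r
# Toy corpus: each element is one tokenized document (hypothetical data).
docs <- list(
  c("rate", "inflation", "bank", "rate", "policy"),
  c("bank", "policy", "inflation", "growth"),
  c("growth", "market", "rate", "bank")
)

# Cut a document into overlapping windows of `size` tokens; a document
# shorter than `size` yields a single window. Each window keeps only
# the set of distinct words it contains.
sliding_windows <- function(tokens, size) {
  n <- length(tokens)
  if (n <= size) return(list(unique(tokens)))
  lapply(seq_len(n - size + 1), function(i) unique(tokens[i:(i + size - 1)]))
}

# NPMI between every pair of `words`, estimated from window counts.
npmi_matrix <- function(docs, words, size = 10, eps = 1e-12) {
  windows <- unlist(lapply(docs, sliding_windows, size = size),
                    recursive = FALSE)
  # Window-by-word incidence matrix (1 if the word is in the window).
  inc <- vapply(words, function(w) {
    vapply(windows, function(win) w %in% win, logical(1))
  }, logical(length(windows)))
  inc <- matrix(as.numeric(inc), nrow = length(windows),
                dimnames = list(NULL, words))
  p_joint <- crossprod(inc) / length(windows)  # P(w_i, w_j)
  p_single <- diag(p_joint)                    # P(w_i)
  log((p_joint + eps) / outer(p_single, p_single)) / -log(p_joint + eps)
}

top_words <- c("rate", "bank", "inflation", "policy")
npmi <- npmi_matrix(docs, top_words, size = 3)

# A simple topic-level score: the mean NPMI over all distinct word pairs.
topic_score <- mean(npmi[upper.tri(npmi)])
print(round(npmi, 3))
print(topic_score)
```

Because the pairwise matrix depends only on the corpus and the window, not on the topic assignments, it can be computed once and reused, which is the motivation for the NPMIs argument described above.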

References

Röder, M., Both, A., & Hinneburg, A. (2015). Exploring the Space of Topic Coherence Measures. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, 399–408.