Computes various coherence-based metrics for topic models. It
assesses the quality of estimated topics based on co-occurrences of words.
For best results, consider cleaning the initial tokens object with padding = TRUE.
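The padding advice above can be illustrated with a short sketch. It assumes the tokens object is a quanteda tokens object and that cleaning is done with quanteda's tokens_remove(); the toy sentence and the removed words are made up for illustration. Keeping pads presumably matters because removed words then still occupy a position, so window-based co-occurrence counts reflect the original word distances.

```r
library(quanteda)

# Toy document (illustrative only)
docs <- c(d1 = "The central bank raised the policy rate to curb inflation.")
toks_raw <- tokens(docs, remove_punct = TRUE)

# Remove a few function words but leave an empty pad ("") in their place,
# so the remaining words keep their original positions
toks_clean <- tokens_remove(toks_raw, pattern = c("the", "to"), padding = TRUE)

# Same number of positions before and after cleaning
len_raw <- lengths(as.list(toks_raw))
len_clean <- lengths(as.list(toks_clean))
print(as.list(toks_clean))
```

With padding = FALSE (the quanteda default), the removed words would simply disappear and neighbouring words would move closer together.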
coherence(
x,
nWords = 10,
method = c("C_NPMI", "C_V"),
window = NULL,
NPMIs = NULL
)
A vector or matrix containing the coherence score of each topic.
x: a model created from the LDA(), JST() or rJST() function and estimated with fit().
nWords: the number of words in each topic used for evaluation.
method: the coherence method used.
window: optional. If NULL, use the default window for each coherence metric (10 for C_NPMI and 110 for C_V). It is possible to override these default windows by providing an integer or "boolean" to this argument, determining a new window size for all measures. No effect if the NPMIs argument is also provided.
NPMIs: optional NPMI matrix. If provided, skip the computation of NPMI between words, substantially decreasing computing time.
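A minimal usage sketch for the arguments described above. It assumes the sentopics package is attached and that its bundled ECB_press_conferences_tokens data set is available; the subset size, the number of iterations and the window value of 20 are arbitrary choices made only to keep the example fast.

```r
library(sentopics)

# Estimate a small LDA model on a few documents (assumed bundled data set)
model <- LDA(ECB_press_conferences_tokens[1:10])
model <- fit(model, 10)

# Default settings: top 10 words per topic, default window for the metric
scores_default <- coherence(model)

# Fewer words per topic, C_V only, and a custom window size for the measure
scores_custom <- coherence(model, nWords = 5, method = "C_V", window = 20)

print(scores_default)
print(scores_custom)
```

A model fitted for only 10 iterations is not expected to yield meaningful topics; in practice the scores would be computed on a properly converged model.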
Olivier Delmarcelle
Currently, only C_NPMI and C_V are documented. The implementation follows Röder et al. (2015). The sliding window size is 10 for C_NPMI and 110 for C_V.
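To make the role of the sliding window concrete, here is a base R sketch of normalized pointwise mutual information (NPMI) between two words, estimated from the share of sliding windows in which each word occurs. This is an illustration under simplifying assumptions, not the package's implementation: the helper names sliding_windows() and npmi() are hypothetical, the token sequence is made up, and the small constant eps added to the joint probability is an assumed smoothing choice.

```r
# Toy document as a character vector of tokens (illustrative only)
toy_tokens <- c("bank", "rate", "inflation", "bank", "rate",
                "growth", "bank", "policy")

# Hypothetical helper: all contiguous windows of `size` tokens
sliding_windows <- function(tokens, size) {
  n <- length(tokens)
  if (n <= size) return(list(tokens))
  lapply(seq_len(n - size + 1), function(i) tokens[i:(i + size - 1)])
}

# Hypothetical helper: NPMI of two words from window-level probabilities.
# Both words are assumed to occur in at least one window.
npmi <- function(w1, w2, windows, eps = 1e-12) {
  in1 <- vapply(windows, function(w) w1 %in% w, logical(1))
  in2 <- vapply(windows, function(w) w2 %in% w, logical(1))
  p1 <- mean(in1)          # share of windows containing w1
  p2 <- mean(in2)          # share of windows containing w2
  p12 <- mean(in1 & in2)   # share of windows containing both
  pmi <- log((p12 + eps) / (p1 * p2))
  pmi / -log(p12 + eps)    # normalize to roughly [-1, 1]
}

windows <- sliding_windows(toy_tokens, size = 3)
score <- npmi("rate", "growth", windows)
print(score)
```

A larger window makes it more likely that two words share a window, which is why the window size changes the resulting scores.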
Röder, M., Both, A., & Hinneburg, A. (2015). Exploring the Space of Topic Coherence Measures. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, 399–408.