Calculates the similarity of all pairwise topic combinations using the rank-biased overlap (RBO) Similarity.
rboTopics(topics, k, p, progress = TRUE, pm.backend, ncpus)
[named list] with entries
sims [lower triangular named matrix] with all pairwise
similarities of the given topics.
wordslimit [integer] = vocabulary size. See
jaccardTopics for original purpose.
wordsconsidered [integer] = vocabulary size. See
jaccardTopics for original purpose.
param [named list] with parameters
type [character(1)] = "RBO Similarity",
k [integer(1)] and p [0,1]. See the argument descriptions for explanation.
topics [named matrix]
The counts of vocabularies/words (row wise) in topics (column wise).
k [integer(1)]
Maximum depth for evaluation. Words down to this rank are considered for the calculation of similarities.
p [0,1]
Weighting parameter. Lower values emphasize top-ranked words, while values
approaching 1 correspond to equal weights for each evaluation depth.
progress [logical(1)]
Should a progress bar be shown? Turning it off could lead to significantly
faster calculation. Default is TRUE.
If pm.backend is set, parallelization is done and no progress bar will be shown.
pm.backend [character(1)]
One of "multicore", "socket" or "mpi".
If pm.backend is set, parallelStart is
called before computation is started and parallelStop
is called after.
ncpus [integer(1)]
Number of (physical) CPUs to use. If pm.backend is passed,
default is determined by availableCores.
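To get a feel for the weighting parameter p described above, the following minimal sketch compares the relative depth weights \(p^d\) for a small and a large value of p. The helper `depth_weights` is hypothetical and not part of the package; normalising the weights to sum to 1 is an assumption made only to ease comparison.

```r
# Hypothetical helper (not part of the package): relative weights p^d
# for the evaluation depths d = 1, ..., k, normalised to sum to 1.
depth_weights <- function(k, p) {
  w <- p^seq_len(k)
  w / sum(w)
}

k <- 5
# Small p: most of the weight sits on the top ranks
round(depth_weights(k, p = 0.5), 3)
# p close to 1: weights are nearly equal across depths
round(depth_weights(k, p = 0.99), 3)
```

With p = 0.5 the first depth carries roughly half of the total weight, whereas with p = 0.99 all five depths contribute almost equally.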
The RBO Similarity for two topics \(\bm z_{i}\) and \(\bm z_{j}\) is calculated by
$$RBO(\bm z_{i}, \bm z_{j} \mid k, p) = 2p^k\frac{\left|Z_{i}^{(k)} \cap Z_{j}^{(k)}\right|}{\left|Z_{i}^{(k)}\right| + \left|Z_{j}^{(k)}\right|} + \frac{1-p}{p} \sum_{d=1}^k 2 p^d\frac{\left|Z_{i}^{(d)} \cap Z_{j}^{(d)}\right|}{\left|Z_{i}^{(d)}\right| + \left|Z_{j}^{(d)}\right|}$$
where \(Z_{i}^{(d)}\) is the vocabulary set of topic \(\bm z_{i}\) down to rank \(d\). Ties in ranks are resolved by taking the minimum.
The value wordsconsidered describes the number of words per topic
ranked at rank \(k\) or above.
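The formula can be traced by hand with a small base R sketch. This is an illustration under stated assumptions, not the package's implementation: `rbo_pair` is a hypothetical helper, both topics are assumed to be named count vectors over the same vocabulary, and ties are resolved with `rank(..., ties.method = "min")`.

```r
# Hypothetical helper (not the package's code): RBO similarity of two topics
# given as named count vectors x and y over the same vocabulary.
rbo_pair <- function(x, y, k, p) {
  # Rank words by descending count; tied words share the minimum rank
  rx <- rank(-x, ties.method = "min")
  ry <- rank(-y, ties.method = "min")
  # Agreement at depth d: 2 * |intersection| / (|Z_i| + |Z_j|)
  agreement <- vapply(seq_len(k), function(d) {
    zi <- names(x)[rx <= d]
    zj <- names(y)[ry <= d]
    2 * length(intersect(zi, zj)) / (length(zi) + length(zj))
  }, numeric(1))
  # First term: agreement at depth k; second term: weighted sum over depths
  p^k * agreement[k] + (1 - p) / p * sum(p^seq_len(k) * agreement)
}

# Toy vocabulary and counts, made up for illustration
vocab <- c("oil", "price", "bank", "rate", "trade", "tax")
z1 <- setNames(c(9, 7, 5, 3, 2, 1), vocab)
z2 <- setNames(c(1, 2, 3, 5, 7, 9), vocab)
z3 <- setNames(c(5, 5, 5, 5, 1, 0), vocab)

rbo_pair(z1, z1, k = 3, p = 0.9)  # identical topics
rbo_pair(z1, z2, k = 3, p = 0.9)  # top-3 word sets do not overlap

# Number of words ranked at rank 3 or above; ties can push this beyond k
sum(rank(-z3, ties.method = "min") <= 3)
```

Identical topics give a similarity of 1 (up to floating point), because the two terms of the formula then sum to \(p^k + 1 - p^k\). Topics whose top-k word sets are disjoint give 0. In `z3`, four words share rank 1, so four words count as ranked at rank 3 or above even though k = 3.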
Webber, William, Alistair Moffat and Justin Zobel (2010). "A similarity measure for indefinite rankings". In: ACM Transactions on Information Systems 28(4), pp. 20:1–20:38, doi:10.1145/1852102.1852106.
Other TopicSimilarity functions:
cosineTopics(),
dendTopics(),
getSimilarity(),
jaccardTopics(),
jsTopics()
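library(ldaPRototype)
# Fit 4 LDA replications with 10 topics each on the Reuters example data
res = LDARep(docs = reuters_docs, vocab = reuters_vocab, n = 4, K = 10, num.iterations = 30)
# Merge the topics of all replications into one word-count matrix
topics = mergeTopics(res, vocab = reuters_vocab)
# Pairwise RBO similarities, evaluated down to rank 12 with weighting p = 0.9
rbo = rboTopics(topics, k = 12, p = 0.9)
rbo
# Extract the lower triangular matrix of pairwise similarities
sim = getSimilarity(rbo)
dim(sim)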