variation_info: Variation of Information Between Clusterings
Description
Computes the variation of information between two
clusterings, such as a predicted clustering and a ground-truth clustering.
Usage
variation_info(true, pred, base = exp(1))
Arguments
true
ground truth clustering represented as a membership
vector. Each entry corresponds to an element and the value identifies
the assigned cluster. The specific values of the cluster identifiers
are arbitrary.
pred
predicted clustering represented as a membership
vector.
base
base of the logarithm. Defaults to exp(1), i.e. the natural logarithm.
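As a small illustration of the membership-vector representation, the base R sketch below builds two vectors that use different cluster identifiers but describe the same partition. The helper name canonical is hypothetical and is not part of this function's interface; it only relabels clusters by order of first appearance so the two vectors can be compared.

```r
# Two membership vectors over the same five elements.
# The identifiers differ (numbers vs. letters), but the grouping is the same.
true <- c(1, 1, 2, 2, 3)
relabelled <- c("b", "b", "a", "a", "c")

# Hypothetical helper: relabel clusters by order of first appearance.
canonical <- function(x) match(x, unique(x))

# TRUE: both vectors encode the partition {1, 2}, {3, 4}, {5}.
same_partition <- identical(canonical(true), canonical(relabelled))
```

Because only the grouping matters, either vector could be passed as true or pred and would describe the same clustering.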
Details
Variation of information is an entropy-based distance metric
on the space of clusterings. It is unnormalized and ranges from
\(0\), attained when the two clusterings are identical up to relabeling,
to \(\log(N)\), where \(N\) is the number of clustered elements and the
logarithm is taken in the given base. Larger values correspond
to greater dissimilarity between the clusterings.
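For intuition, the quantity can be written in terms of entropies as \(VI(C, C') = 2 H(C, C') - H(C) - H(C')\), where \(H(C)\) and \(H(C')\) are the entropies of the cluster-size distributions and \(H(C, C')\) is the joint entropy of the contingency table (Meilă, 2003). The following is a minimal base R sketch of that formula, not the implementation used by variation_info; the name vi_sketch is hypothetical.

```r
# Hypothetical illustration of the entropy formula; not the package's code.
vi_sketch <- function(true, pred, base = exp(1)) {
  stopifnot(length(true) == length(pred))
  n <- length(true)

  # Joint distribution over (true cluster, predicted cluster) pairs.
  joint <- table(true, pred) / n
  p_true <- rowSums(joint)   # marginal distribution of the true clustering
  p_pred <- colSums(joint)   # marginal distribution of the predicted clustering

  # Shannon entropy, skipping zero-probability cells.
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log(p, base = base))
  }

  # VI = 2 H(true, pred) - H(true) - H(pred)
  2 * entropy(joint) - entropy(p_true) - entropy(p_pred)
}

# Identical partitions with different labels: distance 0.
vi_same <- vi_sketch(c(1, 1, 2, 2), c("a", "a", "b", "b"))

# One cluster versus all singletons on N = 4 elements: the upper bound log(N).
vi_max <- vi_sketch(c(1, 1, 1, 1), 1:4)
is_max <- isTRUE(all.equal(vi_max, log(4)))   # TRUE
```

The second call shows the upper bound being reached: with base = 2 the same comparison would give \(\log_2(4) = 2\).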
References
Arabie, P. and Boorman, S. A. "Multidimensional scaling of measures of
distance between partitions." Journal of Mathematical Psychology 10:2,
148-203, (1973). DOI: 10.1016/0022-2496(73)90012-6.
Meilă, M. "Comparing Clusterings by the Variation of Information." In:
Learning Theory and Kernel Machines, Lecture Notes in Computer Science
2777, Springer, Berlin, Heidelberg, (2003). DOI:
10.1007/978-3-540-45167-9_14.