Computes the V-measure between two clusterings, such as a
predicted clustering and a ground-truth clustering.
Usage
v_measure(true, pred, beta = 1)
Arguments
true
ground truth clustering represented as a membership
vector. Each entry corresponds to an element and the value identifies
the assigned cluster. The specific values of the cluster identifiers
are arbitrary.
pred
predicted clustering represented as a membership
vector.
beta
non-negative weight. A value of 0 assigns no weight to
completeness (i.e. the measure reduces to homogeneity), while larger
values assign increasing weight to completeness. A value of 1 weights
completeness and homogeneity equally.
Details
V-measure is defined as the \(\beta\)-weighted harmonic
mean of homogeneity \(h\) and completeness \(c\):
$$(1 + \beta)\frac{h \cdot c}{\beta \cdot h + c}.$$
V-measure ranges from 0 to 1, where 1 corresponds to a perfect match
between the clusterings. When \(\beta = 1\), it is equivalent to the
normalised mutual information with the arithmetic mean as the
aggregation function.
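The definition above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper name v_measure_sketch is hypothetical, and the sketch assumes the entropy-based definitions of homogeneity and completeness from Rosenberg and Hirschberg (2007), with each component taken to be 1 when the corresponding marginal entropy is zero.

```r
# Hypothetical helper for illustration only; not the package's v_measure().
v_measure_sketch <- function(true, pred, beta = 1) {
  stopifnot(length(true) == length(pred), beta >= 0)

  # Joint distribution over (true cluster, predicted cluster) pairs
  joint <- table(true, pred) / length(true)
  p_true <- rowSums(joint)
  p_pred <- colSums(joint)

  # Shannon entropy, treating 0 * log(0) as 0
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log(p))
  }

  h_true <- entropy(p_true)
  h_pred <- entropy(p_pred)
  h_joint <- entropy(as.vector(joint))

  # Conditional entropies H(true | pred) and H(pred | true)
  h_true_given_pred <- h_joint - h_pred
  h_pred_given_true <- h_joint - h_true

  # Assumed convention: a component is 1 when its marginal entropy is 0
  homogeneity <- if (h_true == 0) 1 else 1 - h_true_given_pred / h_true
  completeness <- if (h_pred == 0) 1 else 1 - h_pred_given_true / h_pred

  # Beta-weighted harmonic mean, returning 0 when both components vanish
  denom <- beta * homogeneity + completeness
  if (denom == 0) return(0)
  (1 + beta) * homogeneity * completeness / denom
}

# Identical partitions under different labels: the sketch returns 1
v_measure_sketch(c(1, 1, 2, 2), c("a", "a", "b", "b"))

# Every element in its own predicted cluster:
# homogeneity is 1 and completeness is 0.5
v_measure_sketch(c(1, 1, 2, 2), 1:4)            # about 2/3
v_measure_sketch(c(1, 1, 2, 2), 1:4, beta = 0)  # 1, i.e. the homogeneity
```

The last two calls show the effect of beta: with the default weight the low completeness pulls the score down to about 2/3, while beta = 0 ignores completeness and returns the homogeneity.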
References
Rosenberg, A. and Hirschberg, J. "V-measure: A conditional entropy-based external cluster evaluation measure." Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) (2007).
Becker, H. "Identification and characterization of events in social media." PhD dissertation, Columbia University (2011).
See Also
homogeneity and completeness evaluate the component
measures upon which this measure is based.