cluster.evaluation: Clustering Evaluation Index Based on Known Ground Truth

Description

Computes the similarity between the true cluster solution and the one obtained with a method under evaluation.

Usage

cluster.evaluation(G, S)

Arguments

Integer vector with the labels of the true cluster solution. Each element of the vector specifies the cluster 'id' that the element belongs to.

Integer vector with the labels of the cluster solution to be evaluated. Each element of the vector specifies the cluster 'id' that the element belongs to.

Value

The computed index.

Details

The measure of clustering evaluation is defined as $$ Sim(G,C) = 1/k \sum_{i=1}^k \max_{1\leq j\leq k} Sim(G_i,C_j), $$ where $$Sim(G_i, C_j) = \frac{ 2 | G_i \cap C_j|}{ |G_i| + |C_j|}$$

with |.| denoting the cardinality of the elements in the set. This measure has been used for comparing different clusterings, e.g. in Kalpakis et al. (2001) and P<U+00E9>rtega and Vilar (2010).

References

Larsen, B. and Aone, C. (1999) Fast and effective text mining using linear-time document clustering. Proc. KDD' 99.16--22.

Kalpakis, K., Gada D. and Puttagunta, V. (2001) Distance measures for effective clustering of arima time-series. Proceedings 2001 IEEE International Conference on Data Mining, 273--280.

P<U+00E9>rtega S. and Vilar, J.A (2010) Comparing several parametric and nonparametric approaches to time series clustering: A simulation study. J. Classification, 27(3), 333-362.

Montero, P and Vilar, J.A. (2014) TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. http://www.jstatsoft.org/v62/i01/.

Examples

Run this code

# NOT RUN {
 #create a true cluster 
 #(first 4 elements belong to cluster '1', next 4 to cluster '2' and the last 4 to cluster '3'.
 true_cluster <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
 #the cluster to be tested
 new_cluster <- c( 2, 1, 2, 3, 3, 2, 2, 1, 3, 3, 3, 3)
 
 #get the index
 cluster.evaluation(true_cluster, new_cluster)
 
 #it can be seen that the index is not simmetric
 cluster.evaluation(new_cluster, true_cluster)
# }

Run the code above in your browser using DataLab