fpc (version 2.2-9)

valstat.object: Cluster validation statistics - object


Objects of class "valstat" store cluster validation statistics from various clustering methods run with various numbers of clusters.



A legitimate valstat object is a list. The format of the list depends on the number of clustering methods involved, nmethods say, i.e., the length of the method component explained below. The first nmethods elements of the valstat list are simply numbered. Each of these is itself a list with elements numbered from 1 to the maxG component defined below. Element [[i]][[j]] refers to the clustering from clustering method number i with j clusters. Every such element is a list with components avewithin, mnnd, cvnnd, maxdiameter, widestgap, sindex, minsep, asw, dindex, denscut, highdgap, pearsongamma, withinss, entropy. Further optional components are pamc, kdnorm, kdunif, dmode, aggregated. All these are cluster validation indexes, as follows.
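The nested indexing described above can be sketched in base R. This is a minimal illustration on a hand-built stand-in, not output of the package: the object `mock`, its size, and all numeric values are made up, and only two of the index components are filled in.

```r
# Hypothetical stand-in for a valstat object; all values are made up.
nmethods <- 2   # number of clustering methods
maxG <- 3       # maximum number of clusters

mock <- vector("list", nmethods)
for (i in seq_len(nmethods)) {
  mock[[i]] <- vector("list", maxG)
  for (j in seq_len(maxG)) {
    # Element [[i]][[j]]: method number i, j clusters.
    mock[[i]][[j]] <- list(avewithin = 1 / j, asw = 0.1 * i + 0.2 * j)
  }
}

# Named components are appended after the numbered elements.
mock$maxG <- maxG
mock$method <- c("kmeansCBI", "hclustCBI")

# Average silhouette width for method 2 with 3 clusters.
asw_23 <- mock[[2]][[3]]$asw
```

The numbered elements stay reachable by position even though the list also carries named components, which is why `mock[[2]]` and `mock$maxG` can coexist in the same object.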


avewithin: average distance within clusters (reweighted so that every observation, rather than every distance, has the same weight).

mnnd: average distance to the nnk-th nearest neighbour within cluster. (nnk is a parameter of cqcluster.stats, default 2.)

cvnnd: coefficient of variation of dissimilarities to the nnk-th nearest within-cluster neighbour, measuring uniformity of within-cluster densities, weighted over all clusters, see Sec. 3.7 of Hennig (2019). (nnk is a parameter of cqcluster.stats, default 2.)

maxdiameter: maximum cluster diameter.

widestgap: widest within-cluster gap or average of cluster-wise widest within-cluster gap, depending on parameter averagegap of cqcluster.stats, default FALSE.

sindex: separation index. Defined based on the distances for every point to the closest point not in the same cluster. The separation index is then the mean of the smallest proportion sepprob (parameter of cqcluster.stats, default 0.1) of these. See Hennig (2019).

minsep: minimum cluster separation.

asw: average silhouette width. See silhouette.

dindex: this index measures to what extent the density decreases from the cluster mode to the outskirts; I-densdec in Sec. 3.6 of Hennig (2019); low values are good.

denscut: this index measures whether cluster boundaries run through density valleys; I-densbound in Sec. 3.6 of Hennig (2019); low values are good.

highdgap: this measures whether there is a large within-cluster gap with high density on both sides; I-highdgap in Sec. 3.6 of Hennig (2019); low values are good.

pearsongamma: correlation between distances and a 0-1-vector where 0 means same cluster, 1 means different clusters. "Normalized gamma" in Halkidi et al. (2001).

withinss: a generalisation of the within-cluster sum of squares (k-means objective function), which is obtained if d is a Euclidean distance matrix. For general distance measures, this is half the sum of the within-cluster squared dissimilarities divided by the cluster size.

entropy: entropy of the distribution of cluster memberships, see Meila (2007).

pamc: average distance to cluster centroid, which is the observation that minimises this average distance.

kdnorm: Kolmogorov distance between distribution of within-cluster Mahalanobis distances and appropriate chi-squared distribution, aggregated over clusters (I am grateful to Agustin Mayo-Iscar for the idea).

kdunif: Kolmogorov distance between distribution of distances to the dnnk-th nearest within-cluster neighbour and appropriate Gamma-distribution, see Byers and Raftery (1998), aggregated over clusters. dnnk is parameter nnk of distrsimilarity, corresponding to dnnk of clusterbenchstats.

dmode: aggregated density mode index equal to 0.75*dindex+0.25*highdgap before standardisation.
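Three of the indexes above have definitions simple enough to check by hand. The following base R sketch computes them on a made-up toy data set and does not call the package. The data, the helper names `withinss_from_dist` and `withinss_from_means`, and the use of the natural logarithm for the entropy are assumptions made for illustration; the package's own implementation may differ in detail.

```r
# Hypothetical toy data: 6 points in 2 dimensions, two clusters of 3.
x <- rbind(c(0, 0), c(1, 0), c(0, 1),
           c(5, 5), c(6, 5), c(5, 6))
clustering <- c(1, 1, 1, 2, 2, 2)
d <- as.matrix(dist(x))  # Euclidean distance matrix

# withinss: per cluster, half the sum of squared within-cluster
# dissimilarities (over ordered pairs) divided by the cluster size.
withinss_from_dist <- function(d, clustering) {
  total <- 0
  for (k in unique(clustering)) {
    idx <- which(clustering == k)
    total <- total + sum(d[idx, idx]^2) / (2 * length(idx))
  }
  total
}

# Classical k-means objective: squared distances to the cluster means.
withinss_from_means <- function(x, clustering) {
  total <- 0
  for (k in unique(clustering)) {
    xk <- x[clustering == k, , drop = FALSE]
    centred <- sweep(xk, 2, colMeans(xk))
    total <- total + sum(centred^2)
  }
  total
}

w_dist <- withinss_from_dist(d, clustering)
w_means <- withinss_from_means(x, clustering)

# pearsongamma: correlation between the pairwise distances and a
# 0-1 vector (0 = same cluster, 1 = different clusters).
pairs <- upper.tri(d)
different <- outer(clustering, clustering, "!=")
pearsongamma <- cor(d[pairs], as.numeric(different[pairs]))

# entropy of the cluster membership distribution
# (natural logarithm assumed here).
p <- as.numeric(table(clustering)) / length(clustering)
entropy <- -sum(p * log(p))
```

On Euclidean distances the two withinss computations agree, which is the sense in which the dissimilarity-based version generalises the k-means objective.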

Furthermore, a valstat object has the following list components:


maxG: maximum number of clusters.


minG: minimum number of clusters (list entries below that number are empty lists).


method: vector of names (character strings) of clustering CBI-functions, see kmeansCBI.


name: vector of names (character strings) of clustering methods. These can be user-chosen names (see argument methodsnames in clusterbenchstats) and may distinguish different methods run by the same CBI-function but with different parameter values such as complete and average linkage for hclustCBI.


statistics: vector of names (character strings) of cluster validation indexes.
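The list components above are enough to tabulate one index across methods and numbers of clusters. The sketch below does this in base R on a hand-built stand-in. The object `mock`, its values, and the helper `index_table` are hypothetical, and the component names minG, name and statistics are assumed here to match the descriptions above (maxG and method are named in the text).

```r
# Hypothetical stand-in; real objects come from clusterbenchstats.
mock <- list(
  list(list(), list(asw = 0.41), list(asw = 0.55)),  # method 1, G = 1..3
  list(list(), list(asw = 0.38), list(asw = 0.47))   # method 2, G = 1..3
)
mock$maxG <- 3
mock$minG <- 2            # entries below minG are empty lists
mock$method <- c("kmeansCBI", "hclustCBI")
mock$name <- c("kmeans", "average")
mock$statistics <- c("asw")

# Collect one validation index into a methods x G matrix,
# skipping the empty entries below minG.
index_table <- function(v, index) {
  G <- v$minG:v$maxG
  out <- matrix(NA_real_, nrow = length(v$name), ncol = length(G),
                dimnames = list(v$name, paste0("G=", G)))
  for (i in seq_along(v$name)) {
    for (g in seq_along(G)) {
      out[i, g] <- v[[i]][[G[g]]][[index]]
    }
  }
  out
}

asw_table <- index_table(mock, "asw")

# Row and column position of the largest average silhouette width.
best <- which(asw_table == max(asw_table), arr.ind = TRUE)
```

Looping from minG rather than from 1 avoids indexing into the empty placeholder lists.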


These objects are generated as part of the clusterbenchstats-output.


The valstat class has methods for the following generic functions: print and plot; see plot.valstat.


References

Hennig, C. (2019) Cluster validation by measurement of clustering characteristics relevant to the user. In C. H. Skiadas (ed.) Data Analysis and Applications 1: Clustering and Regression, Modeling-estimating, Forecasting and Data Mining, Volume 2, Wiley, New York, 1-24, https://arxiv.org/abs/1703.09282

Akhanli, S. and Hennig, C. (2020) Calibrating and aggregating cluster validity indexes for context-adapted comparison of clusterings. Statistics and Computing, 30, 1523-1544, https://link.springer.com/article/10.1007/s11222-020-09958-2, https://arxiv.org/abs/2002.01822

See Also

clusterbenchstats, plot.valstat.