Objects of class `"valstat"` store cluster validation statistics from various clustering methods run with various numbers of clusters.

A legitimate `valstat` object is a list. The format of the list depends on the number of clustering methods involved, `nmethods`, say, i.e., the length of the `method` component explained below. The first `nmethods` elements of the `valstat` list are simply numbered. Each of these is itself a list with elements numbered from 1 to the `maxG` component defined below. Element `[[i]][[j]]` refers to the clustering from clustering method number i with number of clusters j. Every such element is a list with components

```
avewithin, mnnd, cvnnd, maxdiameter, widestgap, sindex, minsep,
asw, dindex, denscut, highdgap, pearsongamma, withinss, entropy
```

Further optional components are `pamc`, `kdnorm`, `kdunif`, `dmode` and `aggregated`. All of these are cluster validation indexes, as follows.

`avewithin`: average distance within clusters (reweighted so that every observation, rather than every distance, has the same weight).

`mnnd`: average distance to the `nnk`th nearest neighbour within cluster. (`nnk` is a parameter of `cqcluster.stats`, default 2.)

`cvnnd`: coefficient of variation of dissimilarities to the `nnk`th nearest within-cluster neighbour, measuring uniformity of within-cluster densities, weighted over all clusters, see Sec. 3.7 of Hennig (2019). (`nnk` is a parameter of `cqcluster.stats`, default 2.)

`maxdiameter`: maximum cluster diameter.

`widestgap`: widest within-cluster gap or average of cluster-wise widest within-cluster gap, depending on parameter `averagegap` of `cqcluster.stats`, default `FALSE`.

`sindex`: separation index, based on the distances from every point to the closest point not in the same cluster. The separation index is the mean of the smallest proportion `sepprob` (parameter of `cqcluster.stats`, default 0.1) of these distances. See Hennig (2019).

`minsep`: minimum cluster separation.

`asw`: average silhouette width. See `silhouette`.

`dindex`: this index measures to what extent the density decreases from the cluster mode to the outskirts; I-densdec in Sec. 3.6 of Hennig (2019); low values are good.

`denscut`: this index measures whether cluster boundaries run through density valleys; I-densbound in Sec. 3.6 of Hennig (2019); low values are good.

`highdgap`: this index measures whether there is a large within-cluster gap with high density on both sides; I-highdgap in Sec. 3.6 of Hennig (2019); low values are good.

`pearsongamma`: correlation between distances and a 0-1-vector where 0 means same cluster, 1 means different clusters. "Normalized gamma" in Halkidi et al. (2001).

`withinss`: a generalisation of the within-clusters sum of squares (k-means objective function), which is obtained if `d` is a Euclidean distance matrix. For general distance measures, this is half the sum of the within-cluster squared dissimilarities divided by the cluster size.

`entropy`: entropy of the distribution of cluster memberships, see Meila (2007).

`pamc`: average distance to cluster centroid, which is the observation that minimises this average distance.

`kdnorm`: Kolmogorov distance between the distribution of within-cluster Mahalanobis distances and the appropriate chi-squared distribution, aggregated over clusters (I am grateful to Agustin Mayo-Iscar for the idea).

`kdunif`: Kolmogorov distance between the distribution of distances to the `dnnk`th nearest within-cluster neighbour and the appropriate Gamma distribution, see Byers and Raftery (1998), aggregated over clusters. `dnnk` is parameter `nnk` of `distrsimilarity`, corresponding to `dnnk` of `clusterbenchstats`.

`dmode`: aggregated density mode index equal to `0.75*dindex+0.25*highdgap` before standardisation.
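Some of the simpler indexes can be computed directly from their verbal definitions. The following is a minimal base R sketch on toy data, not the implementation used by `cqcluster.stats`. The toy data, all variable names, the use of the natural logarithm for the entropy, and the rounding rule for the number of points entering the separation index are assumptions made for illustration.

```r
# Toy data (made up): six points on a line, two clusters of three points.
x <- c(0, 1, 2, 10, 11, 12)
clustering <- c(1, 1, 1, 2, 2, 2)
d <- as.matrix(dist(x))                     # Euclidean distance matrix

# TRUE where two observations belong to different clusters.
different <- outer(clustering, clustering, "!=")

# withinss: per cluster, half the sum of squared within-cluster
# dissimilarities (over all ordered pairs) divided by the cluster size.
withinss <- sum(sapply(unique(clustering), function(k) {
  idx <- clustering == k
  sum(d[idx, idx]^2) / (2 * sum(idx))
}))

# For Euclidean distances this agrees with the k-means objective,
# the sum of squared deviations from the cluster means.
kmeans_objective <- sum((x - ave(x, clustering))^2)

# entropy of the cluster membership distribution
# (natural logarithm assumed here).
p <- as.vector(table(clustering)) / length(clustering)
entropy <- -sum(p * log(p))

# pearsongamma: correlation between the pairwise distances and a
# 0-1 vector (0 = same cluster, 1 = different clusters).
pairs <- lower.tri(d)
pearsongamma <- cor(d[pairs], as.numeric(different[pairs]))

# minsep and maxdiameter: smallest between-cluster distance and
# largest within-cluster distance.
minsep <- min(d[different])
maxdiameter <- max(d[!different])

# sindex: mean of the smallest proportion sepprob of the distances
# from every point to the closest point in another cluster.
# Using at least one point and rounding down is an assumption.
sepprob <- 0.1
nearest_other <- sapply(seq_along(clustering), function(i) {
  min(d[i, clustering != clustering[i]])
})
m <- max(1, floor(sepprob * length(nearest_other)))
sindex <- mean(sort(nearest_other)[seq_len(m)])

c(withinss = withinss, kmeans_objective = kmeans_objective,
  entropy = entropy, pearsongamma = pearsongamma,
  minsep = minsep, maxdiameter = maxdiameter, sindex = sindex)
```

On this toy example `withinss` and `kmeans_objective` coincide, which illustrates why `withinss` is described as a generalisation of the k-means objective function.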

Furthermore, a `valstat` object has the following list components:

`maxG`: maximum number of clusters.

`minG`: minimum number of clusters (list entries below that number are empty lists).

`method`: vector of names (character strings) of clustering CBI-functions, see `kmeansCBI`.

`name`: vector of names (character strings) of clustering methods. These can be user-chosen names (see argument `methodsnames` in `clusterbenchstats`) and may distinguish different methods run by the same CBI-function but with different parameter values, such as complete and average linkage for `hclustCBI`.

`statistics`: vector of names (character strings) of cluster validation indexes.
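The interplay between the numbered elements and the named components can be sketched with a hand-built mock object. This is only an illustration in base R: the values are made up, the object is not produced by `clusterbenchstats`, the helper `index_table` is hypothetical, and the component names `minG`, `name` and `statistics` are assumed.

```r
# Mock valstat-like list (all numbers invented): two clustering methods,
# numbers of clusters 2 to 3. Entry j = 1 is below the minimum number
# of clusters and is therefore an empty list.
vs <- list(
  list(list(), list(asw = 0.41), list(asw = 0.52)),   # method number 1
  list(list(), list(asw = 0.38), list(asw = 0.47))    # method number 2
)
vs$maxG <- 3
vs$minG <- 2
vs$method <- c("kmeansCBI", "hclustCBI")
vs$name <- c("kmeans", "average")
vs$statistics <- "asw"

# Element [[i]][[j]]: method number i with j clusters.
vs[[2]][[3]]$asw

# Hypothetical helper: collect one index into a matrix with one row
# per method and one column per number of clusters.
index_table <- function(vs, index) {
  nmethods <- length(vs$method)
  G <- vs$minG:vs$maxG
  out <- matrix(NA_real_, nrow = nmethods, ncol = length(G),
                dimnames = list(vs$name, paste0("G=", G)))
  for (i in seq_len(nmethods)) {
    for (j in G) {
      out[i, j - vs$minG + 1] <- vs[[i]][[j]][[index]]
    }
  }
  out
}

asw_tab <- index_table(vs, "asw")
asw_tab

# Method with the largest average silhouette width for 3 clusters.
best <- rownames(asw_tab)[which.max(asw_tab[, "G=3"])]
best
```

Looping from `minG` to `maxG` rather than from 1 avoids the empty list entries below the minimum number of clusters.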

These objects are generated as part of the `clusterbenchstats` output.

The `valstat` class has methods for the following generic functions: `print`, `plot`; see `plot.valstat`.

Hennig, C. (2019) Cluster validation by measurement of clustering
characteristics relevant to the user. In C. H. Skiadas (ed.)
*Data Analysis and Applications 1: Clustering and Regression,
Modeling-estimating, Forecasting and Data Mining, Volume 2*, Wiley,
New York, 1-24,
https://arxiv.org/abs/1703.09282

Akhanli, S. and Hennig, C. (2020) Calibrating and aggregating cluster
validity indexes for context-adapted comparison of clusterings.
*Statistics and Computing*, 30, 1523-1544,
https://link.springer.com/article/10.1007/s11222-020-09958-2, https://arxiv.org/abs/2002.01822