Computes two measures of dissimilarity between the within-cluster distributions of a dataset and a normal or uniform distribution. For the normal, it is the Kolmogorov distance between the distribution of the Mahalanobis distances to the cluster center and a chi-squared distribution. For the uniform, it is the Kolmogorov distance between the distribution of the distances to the kth nearest neighbour and a Gamma distribution (based on Byers and Raftery (1998)). The cluster-wise values are aggregated by weighting with the cluster sizes.
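The normal case can be illustrated in base R for a single cluster. This is only a sketch of the idea, not the internal code of `distrsimilarity`: the simulated data, the variable names, and the choice of `p` degrees of freedom for the chi-squared reference are assumptions made for illustration.

```r
# Illustrative sketch only; not the internal code of distrsimilarity().
set.seed(1)
p <- 2
xc <- matrix(rnorm(200 * p), ncol = p)   # one simulated Gaussian "cluster"

# stats::mahalanobis() returns *squared* Mahalanobis distances.
d2 <- mahalanobis(xc, center = colMeans(xc), cov = cov(xc))

# Kolmogorov distance between the empirical distribution of d2 and the
# chi-squared distribution with p degrees of freedom (assumed reference).
kd_normal <- unname(ks.test(d2, "pchisq", df = p)$statistic)
kd_normal
```

A small value suggests that the cluster looks roughly Gaussian; a clearly non-Gaussian cluster would be expected to give a larger value.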

```
distrsimilarity(x,clustering,noisecluster = FALSE,
distribution=c("normal","uniform"),nnk=2,
largeisgood=FALSE,messages=FALSE)
```

x

the data matrix; a numerical object which can be coerced to a matrix.

clustering

integer vector of class numbers; length must equal `nrow(x)`, numbers must go from 1 to the number of clusters.

noisecluster

logical. If `TRUE`, the cluster with the largest number is ignored for the computations.

distribution

vector of `"normal"`, `"uniform"` or both. Indicates which of the two dissimilarities is/are computed.

nnk

integer. Number of nearest neighbors to use for dissimilarity to the uniform.

largeisgood

logical. If `TRUE`, dissimilarities are transformed to `1-d` (this means that larger values indicate a better fit).

messages

logical. If `TRUE`, warnings are given if within-cluster covariance matrices are not invertible (in which case all within-cluster Mahalanobis distances are set to zero).
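The role of `nnk` in the uniform case can be sketched in base R for a single cluster. The sketch relies on the result used by Byers and Raftery (1998) that, for a homogeneous Poisson process in `p` dimensions, the `p`th power of the distance to the kth nearest neighbour follows a Gamma distribution with shape `k`. Estimating the rate as `nnk / mean(v)` is an assumption made here for illustration; the package may fit the Gamma distribution differently.

```r
# Illustrative sketch only; not the internal code of distrsimilarity().
set.seed(2)
p <- 2
nnk <- 2
xc <- matrix(runif(300 * p), ncol = p)   # one simulated uniform "cluster"

dm <- as.matrix(dist(xc))
# After sorting a row, position 1 is the zero distance of a point to itself,
# so position nnk + 1 holds the distance to the nnk-th nearest neighbour.
dk <- apply(dm, 1, function(r) sort(r)[nnk + 1])

# Under uniformity, dk^p is approximately Gamma(shape = nnk, rate = unknown).
v <- dk^p
rate_hat <- nnk / mean(v)   # assumed estimator: maximum likelihood for known shape

# Neighbouring points can share the same distance, so ks.test() may warn
# about ties; the statistic is still returned.
kd_uniform <- unname(suppressWarnings(
  ks.test(v, "pgamma", shape = nnk, rate = rate_hat)$statistic
))
kd_uniform
```

Points near the boundary of the cluster tend to have larger neighbour distances, so some deviation from the Gamma reference is expected even for truly uniform data.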

List with the following components

Kolmogorov distance between distribution of within-cluster Mahalanobis distances and appropriate chi-squared distribution, aggregated over clusters (I am grateful to Agustin Mayo-Iscar for the idea).

Kolmogorov distance between distribution of distances to the `nnk`th nearest within-cluster neighbor and appropriate Gamma distribution, see Byers and Raftery (1998), aggregated over clusters.

vector of cluster-wise Kolmogorov distances between distribution of within-cluster Mahalanobis distances and appropriate chi-squared distribution.

vector of cluster-wise Kolmogorov distances between distribution of distances to the `nnk`th nearest within-cluster neighbor and appropriate Gamma distribution.

vector of Mahalanobis distances to the respective cluster center.

vector of distances to the `nnk`th nearest within-cluster neighbor.
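How the aggregated values relate to the cluster-wise ones can be sketched with made-up numbers. The sketch assumes that "weighting with the cluster sizes" means a size-weighted mean; the cluster-wise distances and sizes below are hypothetical and not the output of any real call.

```r
# Hypothetical cluster-wise Kolmogorov distances and cluster sizes.
kd_by_cluster <- c(0.08, 0.15, 0.30)
cluster_sizes <- c(120, 60, 20)

# Size-weighted mean (assumed aggregation rule).
kd_aggregated <- weighted.mean(kd_by_cluster, w = cluster_sizes)
kd_aggregated   # 0.123

# With largeisgood = TRUE the dissimilarity d is reported as 1 - d.
kd_large_is_good <- 1 - kd_aggregated
kd_large_is_good   # 0.877
```

Because the weighted mean is linear, applying `1-d` to each cluster-wise value before aggregating would give the same result as applying it to the aggregated value.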

Byers, S. and Raftery, A. E. (1998) Nearest-Neighbor Clutter
Removal for Estimating Features in Spatial Point Processes,
*Journal of the American Statistical Association*, 93, 577-584.

Hennig, C. (2017) Cluster validation by measurement of clustering
characteristics relevant to the user. In C. H. Skiadas (ed.)
*Proceedings of ASMDA 2017*, 501-520,
https://arxiv.org/abs/1703.09282

`cqcluster.stats`, `cluster.stats` for more cluster validity statistics.

    library(fpc)  # provides rFace() and distrsimilarity()
    set.seed(20000)
    options(digits=3)
    face <- rFace(200,dMoNo=2,dNoEy=0,p=2)
    km3 <- kmeans(face,3)
    distrsimilarity(face,km3$cluster)