cluster.stats(d, clustering, alt.clustering = NULL,
              silhouette = TRUE, G2 = FALSE, G3 = FALSE,
              compareonly = FALSE)

d: a distance object (as generated by dist) or a distance
matrix between cases.

clustering: an integer vector with one entry per case, indicating
the cluster each case belongs to.

alt.clustering: an integer vector such as for clustering,
indicating an alternative clustering. If provided, the
corrected Rand index and Meila's VI for clustering
vs. alt.clustering are computed.

silhouette: logical. If TRUE, the silhouette statistics
are computed, which requires package cluster.

G2: logical. If TRUE, Goodman and Kruskal's index G2
(cf. Gordon (1999), p. 62) is computed. This executes lots of
sorting algorithms and can be very slow (it has been improved
by R. Francois - thanks!).

G3: logical. If TRUE, the index G3
(cf. Gordon (1999), p. 62) is computed. This executes sort
on all distances and can be extremely slow.

compareonly: logical. If TRUE, only the corrected Rand index
and Meila's VI are
computed and given out (this requires alt.clustering to be
specified).

cluster.stats returns a list containing the components
n, cluster.number, cluster.size, diameter,
average.distance, median.distance, separation, average.toother,
separation.matrix, average.between, average.within,
n.between, n.within, within.cluster.ss, clus.avg.silwidths, avg.silwidth,
g2, g3, pearsongamma, dunn, entropy, wb.ratio, ch,
corrected.rand, vi except if compareonly=TRUE, in which case
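only the last two components are computed. Among these:

within.cluster.ss: a generalisation of the within-cluster sum of
squares, which it equals if d is a Euclidean distance matrix. For
general distance measures, this is half
the sum of the within cluster squared dissimilarities divided by the
cluster size.

clus.avg.silwidths: cluster average silhouette widths, see silhouette.

avg.silwidth: average silhouette width, see silhouette.

wb.ratio: average.within/average.between.

corrected.rand: corrected Rand index (if alt.clustering
has been specified), see Gordon (1999, p. 198).

vi: variation of information (VI) index (if alt.clustering
has been specified), see Meila (2007).

Gordon, A. D. (1999) Classification, 2nd ed. Chapman and Hall.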
Halkidi, M., Batistakis, Y., Vazirgiannis, M. (2001) On Clustering Validation Techniques, Journal of Intelligent Information Systems, 17, 107-145.
Hennig, C. and Liao, T. (2010) Comparing latent class and
dissimilarity based clustering for mixed type variables with
application to social stratification. Research report no. 308,
Department of Statistical Science, UCL.
Meila, M. (2007) Comparing clusterings - an information based distance, Journal of Multivariate Analysis, 98, 873-895.
Milligan, G. W. and Cooper, M. C. (1985) An examination of procedures for determining the number of clusters. Psychometrika, 50, 159-179.
silhouette, dist, calinhara.
clusterboot computes clusterwise stability statistics by
resampling.

library(fpc)  # provides rFace and cluster.stats
set.seed(20000)
face <- rFace(200,dMoNo=2,dNoEy=0,p=2)
dface <- dist(face)
complete3 <- cutree(hclust(dface),3)
cluster.stats(dface,complete3,
  alt.clustering=as.integer(attr(face,"grouping")))
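
The description of within.cluster.ss can be checked by hand. The following is a minimal base-R sketch, not the package's own code: it uses hypothetical toy data, and it assumes that "half the sum of the within cluster squared dissimilarities" means the sum over all ordered pairs of cases in a cluster. Under that reading, the distance-based quantity should agree with the usual sum of squared deviations from the cluster means when d is Euclidean.

```r
# Base-R sketch; no fpc functions are called here.
set.seed(1)

# Hypothetical toy data: two well-separated Gaussian groups in 2 dimensions.
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))
d  <- dist(x)                 # Euclidean distances between cases
cl <- cutree(hclust(d), 2)    # complete-linkage clustering cut into 2 clusters
dm <- as.matrix(d)            # full symmetric distance matrix

# Distance-based form: per cluster, half the sum of squared within-cluster
# dissimilarities (all ordered pairs), divided by the cluster size.
wss_dist <- sum(sapply(unique(cl), function(k) {
  idx <- cl == k
  sum(dm[idx, idx]^2) / (2 * sum(idx))
}))

# Coordinate-based form: squared deviations from the cluster means.
wss_coord <- sum(sapply(unique(cl), function(k) {
  xk <- x[cl == k, , drop = FALSE]
  sum(sweep(xk, 2, colMeans(xk))^2)
}))

# The two forms agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(wss_dist, wss_coord)))
```

Only the distance-based form is available when d is a general dissimilarity without underlying coordinates, which is why the value is defined in terms of d rather than cluster means.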