fpc (version 2.1-6)

distcritmulti: Distance based validity criteria for large data sets

Description

Approximates the average silhouette width or the Pearson version of Hubert's gamma criterion by randomly splitting the dataset into subsets, computing the criterion on each subset, and averaging the subset-wise values; see Hennig and Liao (2010).
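The subset-and-average idea can be sketched in base R. This is only an illustration under stated assumptions, not the fpc implementation: `pearson_gamma` and `approx_criterion` are hypothetical helper names, the Pearson gamma is taken as the correlation between pairwise dissimilarities and a 0-1 vector marking pairs in different clusters, and the subset values are combined with a size-weighted mean (how fpc weights the subsets is not stated on this page).

```r
# Hypothetical helper, not part of fpc: Pearson gamma on one data subset.
pearson_gamma <- function(x, clustering, method = "euclidean") {
  d <- dist(x, method = method)                      # pairwise dissimilarities
  different <- outer(clustering, clustering, "!=")   # TRUE if pair is in different clusters
  # A dist object is stored like the lower triangle, column by column,
  # which is the same order that lower.tri() extracts.
  cor(as.vector(d), as.numeric(different[lower.tri(different)]))
}

# Hypothetical helper, not part of fpc: average the criterion over random subsets.
approx_criterion <- function(x, clustering, ns = 3, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(x)
  # Random assignment of cases to ns subsets of approximately equal size
  subset_id <- sample(rep_len(seq_len(ns), n))
  subsets <- split(seq_len(n), subset_id)
  crit_sub <- vapply(subsets, function(idx)
    pearson_gamma(x[idx, , drop = FALSE], clustering[idx]), numeric(1))
  list(overall = weighted.mean(crit_sub, lengths(subsets)),  # assumption: size-weighted mean
       sub = unname(crit_sub),
       sd = sd(crit_sub))
}

# Toy data: two well-separated Gaussian clusters of 30 points each
set.seed(1)
x <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
           matrix(rnorm(60, mean = 5), ncol = 2))
cl <- rep(1:2, each = 30)
res <- approx_criterion(x, cl, ns = 3, seed = 42)
print(res)
```

Each subset needs only its own dissimilarity matrix, so the memory cost grows with the square of the subset size rather than the square of the full sample size, which is the motivation for using this kind of approximation on large data sets.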

Usage

distcritmulti(x, clustering, part=NULL, ns=10, criterion="asw",
              fun="dist", metric="euclidean",
              count=FALSE, seed=NULL, ...)

Arguments

x
cases by variables data matrix (rows are cases, columns are variables).
clustering
vector of integers indicating the clustering.
part
vector of integer subset sizes; the sum should be smaller than or equal to the number of cases of x. If NULL, subset sizes are chosen approximately equal.
ns
integer. Number of subsets; only used if part is NULL.
criterion
"asw" or "pearsongamma", specifies whether the average silhouette width or the Pearson version of Hubert's gamma is computed.
fun
"dist" or "daisy", specifies which function is used for computing dissimilarities.
metric
passed on to dist (as argument method) or daisy to determine which dissimilarity is used.
count
logical. If TRUE, the subset number just processed is printed.
seed
integer, random seed. (If NULL, result depends on random numbers.)
...
further arguments to be passed on to dist or daisy.
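To make the relationship between part and ns concrete, here is a small base R sketch of one way to obtain approximately equal subset sizes and draw the corresponding case indices. It is an assumption about what "approximately equal" could mean, not code taken from fpc; `equal_part` is a hypothetical helper name.

```r
# Hypothetical helper, not part of fpc: ns subset sizes that sum to n
# and differ from each other by at most one.
equal_part <- function(n, ns) {
  base <- n %/% ns                 # size every subset gets at least
  extra <- n %% ns                 # leftover cases, one each for the first subsets
  base + as.integer(seq_len(ns) <= extra)
}

n <- 50
part <- equal_part(n, 3)
print(part)                        # 17 17 16

# The documented constraint on part: sizes must not exceed the number of cases
stopifnot(sum(part) <= n)

# Draw sum(part) distinct cases at random and cut them into subsets of those sizes
set.seed(100000)
cases <- sample(n, sum(part))
subsets <- split(cases, rep(seq_along(part), part))
print(lengths(subsets))
```

A user-supplied part may sum to less than the number of cases, in which case some cases are simply left out of every subset.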

Value

A list with components crit.overall, crit.sub, crit.sd, subsets.

crit.overall
value of criterion.
crit.sub
vector of subset-wise criterion values.
crit.sd
standard deviation of crit.sub, can be used to assess stability.
subsets
list of case indexes in subsets.

References

Halkidi, M., Batistakis, Y., Vazirgiannis, M. (2001) On Clustering Validation Techniques, Journal of Intelligent Information Systems, 17, 107-145.

Hennig, C. and Liao, T. (2010) Comparing latent class and dissimilarity based clustering for mixed type variables with application to social stratification. Research report no. 308, Department of Statistical Science, UCL. http://www.ucl.ac.uk/Stats/research/reports/psfiles/rr308.pdf Revised version accepted for publication by Journal of the Royal Statistical Society Series C.

Kaufman, L. and Rousseeuw, P.J. (1990) Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.

See Also

cluster.stats, silhouette

Examples

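library(fpc)  # provides rFace and distcritmulti
set.seed(20000)
# Simulate 50 two-dimensional points from the "face" benchmark data
face <- rFace(50, dMoNo=2, dNoEy=0, p=2)
clustering <- as.integer(attr(face, "grouping"))
# Approximate the Pearson gamma criterion from 3 random subsets
distcritmulti(face, clustering, ns=3, seed=100000, criterion="pearsongamma")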
