
clues (version 0.3.2)

compClust: Compare different partitions for a data set

Description

Compare different partitions of a data set based on agreement indices, the average silhouette index, and the CH index.
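
For reference, the CH index and the average silhouette index follow the standard definitions below (a sketch of the usual formulas, not taken from the package source, so the exact implementation may differ):

$$\mathrm{CH}(g) = \frac{B(g)/(g - 1)}{W(g)/(n - g)}, \qquad s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$

where $B(g)$ and $W(g)$ are the between- and within-cluster sums of squares of a $g$-cluster partition of $n$ observations, $a(i)$ is the average dissimilarity of observation $i$ to the other members of its own cluster, and $b(i)$ is the smallest average dissimilarity of $i$ to the members of any other cluster. The average silhouette index is the mean of $s(i)$ over all observations; larger values of both indices indicate a stronger cluster structure.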

Usage

compClust(y, memMat, disMethod = "Euclidean")

Arguments

y
the data matrix: an R matrix object (for dimension > 1) or a vector (for dimension = 1), with rows corresponding to observations and columns to variables.
memMat
cluster membership matrix. Each column corresponds to one partition of the data y. The numbers of clusters may differ across partitions. The cluster memberships for a $g$-cluster partition should take the values $1, 2, \ldots, g$.
disMethod
specification of the dissimilarity measure. The available measures are "Euclidean" and "1-corr".
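
A membership matrix can be assembled by column-binding the membership vectors produced by any clustering routines, as in the minimal sketch below (the toy data, the choice of two clusters, and the column names km and hc are made up for illustration and are not part of the package):

  library(clues)

  ## toy data: two well-separated Gaussian blobs in two dimensions
  set.seed(1)
  y <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 3), ncol = 2))

  ## two candidate partitions: k-means and hierarchical clustering
  mem.km <- kmeans(y, centers = 2)$cluster
  mem.hc <- cutree(hclust(dist(y)), k = 2)

  ## each column of memMat is one partition; labels run 1, 2, ..., g
  memMat <- cbind(km = mem.km, hc = mem.hc)

  res <- compClust(y, memMat, disMethod = "Euclidean")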

Value

  • avg.s: a vector of average silhouette indices for the different partitions in memMat.
  • CH: a vector of CH indices for the different partitions in memMat.
  • Rand: a matrix of Rand indices measuring the pair-wise agreement among the different partitions in memMat.
  • HA: a matrix of Hubert and Arabie's adjusted Rand indices measuring the pair-wise agreement among the different partitions in memMat.
  • MA: a matrix of Morey and Agresti's adjusted Rand indices measuring the pair-wise agreement among the different partitions in memMat.
  • FM: a matrix of Fowlkes and Mallows' indices measuring the pair-wise agreement among the different partitions in memMat.
  • Jaccard: a matrix of Jaccard indices measuring the pair-wise agreement among the different partitions in memMat.
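
The entries of the agreement matrices can be cross-checked directly from the contingency table of two membership columns. Below is a sketch for the Hubert and Arabie adjusted Rand index, continuing the hypothetical memMat from the sketch in the Arguments section; the helper adjRand is not part of the package:

  ## adjusted Rand index of two membership vectors (Hubert and Arabie, 1985)
  adjRand <- function(u, v) {
    tab <- table(u, v)
    n   <- sum(tab)
    sij <- sum(choose(tab, 2))          # pairs clustered together in both partitions
    si  <- sum(choose(rowSums(tab), 2)) # pairs clustered together in partition u
    sj  <- sum(choose(colSums(tab), 2)) # pairs clustered together in partition v
    expct <- si * sj / choose(n, 2)     # expected value under random labelling
    (sij - expct) / (0.5 * (si + sj) - expct)
  }

  ## should agree with the corresponding entry of res$HA
  adjRand(memMat[, "km"], memMat[, "hc"])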

References

Caliński, T. and Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics, Vol. 3, pages 1-27.

Kaufman, L. and Rousseeuw, P.J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.

Milligan, G.W. and Cooper, M.C. (1986). A study of the comparability of external criteria for hierarchical cluster analysis. Multivariate Behavioral Research, Vol. 21, pages 441-458.

Wang, S., Qiu, W., and Zamar, R.H. (2007). CLUES: A non-parametric clustering method based on local shrinking. Computational Statistics & Data Analysis, Vol. 52, issue 1, pages 286-298.

Examples

  # Ruspini data
  data(Ruspini)
  # data matrix and the reference ("true") cluster memberships
  ruspini <- Ruspini$ruspini
  ruspini.mem <- Ruspini$ruspini.mem
    
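  # CLUES partitions, selecting the number of clusters by the CH and
  # average silhouette strength criteria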
  res.CH <- clues(ruspini, strengthMethod = "CH", quiet = TRUE)
  res.sil <- clues(ruspini, strengthMethod = "sil", quiet = TRUE)
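  # k-means partitions with k = 4 clusters and four different algorithms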
  res.km.HW <- kmeans(ruspini, 4, algorithm = "Hartigan-Wong")
  res.km.L <- kmeans(ruspini, 4, algorithm = "Lloyd")
  res.km.F <- kmeans(ruspini, 4, algorithm = "Forgy")
  res.km.M <- kmeans(ruspini, 4, algorithm = "MacQueen")

  memMat <- cbind(ruspini.mem, res.CH$mem, res.sil$mem, res.km.HW$cluster,
    res.km.L$cluster, res.km.F$cluster, res.km.M$cluster) 

  colnames(memMat) <- c("true", "clues.CH", "clues.sil", "km.HW", "km.L", "km.F",
    "km.M")

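  # compare all seven partitions at once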
  res <- compClust(ruspini, memMat)
  round(res$avg.s, 1)
  round(res$CH, 1)
  round(res$Rand, 1)
  round(res$HA, 1)
  round(res$MA, 1)
  round(res$FM, 1)
  round(res$Jaccard, 1)
