compClust: Compare different partitions for a data set

Description

Compare different partitions of a data set using pairwise agreement indices (Rand, adjusted Rand, Fowlkes and Mallows, and Jaccard), the average silhouette width, and the Calinski-Harabasz (CH) index.

Usage

compClust(y, memMat, disMethod = "Euclidean")

Arguments

y

data matrix: an R matrix (for dimension > 1) or a vector (for dimension = 1), with rows as observations and columns as variables.

memMat

cluster membership matrix. Each column is one partition of the observations in y, and different partitions may have different numbers of clusters. In a partition with \(g\) clusters, the memberships must take the values \(1\), \(2\), \(\ldots\), \(g\).
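If a partition uses other labels (character labels, or cluster numbers with gaps), it has to be recoded to 1, 2, ..., g first. The sketch below uses a hypothetical helper, relabel_memberships (not part of clues), which numbers clusters in order of first appearance with base R's match() and unique().

```r
# Hypothetical helper (not part of clues): recode every column of a label
# matrix to consecutive integers 1, 2, ..., g, numbering clusters in order
# of first appearance.
relabel_memberships <- function(labels) {
  labels <- as.matrix(labels)
  out <- apply(labels, 2, function(col) match(col, unique(col)))
  # apply() returns a plain vector when there is only one row; restore the shape
  out <- matrix(out, nrow = nrow(labels), ncol = ncol(labels))
  colnames(out) <- colnames(labels)
  out
}

# Example: character labels in column a, labels with gaps in column b
raw <- cbind(a = c("x", "x", "y", "z"), b = c(10, 30, 10, 30))
memMat <- relabel_memberships(raw)
print(memMat)
```

Numbering by first appearance keeps the grouping unchanged; only the label values differ, which does not affect the pair-based agreement indices.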

disMethod

specification of the dissimilarity measure. The available measures are "Euclidean" (the default) and "1-corr".
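The two options can be illustrated in base R. This sketch assumes that "1-corr" means one minus the Pearson correlation between two observations (rows of y); the page does not define it, so check the clues source before relying on that reading. The function name dissimilarity is hypothetical.

```r
# Hypothetical helper (not part of clues): full n x n dissimilarity matrix
# between the rows (observations) of y.
# ASSUMPTION: "1-corr" is taken to be 1 - Pearson correlation between rows.
dissimilarity <- function(y, disMethod = c("Euclidean", "1-corr")) {
  disMethod <- match.arg(disMethod)
  y <- as.matrix(y)  # a vector becomes a one-column matrix
  if (disMethod == "Euclidean") {
    as.matrix(dist(y, method = "euclidean"))
  } else {
    # cor() correlates columns, so transpose to correlate observations
    1 - cor(t(y))
  }
}

y <- rbind(c(1, 2, 3),
           c(2, 4, 6),   # perfectly correlated with row 1
           c(3, 2, 1))   # perfectly anti-correlated with row 1
D_euc <- dissimilarity(y, "Euclidean")
D_cor <- dissimilarity(y, "1-corr")
```

Under this reading, rows 1 and 2 are far apart in Euclidean terms but have dissimilarity 0 under "1-corr", because only the shape of the profile matters. The "1-corr" branch of the sketch needs at least two variables, since a correlation between single values is undefined.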

Value

avg.s

a vector of average silhouette widths, one for each partition in memMat.
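The silhouette width of observation i (Kaufman and Rousseeuw, 1990) is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the average dissimilarity of i to the other members of its own cluster and b(i) is the smallest average dissimilarity of i to the members of any other cluster; the average silhouette width is the mean of s(i) over all observations. The base-R sketch below, with the hypothetical name avg_silhouette, follows that definition and sets s(i) = 0 for singleton clusters. It is not the clues implementation; cluster::silhouette() is an established alternative.

```r
# Hypothetical helper (not part of clues): average silhouette width from a
# dissimilarity matrix (or dist object) D and a membership vector mem.
# Requires at least two clusters.
avg_silhouette <- function(D, mem) {
  D <- as.matrix(D)
  n <- length(mem)
  s <- numeric(n)
  for (i in seq_len(n)) {
    own <- mem == mem[i]
    n_own <- sum(own)
    if (n_own == 1) {
      s[i] <- 0                        # singleton cluster: s(i) = 0 by convention
      next
    }
    # D[i, i] = 0, so only the n_own - 1 other cluster members contribute
    a <- sum(D[i, own]) / (n_own - 1)
    others <- setdiff(unique(mem), mem[i])
    b <- min(vapply(others, function(k) mean(D[i, mem == k]), numeric(1)))
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)
}

# Two well-separated groups on a line
x <- c(0, 1, 10, 11)
mem <- c(1, 1, 2, 2)
avg_silhouette(dist(x), mem)
```

Values close to 1 indicate that observations sit much closer to their own cluster than to the nearest other cluster.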

a vector of CH indices for the different partitions in memMat.
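For a partition of n observations into g clusters, the CH index (Calinski and Harabasz, 1974) is [B / (g - 1)] / [W / (n - g)], where W is the within-cluster sum of squared distances to the cluster centroids and B is the between-cluster sum of squares (cluster sizes times squared distances from each centroid to the overall mean). Larger values indicate compact, well-separated clusters. The sketch below (hypothetical name ch_index, base R only) uses Euclidean distance; this page does not say whether compClust computes CH differently when disMethod = "1-corr".

```r
# Hypothetical helper (not part of clues): Calinski-Harabasz index with
# Euclidean distance; requires 2 <= g < n.
ch_index <- function(y, mem) {
  y <- as.matrix(y)
  n <- nrow(y)
  g <- length(unique(mem))
  overall <- colMeans(y)
  W <- 0  # within-cluster sum of squares
  B <- 0  # between-cluster sum of squares
  for (k in unique(mem)) {
    yk <- y[mem == k, , drop = FALSE]
    ck <- colMeans(yk)
    W <- W + sum(sweep(yk, 2, ck)^2)
    B <- B + nrow(yk) * sum((ck - overall)^2)
  }
  (B / (g - 1)) / (W / (n - g))
}

ch_index(c(0, 1, 10, 11), c(1, 1, 2, 2))  # 200
```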

Rand

a matrix of Rand indices measuring the pair-wise agreement among the different partitions in memMat.
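The Rand index counts pairs of observations. For two partitions, let a be the number of pairs placed in the same cluster by both and d the number placed in different clusters by both; the Rand index is (a + d) / C(n, 2). Both counts follow from the contingency table of the two membership vectors, as in this base-R sketch (hypothetical name rand_index, not the clues code).

```r
# Hypothetical helper (not part of clues): Rand index of two membership vectors.
rand_index <- function(m1, m2) {
  tab <- table(m1, m2)
  total <- choose(length(m1), 2)           # all pairs of observations
  a <- sum(choose(as.vector(tab), 2))      # pairs together in both
  same1 <- sum(choose(rowSums(tab), 2))    # pairs together in m1
  same2 <- sum(choose(colSums(tab), 2))    # pairs together in m2
  d <- total - same1 - same2 + a           # pairs apart in both
  (a + d) / total
}

rand_index(c(1, 1, 2, 2), c(2, 2, 1, 1))  # same grouping, different labels: 1
rand_index(c(1, 1, 2, 2), c(1, 2, 1, 2))  # 2 of 6 pairs agree: 1/3
```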

a matrix of Hubert and Arabie's adjusted Rand indices measuring the pair-wise agreement among the different partitions in memMat.
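Hubert and Arabie's adjusted Rand index corrects the pair count a (pairs grouped together in both partitions) for its expected value E when the cluster sizes are held fixed: ARI = (a - E) / ((S1 + S2) / 2 - E), where S1 and S2 are the numbers of pairs grouped together in each partition and E = S1 * S2 / C(n, 2). It equals 1 for identical partitions, is 0 in expectation under random labelling, and can be negative. A base-R sketch (hypothetical name hubert_arabie_ari):

```r
# Hypothetical helper (not part of clues): Hubert and Arabie's adjusted Rand index.
hubert_arabie_ari <- function(m1, m2) {
  tab <- table(m1, m2)
  total <- choose(length(m1), 2)
  a <- sum(choose(as.vector(tab), 2))   # pairs together in both
  s1 <- sum(choose(rowSums(tab), 2))    # pairs together in m1
  s2 <- sum(choose(colSums(tab), 2))    # pairs together in m2
  expected <- s1 * s2 / total           # expected a with cluster sizes fixed
  (a - expected) / ((s1 + s2) / 2 - expected)
}

hubert_arabie_ari(c(1, 1, 2, 2), c(2, 2, 1, 1))  # 1
hubert_arabie_ari(c(1, 1, 2, 2), c(1, 2, 1, 2))  # -0.5
```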

a matrix of Morey and Agresti's adjusted Rand indices measuring the pair-wise agreement among the different partitions in memMat.

a matrix of Fowlkes and Mallows's indices measuring the pair-wise agreement among the different partitions in memMat.
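Fowlkes and Mallows's index is FM = a / sqrt(S1 * S2), where a is the number of pairs grouped together in both partitions and S1, S2 are the numbers of pairs grouped together in each partition; it is the geometric mean of a / S1 and a / S2. A base-R sketch (hypothetical name fowlkes_mallows, not the clues code):

```r
# Hypothetical helper (not part of clues): Fowlkes and Mallows's index.
fowlkes_mallows <- function(m1, m2) {
  tab <- table(m1, m2)
  a <- sum(choose(as.vector(tab), 2))   # pairs together in both
  s1 <- sum(choose(rowSums(tab), 2))    # pairs together in m1
  s2 <- sum(choose(colSums(tab), 2))    # pairs together in m2
  a / sqrt(s1 * s2)
}

fowlkes_mallows(c(1, 1, 1, 2), c(1, 1, 2, 2))  # 1 / sqrt(6)
```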

Jaccard

a matrix of Jaccard indices measuring the pair-wise agreement among the different partitions in memMat.
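The pair-based Jaccard index is a / (S1 + S2 - a): the number of pairs grouped together in both partitions divided by the number grouped together in at least one. The sketch below (hypothetical names jaccard_index and pairwise_agreement) also shows one way to fill a matrix of pairwise agreements over the columns of a membership matrix, assuming one row and one column per partition. It illustrates the idea and is not the clues implementation.

```r
# Hypothetical helper (not part of clues): pair-based Jaccard index.
jaccard_index <- function(m1, m2) {
  tab <- table(m1, m2)
  a <- sum(choose(as.vector(tab), 2))   # pairs together in both
  s1 <- sum(choose(rowSums(tab), 2))    # pairs together in m1
  s2 <- sum(choose(colSums(tab), 2))    # pairs together in m2
  a / (s1 + s2 - a)                     # divide by pairs together in at least one
}

# Hypothetical helper: p x p matrix of an agreement index over the p columns
# (partitions) of a membership matrix.
pairwise_agreement <- function(memMat, index = jaccard_index) {
  p <- ncol(memMat)
  out <- matrix(NA_real_, p, p,
                dimnames = list(colnames(memMat), colnames(memMat)))
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      out[i, j] <- index(memMat[, i], memMat[, j])
    }
  }
  out
}

toy_mem <- cbind(A = c(1, 1, 1, 2), B = c(1, 1, 2, 2))
pairwise_agreement(toy_mem)
```

Passing a different two-argument function as index fills the matrix for that index instead; the diagonal is 1 for any index that scores identical partitions as 1.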

References

Calinski, T. and Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics, Vol. 3, pages 1-27.

Kaufman, L. and Rousseeuw, P.J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.

Milligan, G.W. and Cooper, M.C. (1986). A study of the comparability of external criteria for hierarchical cluster analysis. Multivariate Behavioral Research, Vol. 21, pages 441-458.

Wang, S., Qiu, W. and Zamar, R.H. (2007). CLUES: A non-parametric clustering method based on local shrinking. Computational Statistics & Data Analysis, Vol. 52, issue 1, pages 286-298.

Examples


    library(clues)

    # Maronna data set
    data(Maronna)
    # data matrix
    maronna <- Maronna$maronna
    # true cluster membership
    maronna.mem <- Maronna$maronna.mem

    # kmeans uses random starting centers; fix the seed for reproducible results
    set.seed(1)

    # partitions by clues (two strength measures) and kmeans (four algorithms)
    res_CH <- clues(maronna, strengthMethod = "CH", quiet = TRUE)
    res_sil <- clues(maronna, strengthMethod = "sil", quiet = TRUE)
    res_km_HW <- kmeans(maronna, 4, algorithm = "Hartigan-Wong")
    res_km_L <- kmeans(maronna, 4, algorithm = "Lloyd")
    res_km_F <- kmeans(maronna, 4, algorithm = "Forgy")
    res_km_M <- kmeans(maronna, 4, algorithm = "MacQueen")

    # one column per partition
    memMat <- cbind(maronna.mem, res_CH$mem, res_sil$mem,
        res_km_HW$cluster, res_km_L$cluster,
        res_km_F$cluster, res_km_M$cluster)

    colnames(memMat) <- c("true", "clues_CH", "clues_sil",
        "km_HW", "km_L", "km_F", "km_M")

    res <- compClust(maronna, memMat)

    # round every returned vector and matrix to one decimal place
    print(lapply(res, round, digits = 1))
