
fpc (version 2.1-6)

kmeansruns: k-means with estimating k and initialisations

Description

Calls the function kmeans to perform k-means clustering, but initialises the algorithm several times, each time using random points from the data set as the initial means. It is also more robust against empty clusters occurring in the algorithm, and it estimates the number of clusters by either the Calinski-Harabasz index (calinhara) or the average silhouette width (see pam.object). If 1 is included among the candidate numbers of clusters, the Duda-Hart test (dudahart2) is applied to decide whether there should be more than one cluster.
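The idea of restarting k-means several times and choosing k by the Calinski-Harabasz index can be sketched with base R alone. This is a rough illustration, not the code used inside kmeansruns: the toy data, the helper ch_index and the use of the nstart argument of kmeans to handle the restarts are all assumptions made for the example.

```r
# Sketch only: a base R (stats) approximation of the procedure described above.
set.seed(1)

# Hypothetical toy data: two well-separated Gaussian groups in 2 dimensions
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(100, mean = 4, sd = 0.5), ncol = 2))
n <- nrow(x)

# Calinski-Harabasz index computed from a kmeans fit:
# (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))
ch_index <- function(fit, n) {
  k <- nrow(fit$centers)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

krange <- 2:5
ch <- sapply(krange, function(k) {
  # nstart = 20 restarts k-means from 20 random sets of data points and
  # keeps the run with the smallest total within-cluster sum of squares
  fit <- kmeans(x, centers = k, nstart = 20, iter.max = 100)
  ch_index(fit, n)
})
names(ch) <- krange

# The estimated number of clusters is the k with the largest index
best_k <- krange[which.max(ch)]
```

The index is undefined for k = 1 (division by k - 1), which is why a separate test is needed to decide between one cluster and more than one.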

Usage

kmeansruns(data,krange=2:10,criterion="ch",
                       iter.max=100,runs=100,
                       scaledata=FALSE,alpha=0.001,
                       critout=FALSE,plot=FALSE,...)

Arguments

data
A numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).
krange
integer vector. Numbers of clusters to be compared by the chosen criterion (average silhouette width or Calinski-Harabasz). Note: neither criterion can estimate the number of clusters nc=1. If 1 is included, a Duda-Hart test (dudahart2) is applied to decide whether there should be more than one cluster.
criterion
one of "asw" or "ch". Determines whether average silhouette width or Calinski-Harabasz is applied.
iter.max
integer. The maximum number of iterations allowed.
runs
integer. Number of starts of the k-means algorithm.
scaledata
logical. If TRUE, the variables are centered and scaled to unit variance before execution.
alpha
numeric between 0 and 1, tuning constant for dudahart2 (only used for 1-cluster test).
critout
logical. If TRUE, the criterion value is printed out for every number of clusters.
plot
logical. If TRUE, every clustering resulting from a run of the algorithm is plotted.
...
further arguments to be passed on to kmeans.
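The effect described for scaledata=TRUE, centering each variable and scaling it to unit variance, can be illustrated with the base R function scale. The snippet below is a minimal sketch that assumes the option behaves like scale with its default settings; the toy matrix raw and its column names are hypothetical.

```r
# Hypothetical toy data: two columns on very different scales
set.seed(42)
raw <- cbind(height_cm = rnorm(30, mean = 170, sd = 10),
             weight_g  = rnorm(30, mean = 70000, sd = 8000))

# Center each column, then divide it by its standard deviation
scaled <- scale(raw)

colMeans(scaled)       # numerically zero for every column
apply(scaled, 2, sd)   # one for every column
```

Without scaling, the Euclidean distances used by k-means would be dominated by the column with the larger numeric range.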

Value

The output of the optimal run of the kmeans function with added components bestk and crit. A list with components

  • cluster: A vector of integers indicating the cluster to which each point is allocated.
  • centers: A matrix of cluster centers.
  • withinss: The within-cluster sum of squares for each cluster.
  • size: The number of points in each cluster.
  • bestk: The optimal number of clusters.
  • crit: Vector with values of the criterion for all used numbers of clusters (0 if number not tried).
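As a brief illustration of how the returned components might be inspected, the sketch below runs kmeansruns on the numeric columns of the built-in iris data. The choice of data set, seed and argument values is arbitrary and made only for this example.

```r
library(fpc)  # provides kmeansruns

set.seed(123)
x <- as.matrix(iris[, 1:4])  # built-in data set, used here only for illustration

fit <- kmeansruns(x, krange = 2:5, criterion = "ch", runs = 10)

fit$bestk           # estimated number of clusters
fit$crit            # criterion values; the entry for k = 1 is 0 because 1 was not tried
table(fit$cluster)  # number of points allocated to each cluster
fit$centers         # one row per cluster
```

Because krange starts at 2 here, no Duda-Hart test is carried out and bestk is simply the candidate with the largest criterion value.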

References

Calinski, T., and Harabasz, J. (1974) A Dendrite Method for Cluster Analysis, Communications in Statistics, 3, 1-27.

Duda, R. O. and Hart, P. E. (1973) Pattern Classification and Scene Analysis. Wiley, New York.

Hartigan, J. A. and Wong, M. A. (1979). A K-means clustering algorithm. Applied Statistics, 28, 100-108.

Kaufman, L. and Rousseeuw, P.J. (1990). "Finding Groups in Data: An Introduction to Cluster Analysis". Wiley, New York.

See Also

kmeans, pamk, calinhara, dudahart2

Examples

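library(fpc)  # provides kmeansruns and rFace
set.seed(20000)
# Simulate a 50-point "face"-shaped benchmark data set
face <- rFace(50,dMoNo=2,dNoEy=0,p=2)
# Estimate k over 1:5 with average silhouette width, then with Calinski-Harabasz
pka <- kmeansruns(face,krange=1:5,critout=TRUE,runs=2,criterion="asw")
pkc <- kmeansruns(face,krange=1:5,critout=TRUE,runs=2,criterion="ch")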
