FitKMeans: Fit a series of kmeans clusterings and compute Hartigan's Number

Description

Given a numeric dataset, this function fits a series of kmeans clusterings with an increasing number of centers. Each k-means fit is compared to the (k+1)-means fit using Hartigan's Number to determine whether the (k+1)th cluster should be added.

Usage

FitKMeans(x, max.clusters = 12L, spectral = FALSE, nstart = 1L,
  iter.max = 10L, algorithm = c("Hartigan-Wong", "Lloyd", "Forgy",
  "MacQueen"), seed = NULL)

Value

A data.frame with one row per number of clusters tried, and columns for the number of clusters, Hartigan's Number, and whether that cluster should be added based on Hartigan's Number.

Arguments

x: The data, numeric, either a matrix or data.frame
max.clusters: The maximum number of clusters that should be tried
spectral: logical; whether the data being fit are eigenvectors for spectral clustering
nstart: The number of random starts for the kmeans algorithm to use
iter.max: Maximum number of iterations before the kmeans algorithm gives up on convergence
algorithm: The desired algorithm to be used for kmeans. Options are c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen"). See kmeans
seed: If not null, the random seed will be reset before each application of the kmeans algorithm

Author

Jared P. Lander www.jaredlander.com

Details

A consecutive series of kmeans fits is computed with increasing k (number of centers). The results for each k and k+1 are compared using Hartigan's Number. If the number is greater than 10, adding the (k+1)th cluster is considered worthwhile.
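
The comparison described above can be sketched with base R alone. This is a minimal illustration, not the package's implementation: it assumes the commonly stated form of Hartigan's Number, H(k) = (WSS(k) / WSS(k+1) - 1) * (n - k - 1), where WSS(k) is the total within-cluster sum of squares for k centers and n is the number of rows. The helper hartigan_number, the object sketch and its column names are hypothetical and chosen only for this example; the handling of spectral = TRUE is not covered.

```r
# Hypothetical helper (not part of the package): Hartigan's Number for
# comparing a k-cluster fit against a (k+1)-cluster fit.
hartigan_number <- function(wss_k, wss_k1, n, k) {
  (wss_k / wss_k1 - 1) * (n - k - 1)
}

x <- iris[, -ncol(iris)]   # numeric columns only
n <- nrow(x)
max_clusters <- 6L         # assumed small value to keep the sketch quick

set.seed(42)
# Total within-cluster sum of squares for k = 1, ..., max_clusters
wss <- vapply(seq_len(max_clusters), function(k) {
  kmeans(x, centers = k, nstart = 5L)$tot.withinss
}, numeric(1))

# Compare each k with k + 1 and apply the "greater than 10" rule
k <- seq_len(max_clusters - 1L)
sketch <- data.frame(
  Clusters = k + 1L,
  Hartigan = hartigan_number(wss[k], wss[k + 1L], n, k)
)
sketch$AddCluster <- sketch$Hartigan > 10
print(sketch)
```

Because kmeans starts from random centers, the values change from run to run unless a seed is set, which is the reason the function exposes the seed and nstart arguments.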

References

http://www.stat.columbia.edu/~madigan/DM08/descriptive.ppt.pdf

Examples

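library(useful)  # provides FitKMeans and PlotHartigan

data(iris)
# Drop the last column (Species) so only numeric measurements are clustered
hartiganResults <- FitKMeans(iris[, -ncol(iris)])
PlotHartigan(hartiganResults)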