Estimate number of clusters by bootstrapping stability
Usage
k.select(x, range = 2:7, B = 20, r = 5, threshold = 0.8, scheme_2 = TRUE)
Value
profile
a vector of Smin measures for determining k
k
an integer, the estimated number of clusters
Arguments
x
a data.frame of the data set
range
an integer vector of candidate numbers of clusters k
B
number of bootstrap re-samplings
r
number of runs of k-means
threshold
the Smin threshold for determining k; k is the largest value in range whose Smin exceeds it
scheme_2
logical; TRUE if scheme 2 is used, FALSE if scheme 1 is used
Author
Han Yu
Details
This function estimates the number of clusters through a bootstrapping
approach and a measure, Smin, which is based on an observation-wise similarity
among clusterings. The number of clusters k is selected as the largest candidate
number of clusters for which Smin is greater than a threshold. The threshold is
typically chosen between 0.8 and 0.9. Two schemes are provided. Scheme 1 uses the
clustering of the original data as the reference for stability calculations. Scheme 2
searches across the clustering samples for the one that gives the most stable clustering.
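The selection rule and scheme 1 can be illustrated with a minimal base R sketch. This is not the package's implementation: the exact definition of Smin is given in the reference, and the sketch assumes one plausible reading, in which an observation's stability is its average co-membership agreement with the reference clustering across bootstrap samples, and Smin is the lowest mean stability over clusters. The helper names (nearest, agreement, smin_for_k), the use of iris, and the mapping of r to nstart are all assumptions made for illustration.

```r
set.seed(1)
x <- scale(iris[, 1:4])          # example data (assumption); any numeric matrix works
n <- nrow(x)
k_range <- 2:5; B <- 20; r <- 5; threshold <- 0.8

# Assign every row of x to its nearest center (squared Euclidean distance)
nearest <- function(x, centers) {
  d <- sapply(seq_len(nrow(centers)),
              function(j) rowSums(sweep(x, 2, centers[j, ])^2))
  max.col(-d, ties.method = "first")
}

# Assumed observation-wise agreement: for each observation i, the share of
# other observations whose co-membership with i is the same in both clusterings
agreement <- function(a, b) {
  same <- outer(a, a, "==") == outer(b, b, "==")
  (rowSums(same) - 1) / (length(a) - 1)
}

# Assumed Smin for a given k, with the clustering of the original data
# as the reference (scheme 1)
smin_for_k <- function(k) {
  ref <- kmeans(x, centers = k, nstart = r)$cluster
  stab <- sapply(seq_len(B), function(b) {
    idx <- sample(n, replace = TRUE)                      # bootstrap re-sample
    fit <- kmeans(x[idx, , drop = FALSE], centers = k, nstart = r)
    agreement(ref, nearest(x, fit$centers))               # length-n vector
  })
  min(tapply(rowMeans(stab), ref, mean))                  # least stable cluster
}

profile <- sapply(k_range, smin_for_k)

# Selection rule from the text: largest k whose Smin exceeds the threshold.
# Returning NA when no candidate qualifies is an assumption of this sketch.
stable <- k_range[profile > threshold]
k <- if (length(stable) > 0) max(stable) else NA_integer_
```

Scheme 2 would differ only in how the reference clustering is chosen, so the selection step at the end stays the same.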
References
Bootstrapping estimates of stability for clusters, observations and model selection.
Han Yu, Brian Chapman, Arianna DiFlorio, Ellen Eischen, David Gotz, Matthews Jacob and Rachael Hageman Blair.
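Examples
A short usage sketch, assuming the function is provided by the bootcluster package and returns a list with the components described under Value. The choice of the iris measurements as input is an illustration only; the call itself uses the defaults shown under Usage.

```r
# Run only if the (assumed) host package is installed
if (requireNamespace("bootcluster", quietly = TRUE)) {
  set.seed(1)
  # Standardize the four numeric columns and keep a data.frame, as x requires
  x <- as.data.frame(scale(iris[, 1:4]))

  res <- bootcluster::k.select(x, range = 2:7, B = 20, r = 5,
                               threshold = 0.8, scheme_2 = TRUE)
  print(res$profile)  # Smin measures for the candidate values of k
  print(res$k)        # estimated number of clusters
}
```

Raising B gives a smoother Smin profile at the cost of more k-means runs.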