bootcluster (version 0.4.2)

k.select: Estimate number of clusters

Description

Estimate number of clusters by bootstrapping stability

Usage

k.select(x, range = 2:7, B = 20, r = 5, threshold = 0.8, scheme_2 = TRUE)

Value

profile

a vector of Smin measures for determining k

k

integer estimated number of clusters

Arguments

x

a data.frame of the data set

range

a vector of integer values, of the possible numbers of clusters k

B

number of bootstrap re-samplings

r

number of runs of k-means

threshold

the threshold for determining k

scheme_2

logical: TRUE if scheme 2 is used, FALSE if scheme 1 is used

Author

Han Yu

Details

This function estimates the number of clusters using a bootstrapping approach and a stability measure, Smin, which is based on an observation-wise similarity among clusterings. The number of clusters k is selected as the largest candidate in range for which Smin is greater than the threshold. The threshold is often chosen between 0.8 and 0.9. Two schemes are provided. Scheme 1 uses the clustering of the original data as the reference for stability calculations. Scheme 2 searches across the clusterings of the bootstrap samples for the one that gives the most stable clustering.
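The selection rule described above (take the largest k whose Smin exceeds the threshold) can be illustrated with a small sketch. This is not the package's internal code: the helper `select_k`, the Smin values, and the choice to return NA when no candidate passes the threshold are all assumptions made for illustration.

```r
# Hypothetical Smin profile for candidate k = 2, ..., 7 (made-up values,
# not output from k.select)
k_range <- 2:7
smin_profile <- c(0.95, 0.91, 0.62, 0.84, 0.55, 0.40)

# Illustrative helper (an assumption, not part of bootcluster):
# return the largest k whose Smin is strictly greater than the threshold
select_k <- function(profile, k_range, threshold) {
  stable <- k_range[profile > threshold]
  if (length(stable) == 0) {
    return(NA_integer_)  # assumed behavior when no k passes the threshold
  }
  max(stable)
}

k_at_08 <- select_k(smin_profile, k_range, threshold = 0.8)
k_at_09 <- select_k(smin_profile, k_range, threshold = 0.9)
print(c(k_at_08, k_at_09))
```

Because the rule takes the largest passing k rather than the first, a dip in the profile (here at k = 4) does not stop the search. Raising the threshold from 0.8 to 0.9 gives a smaller estimate in this made-up profile.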

References

Bootstrapping estimates of stability for clusters, observations and model selection. Han Yu, Brian Chapman, Arianna DiFlorio, Ellen Eischen, David Gotz, Matthews Jacob and Rachael Hageman Blair.

Examples

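library(bootcluster)
set.seed(1)           # make the bootstrap resampling reproducible
data(wine)            # load the wine data set
x0 <- wine[, 2:14]    # keep columns 2 to 14 as the variables to cluster
x <- scale(x0)        # center and scale each column
# estimate k over the candidates 2 to 10 using scheme 2
k.select(x, range = 2:10, B = 20, r = 5, scheme_2 = TRUE)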