fpc (version 2.1-6)

nselectboot: Selection of the number of clusters via bootstrap

Description

Selection of the number of clusters via bootstrap as explained in Fang and Wang (2012). Repeatedly, two bootstrap samples are drawn from the data, and the number of clusters is chosen by optimising an instability estimate computed from these pairs.

In principle, any clustering method that has a CBI-wrapper can be used, see clusterboot, kmeansCBI. However, the currently implemented classification methods are not necessarily suitable for all of them, see argument classification.
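The resampling idea described above can be sketched in base R. This is a simplified illustration for k-means with nearest-centroid classification, not the fpc implementation: the helper names classify_centroid, pair_disagreement and instability_sketch are hypothetical, and the way pairs are counted is an assumption made for clarity.

```r
# Simplified sketch of bootstrap instability for k-means (base R only).
# NOT the fpc code; all function names here are hypothetical.

# Assign each row of x to its nearest centroid (squared Euclidean distance).
classify_centroid <- function(x, centers) {
  d2 <- vapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(x, 2, centers[j, ])^2)
  }, numeric(nrow(x)))
  max.col(-d2, ties.method = "first")
}

# Proportion of point pairs on which two labelings disagree about
# whether the two points belong to the same cluster.
pair_disagreement <- function(a, b) {
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  ut <- upper.tri(same_a)
  mean(same_a[ut] != same_b[ut])
}

instability_sketch <- function(x, krange = 2:5, B = 20) {
  x <- as.matrix(x)
  n <- nrow(x)
  stab <- matrix(NA_real_, nrow = B, ncol = length(krange),
                 dimnames = list(NULL, paste0("k", krange)))
  for (b in seq_len(B)) {
    # Two independent bootstrap samples of the rows.
    i1 <- sample(n, n, replace = TRUE)
    i2 <- sample(n, n, replace = TRUE)
    for (j in seq_along(krange)) {
      k <- krange[j]
      c1 <- kmeans(x[i1, , drop = FALSE], centers = k, nstart = 5)$centers
      c2 <- kmeans(x[i2, , drop = FALSE], centers = k, nstart = 5)$centers
      # Extend both clusterings to all original points, then compare them.
      stab[b, j] <- pair_disagreement(classify_centroid(x, c1),
                                      classify_centroid(x, c2))
    }
  }
  stabk <- colMeans(stab)
  # The number of clusters with the smallest mean instability is selected.
  list(kopt = krange[which.min(stabk)], stabk = stabk, stab = stab)
}
```

Calling instability_sketch(x, krange = 2:4, B = 10) on a numeric matrix x returns a list shaped like the one described under Value, which may help when reading the output of nselectboot itself.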

Usage

nselectboot(data,B=50,distances=inherits(data,"dist"),
                        clustermethod=NULL,
                        classification="averagedist",krange=2:10,
                        count=FALSE,nnk=1, ...)

Arguments

data
something that can be coerced into a matrix. The data matrix - either an n*p-data matrix (or data frame) or an n*n-dissimilarity matrix (or dist-object).
B
integer. Number of resampling runs.
distances
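logical. If TRUE, the data is interpreted as a dissimilarity matrix. If data is a dist-object, distances=TRUE automatically, otherwise distances=FALSE by default. This means that you have to set distances=TRUE manually if data is a dissimilarity matrix that is not a dist-object.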
clustermethod
an interface function (the function name, not a string containing the name, has to be provided!). This defines the clustering method. See the "Details"-section of clusterboot and kmeansCBI.
classification
string. This determines how non-clustered points are classified to given clusters. Options are explained in classifdist (if distances=TRUE) and classifnp (otherwise).
krange
integer vector; numbers of clusters to be tried.
count
logical. If TRUE, numbers of clusters and bootstrap runs are printed.
nnk
number of nearest neighbours if classification="knn", see classifdist (if distances=TRUE) and classifnp (otherwise).
...
arguments to be passed on to the clustering method.
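The default distances=inherits(data,"dist") shown under Usage can be checked in base R. The small sketch below uses made-up data and only illustrates how that default evaluates for a dist-object and for the same dissimilarities stored as a plain matrix.

```r
# Illustration of the default value of the 'distances' argument.
# The data here are arbitrary and serve only as an example.
set.seed(1)
x <- matrix(rnorm(20), ncol = 2)

d <- dist(x)           # a dist-object
dm <- as.matrix(d)     # same dissimilarities as an ordinary n*n matrix

is_dist_d <- inherits(d, "dist")    # TRUE: treated as dissimilarities
is_dist_dm <- inherits(dm, "dist")  # FALSE: treated as an n*p data matrix

print(c(dist_object = is_dist_d, plain_matrix = is_dist_dm))
```

For a plain n*n dissimilarity matrix, the default therefore evaluates to FALSE, and distances=TRUE would need to be passed explicitly.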

Value

nselectboot returns a list with components kopt, stabk, stab.

kopt
optimal number of clusters.
stabk
mean instability values for numbers of clusters.
stab
matrix of instability values for all bootstrap runs and numbers of clusters.
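As a minimal sketch of how the returned components might be inspected, the following assumes the fpc package is installed and is skipped otherwise. The values B=5 and krange=2:6 are illustrative choices, not recommendations.

```r
# Sketch: look at the components of the list returned by nselectboot.
# Assumes fpc is installed; B and krange are arbitrary illustrative values.
if (requireNamespace("fpc", quietly = TRUE)) {
  set.seed(20000)
  face <- fpc::rFace(50, dMoNo = 2, dNoEy = 0, p = 2)
  res <- fpc::nselectboot(face, B = 5, clustermethod = fpc::kmeansCBI,
                          classification = "centroid", krange = 2:6)
  print(res$kopt)       # selected number of clusters
  print(res$stabk)      # mean instability per number of clusters
  print(dim(res$stab))  # instability values per bootstrap run
}
```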

References

Fang, Y. and Wang, J. (2012) Selection of the number of clusters via the bootstrap method. Computational Statistics and Data Analysis, 56, 468-477.

See Also

classifdist, classifnp, clusterboot, kmeansCBI

Examples

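library(fpc)
set.seed(20000)
face <- rFace(50,dMoNo=2,dNoEy=0,p=2)
# Average linkage on dissimilarities (default classification="averagedist")
nselectboot(dist(face),B=2,clustermethod=disthclustCBI,
  method="average",krange=5:7)
# clara on dissimilarities with centroid classification
nselectboot(dist(face),B=2,clustermethod=claraCBI,
  classification="centroid",krange=5:7)
# k-means on the data matrix with centroid classification
nselectboot(face,B=2,clustermethod=kmeansCBI,
  classification="centroid",krange=5:7)
# Of course use larger B in a real application.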