Selects the number of clusters via the bootstrap, as explained in Fang and Wang (2012). Pairs of bootstrap samples are drawn from the data repeatedly (`B` times for each number of clusters), each sample is clustered, and the number of clusters is chosen by minimising an instability estimate computed from these pairs.
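The resampling scheme described above can be sketched in base R. The function names `pair_disagreement`, `nearest_centroid` and `boot_instability` are hypothetical. The sketch assumes k-means clustering with nearest-centroid classification of all original points, and it measures instability as the share of point pairs that exactly one of the two clusterings puts into the same cluster, following our reading of Fang and Wang (2012). It is an illustration, not the code of `nselectboot`.

```r
# Sketch of bootstrap instability for choosing the number of clusters.
# Assumptions (not fpc's code): k-means via stats::kmeans as the clustering
# method, and every original point classified to the nearest centroid.

# Share of ordered point pairs (i, j) that exactly one of the two
# partitions puts into the same cluster.
pair_disagreement <- function(cl1, cl2) {
  tab <- table(cl1, cl2)
  n <- length(cl1)
  together1 <- sum(rowSums(tab)^2)  # pairs together in partition 1
  together2 <- sum(colSums(tab)^2)  # pairs together in partition 2
  both <- sum(tab^2)                # pairs together in both partitions
  (together1 + together2 - 2 * both) / n^2
}

# Index of the nearest row of `centers` for every row of `x`.
nearest_centroid <- function(x, centers) {
  k <- nrow(centers)
  d <- as.matrix(dist(rbind(centers, x)))[-seq_len(k), seq_len(k), drop = FALSE]
  max.col(-d, ties.method = "first")
}

boot_instability <- function(x, krange = 2:6, B = 20, nstart = 10) {
  x <- as.matrix(x)
  n <- nrow(x)
  stab <- matrix(NA_real_, nrow = B, ncol = length(krange),
                 dimnames = list(NULL, as.character(krange)))
  for (j in seq_along(krange)) {
    for (b in seq_len(B)) {
      # two independent bootstrap samples of the rows
      s1 <- sample(n, n, replace = TRUE)
      s2 <- sample(n, n, replace = TRUE)
      km1 <- kmeans(x[s1, , drop = FALSE], centers = krange[j], nstart = nstart)
      km2 <- kmeans(x[s2, , drop = FALSE], centers = krange[j], nstart = nstart)
      # classify all n original points with both clusterings and compare
      stab[b, j] <- pair_disagreement(nearest_centroid(x, km1$centers),
                                      nearest_centroid(x, km2$centers))
    }
  }
  stabk <- colMeans(stab)  # mean instability per number of clusters
  list(kopt = krange[which.min(stabk)], stabk = stabk, stab = stab)
}
```

The returned names mirror those of `nselectboot`'s output; in `nselectboot` itself the hard-wired nearest-centroid step is replaced by the method chosen through `classification`.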

In principle, any clustering method that has a CBI wrapper can be used; see `clusterboot` and `kmeansCBI`. However, the currently implemented classification methods are not necessarily suitable for all of them; see argument `classification`.

```
nselectboot(data,B=50,distances=inherits(data,"dist"),
clustermethod=NULL,
classification="averagedist",centroidname = NULL,
krange=2:10, count=FALSE,nnk=1,
largeisgood=FALSE,...)
```

data

something that can be coerced into a matrix: either an `n*p` data matrix (or data frame) or an `n*n` dissimilarity matrix (or `dist` object).

B

integer. Number of resampling runs.

distances

logical. If `TRUE`, the data are interpreted as a dissimilarity matrix. If `data` is a `dist` object, `distances=TRUE` is set automatically; otherwise `distances=FALSE` by default. This means that you have to set it to `TRUE` manually if `data` is a dissimilarity matrix that is not a `dist` object.
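The default `distances=inherits(data,"dist")` from the usage line can be checked directly in base R. The short sketch below (toy data, hypothetical variable names) shows why a plain matrix of dissimilarities is not detected automatically.

```r
set.seed(2)
x <- matrix(rnorm(20), ncol = 2)  # toy 10 x 2 data matrix
d <- dist(x)                      # dissimilarities as a "dist" object
dm <- as.matrix(d)                # the same dissimilarities as a plain 10 x 10 matrix

inherits(d, "dist")   # TRUE: the default picks distances=TRUE
inherits(dm, "dist")  # FALSE: without distances=TRUE, dm would be read as
                      # a data matrix with 10 variables
dim(dm)               # 10 10
```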

clustermethod

an interface function (the function itself, not a string containing its name, has to be provided!). This defines the clustering method. See the "Details" section of `clusterboot` and `kmeansCBI` for the format. Clustering methods for `nselectboot` must have a `k` argument for the number of clusters and must otherwise follow the specifications in `clusterboot`. Note that `nselectboot` won't work with CBI functions that already estimate the number of clusters implicitly, such as `pamkCBI`; use `claraCBI` if you want to run it for pam/clara clustering.
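As a sketch of what a user-written interface function with a `k` argument might look like, here is a hypothetical wrapper `mykmeansCBI` around `stats::kmeans`. The list components (`result`, `nc`, `clusterlist`, `partition`, `clustermethod`) are our reading of the format described in the "Details" section of `clusterboot`; check that section before relying on them.

```r
# Hypothetical CBI-style wrapper (name is illustrative). It takes the data as
# first argument and the number of clusters as k; extra arguments go to kmeans.
mykmeansCBI <- function(data, k, ...) {
  data <- as.matrix(data)
  fit <- kmeans(data, centers = k, ...)
  list(result = fit,                      # full kmeans output (centres in fit$centers)
       nc = k,                            # number of clusters
       clusterlist = lapply(seq_len(k), function(i) fit$cluster == i),
       partition = fit$cluster,           # integer cluster labels
       clustermethod = "kmeans")
}
```

Such a function would be passed unquoted, e.g. `clustermethod=mykmeansCBI`; since its `result` is the `kmeans` object, `centroidname="centers"` should point `classification="centroid"` at its centres.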

classification

string. This determines how points that were not drawn into a bootstrap sample, and are therefore not clustered, are classified to the given clusters. Options are explained in `classifdist` (if `distances=TRUE`) and `classifnp` (otherwise). Certain classification methods are connected to certain clustering methods: `classification="averagedist"` is recommended for average linkage; `classification="centroid"` for k-means, clara and pam (with distances it will work with `claraCBI` only); `classification="knn"` with `nnk=1` for single linkage; and `classification="qda"` for Gaussian mixtures with flexible covariance matrices.
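To make the `"averagedist"` option concrete, here is a base-R sketch of classification by smallest average dissimilarity. The function `classify_averagedist` and the coding of not-yet-classified points as -1 are hypothetical choices for this illustration; `classifdist` may differ in detail.

```r
# Assign each unclassified point (label <= 0) to the cluster with the
# smallest average dissimilarity to its members.
# dmat: full n x n dissimilarity matrix; labels: cluster numbers, -1 = unclassified.
classify_averagedist <- function(dmat, labels) {
  clusters <- sort(unique(labels[labels > 0]))
  out <- labels
  for (i in which(labels <= 0)) {
    avg <- vapply(clusters, function(k) mean(dmat[i, labels == k]), numeric(1))
    out[i] <- clusters[which.min(avg)]
  }
  out
}

x <- c(0, 1, 2, 10, 11, 12, 1.5, 10.5)   # toy 1-d data
lab <- c(1, 1, 1, 2, 2, 2, -1, -1)       # last two points not clustered
classify_averagedist(as.matrix(dist(x)), lab)  # 1 1 1 2 2 2 1 2
```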

centroidname

string. Indicates the name of the component of `CBIoutput$result` that contains the cluster centroids in case of `classification="centroid"`, where `CBIoutput` is the output object of `clustermethod`. If `clustermethod` is `kmeansCBI` or `claraCBI`, centroids are recognised automatically if `centroidname=NULL`. If `centroidname=NULL` and `distances=FALSE`, cluster means are computed as the cluster centroids.
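Computing cluster means as centroids, as described for `centroidname=NULL` with `distances=FALSE`, can be sketched in base R with `rowsum`. The helper `cluster_means` is hypothetical and only illustrates the computation.

```r
# One row of cluster means per cluster (rows ordered by sorted cluster label).
cluster_means <- function(x, partition) {
  x <- as.matrix(x)
  rowsum(x, partition) / as.vector(table(partition))
}

x <- rbind(c(0, 0), c(2, 0), c(10, 10), c(12, 14))
cluster_means(x, c(1, 1, 2, 2))  # rows (1, 0) and (11, 12)
```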

krange

integer vector; numbers of clusters to be tried.

count

logical. If `TRUE`, the number of clusters and the bootstrap run currently being processed are printed.

nnk

number of nearest neighbours if `classification="knn"`; see `classifdist` (if `distances=TRUE`) and `classifnp` (otherwise).
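The role of `nnk` can be illustrated with a nearest-neighbour rule that takes a majority vote among the `nnk` closest clustered points. The function `classify_knn`, the -1 coding for unclassified points and the tie rules (distance ties by index, vote ties to the smallest label) are assumptions of this sketch, not documented behaviour of `classifdist` or `classifnp`.

```r
# Assign each unclassified point (label <= 0) by majority vote among its
# nnk nearest clustered points; dmat is a full n x n dissimilarity matrix.
classify_knn <- function(dmat, labels, nnk = 1) {
  out <- labels
  clustered <- which(labels > 0)
  m <- min(nnk, length(clustered))
  for (i in which(labels <= 0)) {
    nn <- clustered[order(dmat[i, clustered])[seq_len(m)]]
    votes <- table(labels[nn])
    out[i] <- as.numeric(names(votes)[which.max(votes)])  # ties: smallest label
  }
  out
}

x <- c(0, 3, 4.5, 4.6, 4.7, 2.5)   # toy 1-d data; last point unclustered
lab <- c(1, 1, 2, 2, 2, -1)
dmat <- as.matrix(dist(x))
classify_knn(dmat, lab, nnk = 1)[6]  # 1: the single nearest neighbour is x = 3
classify_knn(dmat, lab, nnk = 3)[6]  # 2: two of the three nearest are in cluster 2
```

With `nnk=1` the rule follows the chain of nearest neighbours, which is why it pairs naturally with single linkage.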

largeisgood

logical. If `TRUE`, output component `stabk` is taken as one minus the original instability value, so that larger values of `stabk` are better.

...

arguments to be passed on to the clustering method.

`nselectboot` returns a list with components `kopt`, `stabk` and `stab`:

kopt

optimal number of clusters.

stabk

mean instability values for each number of clusters (or one minus these if `largeisgood=TRUE`).

stab

matrix of instability values for all bootstrap runs and numbers of clusters.
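How the returned components relate can be illustrated with made-up numbers. The sketch assumes, as the descriptions above suggest, that `stabk` is the per-k mean over the bootstrap runs in `stab` and that `kopt` minimises the mean instability; the toy layout (one column per number of clusters, labelled by k) is an assumption, and the exact layout of `nselectboot`'s output may differ.

```r
# Made-up instability values: rows = bootstrap runs, columns = numbers of clusters.
stab <- rbind(c(0.20, 0.02, 0.10),
              c(0.25, 0.00, 0.12),
              c(0.18, 0.04, 0.08))
colnames(stab) <- 2:4

stabk <- colMeans(stab)                             # mean instability per k
kopt <- as.integer(names(stabk)[which.min(stabk)])  # smallest instability wins: 3
stabk_large <- 1 - stabk                            # the largeisgood=TRUE scale
which.max(stabk_large) == which.min(stabk)          # same k either way
```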

Fang, Y. and Wang, J. (2012) Selection of the number of clusters via
the bootstrap method. *Computational Statistics and Data
Analysis*, 56, 468-477.

```
library(fpc)
set.seed(20000)
face <- rFace(50,dMoNo=2,dNoEy=0,p=2)
nselectboot(dist(face),B=2,clustermethod=disthclustCBI,
  method="average",krange=5:7)
nselectboot(dist(face),B=2,clustermethod=claraCBI,
  classification="centroid",krange=5:7)
nselectboot(face,B=2,clustermethod=kmeansCBI,
  classification="centroid",krange=5:7)
# Of course use larger B in a real application.
```