These functions provide an interface to several clustering methods
implemented in R, for use together with the cluster stability
assessment in `clusterboot` (as parameter `clustermethod`; "CBI" stands
for "clusterboot interface"). In some situations it could make sense to
use them to compute a clustering even if you don't want to run
`clusterboot`, because some of the functions contain additional
features (e.g., normal mixture model based clustering of dissimilarity
matrices projected into the Euclidean space by MDS, partitioning around
medoids with an estimated number of clusters, or noise/outlier
identification in hierarchical clustering).

```r
kmeansCBI(data,krange,k,scaling=FALSE,runs=1,criterion="ch",...)
hclustCBI(data,k,cut="number",method,scaling=TRUE,noisecut=0,...)
hclusttreeCBI(data,minlevel=2,method,scaling=TRUE,...)
disthclustCBI(dmatrix,k,cut="number",method,noisecut=0,...)
noisemclustCBI(data,G,k,modelNames,nnk,hcmodel=NULL,Vinv=NULL,
               summary.out=FALSE,...)
distnoisemclustCBI(dmatrix,G,k,modelNames,nnk,
                   hcmodel=NULL,Vinv=NULL,mdsmethod="classical",
                   mdsdim=4, summary.out=FALSE, points.out=FALSE,...)
claraCBI(data,k,usepam=TRUE,diss=inherits(data,"dist"),...)
pamkCBI(data,krange=2:10,k=NULL,criterion="asw", usepam=TRUE,
        scaling=FALSE,diss=inherits(data,"dist"),...)
tclustCBI(data,k,trim=0.05,...)
dbscanCBI(data,eps,MinPts,diss=inherits(data,"dist"),...)
mahalCBI(data,clustercut=0.5,...)
mergenormCBI(data, G=NULL, k=NULL, modelNames=NULL, nnk=0,
             hcmodel = NULL,
             Vinv = NULL, mergemethod="bhat",
             cutoff=0.1,...)
speccCBI(data,k,...)
pdfclustCBI(data,...)
stupidkcentroidsCBI(dmatrix,k,distances=TRUE)
stupidknnCBI(dmatrix,k)
stupidkfnCBI(dmatrix,k)
stupidkavenCBI(dmatrix,k)
```

data

a numeric matrix. The data matrix, usually a cases*variables data
matrix. `claraCBI`, `pamkCBI` and `dbscanCBI` work with an `n*n`
dissimilarity matrix as well, see parameter `diss`.

dmatrix

a square numerical dissimilarity matrix or a `dist`-object.

k

numeric, usually integer. In most cases, this is the number of clusters
for methods where this is fixed. For `hclustCBI` and `disthclustCBI`
see parameter `cut` below. Some methods have a `k` parameter on top of
a `G` or `krange` parameter for compatibility; `k` in these cases does
not have to be specified, but if it is, it is always a single number of
clusters and overwrites `G` and `krange`.

scaling

either a logical value or a numeric vector of length equal to the
number of variables. If `scaling` is a numeric vector with length equal
to the number of variables, then each variable is divided by the
corresponding value from `scaling`. If `scaling` is `TRUE`, then
scaling is done by dividing the (centered) variables by their
root-mean-square, and if `scaling` is `FALSE`, no scaling is done
before execution.
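For orientation, the `scaling=TRUE` rule matches what base R's `scale()` does with centering and scaling switched on: its root-mean-square divides the sum of squares by the number of values minus one, so for centered columns it equals the standard deviation. The following is a minimal base R sketch of the two documented options, assuming (not confirmed here) that the CBI functions behave like `scale()`; the toy data and the `scaling` vector are made up.

```r
# Toy data: 5 cases, 2 variables (arbitrary values for illustration)
x <- cbind(a = c(1, 2, 3, 4, 10), b = c(100, 200, 300, 400, 500))

# scaling = TRUE: center each column, then divide by its root-mean-square
xs_true <- scale(x, center = TRUE, scale = TRUE)

# The same computation by hand: for centered columns the root-mean-square
# sqrt(sum(x^2) / (n - 1)) is the standard deviation
xc <- sweep(x, 2, colMeans(x))
rms <- sqrt(colSums(xc^2) / (nrow(x) - 1))
xs_manual <- sweep(xc, 2, rms, "/")

# scaling = numeric vector: each variable is divided by its own value
scaling <- c(2, 100)
xs_vec <- sweep(x, 2, scaling, "/")
```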

runs

integer. Number of random initializations from which the k-means algorithm is started.

criterion

`"ch"` or `"asw"`. Decides whether the number of clusters is estimated
by the Calinski-Harabasz criterion or by the average silhouette width.
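The Calinski-Harabasz criterion compares the between-cluster sum of squares B with the within-cluster sum of squares W, CH(k) = [B/(k-1)] / [W/(n-k)], and the k in the candidate range with the largest value is chosen. A minimal sketch using only `stats::kmeans`, not the `kmeansruns` implementation; `ch_index` and `select_k_ch` are hypothetical helper names, and the data are simulated.

```r
# Calinski-Harabasz index from a kmeans fit:
# (between SS / (k - 1)) / (within SS / (n - k)); requires k >= 2
ch_index <- function(fit, n) {
  k <- length(fit$size)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

# Hypothetical helper: fit k-means for every k in krange, keep the best CH
select_k_ch <- function(data, krange, runs = 10) {
  ch <- sapply(krange, function(k) {
    fit <- kmeans(data, centers = k, nstart = runs)
    ch_index(fit, nrow(data))
  })
  list(k = krange[which.max(ch)], ch = setNames(ch, krange))
}

# Three well-separated groups of 30 points each
set.seed(1)
data <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 10, sd = 0.3), ncol = 2),
  cbind(rnorm(30, mean = 0, sd = 0.3), rnorm(30, mean = 10, sd = 0.3))
)
res <- select_k_ch(data, krange = 2:6)
res$k
```

`nstart` plays the role of `runs` here: several random initializations per k, keeping the best within-cluster sum of squares.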

cut

either `"level"` or `"number"`. This determines how `cutree` is used
to obtain a partition from a hierarchy tree. `cut="level"` means that
the tree is cut at a particular dissimilarity level, `cut="number"`
means that the tree is cut in order to obtain a fixed number of
clusters. The parameter `k` specifies the number of clusters or the
dissimilarity level, depending on `cut`.
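The two modes correspond to the `k` and `h` arguments of base R's `cutree`. A small sketch on five made-up points; how exactly the CBI functions forward `k` to `cutree` is assumed here rather than taken from their code.

```r
# Five points on a line; average linkage merges at heights 1, 1, 9 and 14
x <- matrix(c(1, 2, 10, 11, 20), ncol = 1)
hc <- hclust(dist(x), method = "average")

# cut = "number": ask for a fixed number of clusters
by_number <- cutree(hc, k = 2)   # {1, 2, 10, 11} and {20}

# cut = "level": cut the tree at dissimilarity level 5
by_level <- cutree(hc, h = 5)    # {1, 2}, {10, 11} and {20}
```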

method

method for hierarchical clustering, see the documentation of `hclust`.

noisecut

numeric. All clusters of size `<=noisecut` in the
`disthclustCBI`/`hclustCBI` partition are joined and declared as
noise/outliers.
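The noise rule can be sketched in a few lines of base R. This is an illustration of the documented behaviour, not the fpc code: `mark_noise` is a hypothetical helper that joins all clusters of size `<= noisecut` into one noise cluster and gives it the highest number, so that `nc = nccl + 1` as described for the output below.

```r
# Hypothetical helper: join all clusters with size <= noisecut into a
# single noise cluster that is numbered last
mark_noise <- function(partition, noisecut) {
  sizes <- table(partition)
  small <- names(sizes)[sizes <= noisecut]
  noise <- as.character(partition) %in% small
  # Renumber the remaining clusters 1..nccl in order of first appearance
  kept <- unique(partition[!noise])
  newpart <- match(partition, kept)
  nccl <- length(kept)
  if (any(noise)) newpart[noise] <- nccl + 1L
  list(partition = newpart, nccl = nccl,
       nc = nccl + as.integer(any(noise)), noise = noise)
}

# Partition from cutting a tree: cluster 3 consists of a single point
part <- c(1, 1, 2, 2, 3)
res <- mark_noise(part, noisecut = 1)
```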

minlevel

integer. `minlevel=1` means that all clusters in the tree are given out
by `hclusttreeCBI` or `disthclusttreeCBI`, including one-point clusters
(but excluding the cluster with all points). `minlevel=2` excludes the
one-point clusters. `minlevel=3` excludes the two-point cluster which
has been merged first, and increasing the value of `minlevel` by 1 in
all further steps means that the remaining earliest formed cluster is
excluded.
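The `minlevel` rule can be read off the `merge` matrix of an `hclust` object. The sketch below is one interpretation of the description above, not the code used by `hclusttreeCBI`: `tree_clusters` is a hypothetical helper that returns each cluster as a logical membership vector, always drops the final all-points cluster, adds the one-point clusters only for `minlevel=1`, and drops the earliest `minlevel-2` merges otherwise.

```r
# Hypothetical helper: clusters of an hclust tree as logical vectors,
# filtered according to minlevel
tree_clusters <- function(hc, minlevel = 2) {
  merge <- hc$merge
  n <- nrow(merge) + 1
  members <- vector("list", n - 1)  # members[[i]]: points joined at step i
  for (i in seq_len(n - 1)) {
    # negative entries are single points, positive ones earlier merges
    parts <- lapply(merge[i, ], function(j) if (j < 0) -j else members[[j]])
    members[[i]] <- unlist(parts)
  }
  to_logical <- function(idx) seq_len(n) %in% idx
  # merges 1..(n-2); the last merge (all points) is always excluded
  steps <- seq_len(n - 2)
  steps <- steps[steps >= minlevel - 1]
  out <- lapply(members[steps], to_logical)
  if (minlevel <= 1) out <- c(lapply(seq_len(n), to_logical), out)
  out
}

x <- matrix(c(1, 2, 10, 11, 20), ncol = 1)
hc <- hclust(dist(x), method = "average")
cl2 <- tree_clusters(hc, minlevel = 2)
```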

G

vector of integers. Number of clusters or numbers of clusters used by
`mclustBIC`. If `G` has more than one entry, the number of clusters is
estimated by the BIC.

modelNames

vector of strings. Models for covariance matrices, see the
documentation of `mclustBIC`.

nnk

hcmodel

Vinv

numeric. See the documentation of `mclustBIC`.

summary.out

logical. If `TRUE`, the result of `summary.mclustBIC` is added as
component `mclustsummary` to the output of `noisemclustCBI` and
`distnoisemclustCBI`.

mdsmethod

mdsdim

integer. Dimensionality of MDS solution.

points.out

logical. If `TRUE`, the matrix of MDS points is added as component
`points` to the output of `distnoisemclustCBI`.

usepam

logical. If `TRUE`, the function `pam` is used for clustering,
otherwise `clara`.

diss

logical. If `TRUE`, `data` will be considered as a dissimilarity
matrix. In `claraCBI`, this requires `usepam=TRUE`.

krange

vector of integers. Numbers of clusters to be compared.

trim

numeric between 0 and 1. Proportion of data points trimmed, i.e.,
assigned to noise. See `tclust` in the tclust package.

eps

numeric. The radius of the neighborhoods to be considered by `dbscan`.

MinPts

integer. Minimum number of points that must lie in a point's
neighborhood for the point to be considered a cluster seed. See the
documentation of `dbscan`.
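To make the roles of `eps` and `MinPts` concrete, here is a base R sketch of the seed condition only (no cluster expansion). `is_seed` is a hypothetical helper, and it assumes the point itself counts towards its own neighborhood and that distances equal to `eps` are included, as in the original DBSCAN definition; whether `dbscan` in fpc counts exactly this way is not confirmed here.

```r
# Hypothetical helper: flag points whose eps-neighbourhood (the point
# itself included) contains at least MinPts points
is_seed <- function(data, eps, MinPts) {
  d <- as.matrix(dist(data))
  rowSums(d <= eps) >= MinPts
}

# A tight group of four points plus one isolated point
pts <- rbind(c(0, 0), c(0, 1), c(1, 0), c(1, 1), c(10, 10))
seeds <- is_seed(pts, eps = 1.5, MinPts = 4)
```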

clustercut

numeric between 0 and 1. If `fixmahal` is used for fuzzy clustering, a
crisp partition is generated and points with cluster membership values
above `clustercut` are considered as members of the corresponding
cluster.
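The thresholding step can be sketched as follows. The membership matrix is made up for illustration (it is not output of `fixmahal`), and the result is written in the one-logical-vector-per-cluster form used for the returned cluster list described below.

```r
# Made-up fuzzy memberships: 4 points (rows), 2 clusters (columns)
memb <- matrix(c(0.9, 0.7, 0.2, 0.6,
                 0.1, 0.4, 0.8, 0.3), ncol = 2)
clustercut <- 0.5

# One logical vector per cluster: TRUE where membership exceeds clustercut
clusterlist <- lapply(seq_len(ncol(memb)), function(j) memb[, j] > clustercut)
```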

mergemethod

method for merging Gaussians, passed on as `method` to `mergenormals`.

cutoff

numeric between 0 and 1, tuning constant for `mergenormals`.

distances

logical (only for `stupidkcentroidsCBI`). If `FALSE`, `dmatrix` is
interpreted as a cases*variables data matrix.

...

further parameters to be transferred to the original clustering functions (not required).

All interface functions return a list with the following components
(there may be some more, see `summary.out` and `points.out` above):

clustering result, usually a list with the full output of the clustering method (the precise format doesn't matter); whatever you want to use later.

number of clusters. If some points don't belong to any cluster, these
are declared "noise". `nc` includes the "noise cluster", and there
should be another component `nccl`, being the number of clusters not
including the noise cluster.

a list consisting of one logical vector of length `n` (the number of
data points) for each cluster, indicating whether a point is a member
of this cluster (`TRUE`) or not. If a noise cluster is included, it
should always be the last vector in this list.

an integer vector of length `n`, partitioning the data. If the method
produces a partition, it should be the clustering. This component is
only used for plots, so you could do something like `rep(1,n)` for
non-partitioning methods. If a noise cluster is included, `nc=nccl+1`
and the noise cluster is cluster no. `nc`.

a string indicating the clustering method.

see `nc` above.

by `noisemclustCBI` and `distnoisemclustCBI`, see above.

logical vector, indicating initially estimated noise by `NNclean`,
called by `noisemclustCBI` and `distnoisemclustCBI`.

logical. `TRUE` if points were classified as noise/outliers by
`disthclustCBI`.
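The output format can be illustrated with a user-written interface function. This is a minimal base R sketch, not one of the fpc functions: `mykmeansCBI` is a hypothetical name, `stats::kmeans` does the clustering, and the component names `result`, `nc`, `clusterlist`, `partition` and `clustermethod` are assumed to be the ones `clusterboot` expects. No noise cluster is produced, so `nc` is simply the number of k-means clusters and `nccl` is omitted.

```r
# Hypothetical user-written interface function (base R only)
mykmeansCBI <- function(data, k, ...) {
  data <- as.matrix(data)
  fit <- kmeans(data, centers = k, ...)
  nc <- length(fit$size)
  # one logical membership vector per cluster
  clusterlist <- lapply(seq_len(nc), function(i) fit$cluster == i)
  list(result = fit,               # full output of the clustering method
       nc = nc,                    # number of clusters (no noise here)
       clusterlist = clusterlist,
       partition = fit$cluster,    # integer vector of length n
       clustermethod = "kmeans (stats)")
}

set.seed(2)
data <- rbind(matrix(rnorm(40, 0, 0.5), ncol = 2),
              matrix(rnorm(40, 5, 0.5), ncol = 2))
out <- mykmeansCBI(data, k = 2, nstart = 5)
```

If the assumed component names match, a function of this shape could presumably be passed to `clusterboot` as `clustermethod`, with `k` supplied through the further arguments of `clusterboot`.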

All these functions call clustering methods implemented in R to cluster
data and to provide output in the format required by `clusterboot`.
Here is a brief overview. For further details see the help pages of the
involved clustering methods.

- `kmeansCBI`: an interface to the function `kmeansruns` calling
  `kmeans` for k-means clustering. (`kmeansruns` allows the
  specification of several random initializations of the k-means
  algorithm and estimation of k by the Calinski-Harabasz index or the
  average silhouette width.)
- `hclustCBI`: an interface to the function `hclust` for agglomerative
  hierarchical clustering with noise component (see parameter
  `noisecut` above). This function produces a partition and assumes a
  cases*variables matrix as input.
- `hclusttreeCBI`: an interface to the function `hclust` for
  agglomerative hierarchical clustering. This function gives out all
  clusters belonging to the hierarchy (upward from a certain level, see
  parameter `minlevel` above).
- `disthclustCBI`: an interface to the function `hclust` for
  agglomerative hierarchical clustering with noise component (see
  parameter `noisecut` above). This function produces a partition and
  assumes a dissimilarity matrix as input.
- `noisemclustCBI`: an interface to the function `mclustBIC` for
  normal mixture model based clustering. Warning: `mclustBIC` often has
  problems with multiple points (points that occur more than once in
  the data, as in bootstrap samples). In `clusterboot`, it is
  recommended to use this together with `multipleboot=FALSE`.
- `distnoisemclustCBI`: an interface to the function `mclustBIC` for
  normal mixture model based clustering. This assumes a dissimilarity
  matrix as input and generates a data matrix by multidimensional
  scaling first. Warning: `mclustBIC` often has problems with multiple
  points (points that occur more than once in the data, as in bootstrap
  samples). In `clusterboot`, it is recommended to use this together
  with `multipleboot=FALSE`.
- `claraCBI`: an interface to the functions `pam` and `clara` for
  partitioning around medoids.
- `pamkCBI`: an interface to the function `pamk` calling `pam` for
  partitioning around medoids. The number of clusters is estimated by
  the Calinski-Harabasz index or by the average silhouette width.
- `tclustCBI`: an interface to the function `tclust` in the tclust
  package for trimmed Gaussian clustering. This assumes a
  cases*variables matrix as input.
- `dbscanCBI`: an interface to the function `dbscan` for density based
  clustering.
- `mahalCBI`: an interface to the function `fixmahal` for fixed point
  clustering. This assumes a cases*variables matrix as input.
- `mergenormCBI`: an interface to the function `mergenormals` for
  clustering by merging Gaussian mixture components. Unlike
  `mergenormals`, `mergenormCBI` includes the computation of the
  initial Gaussian mixture. This assumes a cases*variables matrix as
  input.
- `speccCBI`: an interface to the function `specc` for spectral
  clustering. See the `specc` help page for additional tuning
  parameters. This assumes a cases*variables matrix as input.
- `pdfclustCBI`: an interface to the function `pdfCluster` for
  density-based clustering. See the `pdfCluster` help page for
  additional tuning parameters. This assumes a cases*variables matrix
  as input.
- `stupidkcentroidsCBI`: an interface to the function
  `stupidkcentroids` for random centroid-based clustering. See the
  `stupidkcentroids` help page. This accepts a distance matrix as well
  as a cases*variables matrix as input, see parameter `distances`.
- `stupidknnCBI`: an interface to the function `stupidknn` for random
  nearest neighbour clustering. See the `stupidknn` help page. This
  assumes a distance matrix as input.
- `stupidkfnCBI`: an interface to the function `stupidkfn` for random
  farthest neighbour clustering. See the `stupidkfn` help page. This
  assumes a distance matrix as input.
- `stupidkavenCBI`: an interface to the function `stupidkaven` for
  random average dissimilarity clustering. See the `stupidkaven` help
  page. This assumes a distance matrix as input.
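`distnoisemclustCBI` in the list above first turns a dissimilarity matrix into a data matrix by multidimensional scaling. The sketch below shows that two-stage idea with base R only: classical MDS via `stats::cmdscale`, followed by `kmeans` as a deliberately simpler stand-in for the normal mixture model fitted by `mclustBIC`. It is not what `distnoisemclustCBI` runs; the data are simulated, and 2 MDS dimensions are used instead of the default `mdsdim=4`.

```r
# Only dissimilarities are used below; here they come from two point clouds
set.seed(3)
pts <- rbind(matrix(rnorm(40, 0, 0.5), ncol = 2),
             matrix(rnorm(40, 6, 0.5), ncol = 2))
d <- dist(pts)

# Classical MDS: embed the dissimilarities in 2-dimensional Euclidean space
mds <- cmdscale(d, k = 2)

# Cluster the MDS coordinates (kmeans stands in for the mixture model)
fit <- kmeans(mds, centers = 2, nstart = 5)
```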

See also: `clusterboot`, `dist`, `kmeans`, `kmeansruns`, `hclust`,
`mclustBIC`, `pam`, `pamk`, `clara`, `dbscan`, `fixmahal`, `tclust`,
`pdfCluster`

```r
library(fpc)
options(digits=3)
set.seed(20000)
face <- rFace(50,dMoNo=2,dNoEy=0,p=2)
dbs <- dbscanCBI(face,eps=1.5,MinPts=4)
dhc <- disthclustCBI(dist(face),method="average",k=1.5,noisecut=2)
table(dbs$partition,dhc$partition)
dm <- mergenormCBI(face,G=10,modelNames="EEE",nnk=2)
# tclustCBI needs the tclust package to be installed
dtc <- tclustCBI(face,6,trim=0.1,restr.fact=500)
table(dm$partition,dtc$partition)
```