These functions provide an interface to several clustering methods
  implemented in R, for use together with the cluster stability
  assessment in clusterboot (as parameter
  clustermethod; "CBI" stands for "clusterboot interface").
  It can also make sense to use them to compute a
  clustering without running clusterboot, because
  some of the functions offer additional features (e.g., normal
  mixture model based clustering of dissimilarity matrices projected
  into Euclidean space by MDS, partitioning around medoids with
  an estimated number of clusters, and noise/outlier identification in
  hierarchical clustering).
kmeansCBI(data,krange,k,scaling=FALSE,runs=1,criterion="ch",...)
hclustCBI(data,k,cut="number",method,scaling=TRUE,noisecut=0,...)
hclusttreeCBI(data,minlevel=2,method,scaling=TRUE,...)
disthclustCBI(dmatrix,k,cut="number",method,noisecut=0,...)
noisemclustCBI(data,G,k,emModelNames,nnk,hcmodel=NULL,Vinv=NULL,
                        summary.out=FALSE,...)
distnoisemclustCBI(dmatrix,G,k,emModelNames,nnk,
                        hcmodel=NULL,Vinv=NULL,mdsmethod="classical",
                        mdsdim=4, summary.out=FALSE, points.out=FALSE,...)
claraCBI(data,k,usepam=TRUE,diss=inherits(data,"dist"),...)
pamkCBI(data,krange=2:10,k=NULL,criterion="asw", usepam=TRUE,
        scaling=TRUE,diss=inherits(data,"dist"),...)
trimkmeansCBI(data,k,scaling=TRUE,trim=0.1,...)
disttrimkmeansCBI(dmatrix,k,scaling=TRUE,trim=0.1,
                         mdsmethod="classical",
                         mdsdim=4,...)
dbscanCBI(data,eps,MinPts,diss=inherits(data,"dist"),...)
mahalCBI(data,clustercut=0.5,...)
mergenormCBI(data, G=NULL, k=NULL, emModelNames=NULL, nnk=0,
                         hcmodel = NULL,
                         Vinv = NULL, mergemethod="bhat",
                         cutoff=0.1,...)
speccCBI(data,k,...)
data: a numeric matrix. The data
    matrix - usually a cases*variables-data matrix. claraCBI,
    pamkCBI and dbscanCBI work with an
    n*n-dissimilarity matrix as well, see parameter diss.
dmatrix: a square numeric dissimilarity matrix or a
    dist-object.
k: numeric, usually integer. In most cases, this is the number
    of clusters for methods where this is fixed. For hclustCBI
    and disthclustCBI see parameter cut below. Some
    methods have a k parameter on top of a G or
    krange parameter for compatibility; k in these cases
    does not have to be specified but if it is, it is always a single
    number of clusters and overwrites G and
    krange.
scaling: either a logical value or a numeric vector of length
    equal to the number of variables. If scaling is a numeric
    vector with length equal to the number of variables, then each
    variable is divided by the corresponding value from scaling.
    If scaling is TRUE then scaling is done by dividing
    the (centered) variables by their root-mean-square, and if
    scaling is FALSE, no scaling is done before execution.
runs: integer. Number of random initializations from which the k-means algorithm is started.
criterion: "ch" or "asw". Decides whether number
    of clusters is estimated by the Calinski-Harabasz criterion or by the
    average silhouette width.
cut: either "level" or "number". This determines how
    cutree is used to obtain a partition from a hierarchy
    tree. cut="level" means that the tree is cut at a particular
    dissimilarity level, cut="number" means that the tree is cut
    in order to obtain a fixed number of clusters. The parameter
    k specifies the number of clusters or the dissimilarity
    level, depending on cut.
method: method for hierarchical clustering, see the
    documentation of hclust.
noisecut: numeric. All clusters of size <=noisecut in the
    disthclustCBI/hclustCBI-partition are joined and declared as
    noise/outliers.
minlevel: integer. minlevel=1 means that all clusters in
    the tree are given out by hclusttreeCBI or
    disthclusttreeCBI, including one-point
    clusters (but excluding the cluster with all
    points). minlevel=2 excludes the one-point clusters.
    minlevel=3 excludes the two-point cluster which has been
    merged first, and increasing the value of minlevel by 1 in
    all further steps means that the remaining earliest formed cluster
    is excluded.
G: vector of integers. Number of clusters or numbers of clusters
    used by
    mclustBIC. If
    G has more than one entry, the number of clusters is
    estimated by the BIC.
emModelNames: vector of strings. Models for covariance matrices,
    see documentation of
    mclustBIC.
Vinv: numeric. See documentation of
    mclustBIC.
summary.out: logical. If TRUE, the result of
    summary.mclustBIC is added as component
    mclustsummary to the output of noisemclustCBI and
    distnoisemclustCBI.
mdsdim: integer. Dimensionality of MDS solution.
points.out: logical. If TRUE, the matrix of MDS points
    is added as component
    points to the output of distnoisemclustCBI.
diss: logical. If TRUE, data will be considered as
    a dissimilarity matrix. In claraCBI, this requires
    usepam=TRUE.
krange: vector of integers. Numbers of clusters to be compared.
trim: numeric between 0 and 1. Proportion of data points
    trimmed, i.e., assigned to noise. See trimkmeans and
    tclust in the tclust package.
eps: numeric. The radius of the neighborhoods to be considered
    by dbscan.
MinPts: integer. How many points have to be in a neighborhood so
    that a point is considered to be a cluster seed? See documentation
    of dbscan.
clustercut: numeric between 0 and 1. If fixmahal
    is used for fuzzy clustering, a crisp partition is generated and
    points with cluster membership values above clustercut are
    considered as members of the corresponding cluster.
mergemethod: method for merging Gaussians, passed on as
    method to mergenormals.
cutoff: numeric between 0 and 1, tuning constant for
    mergenormals.
...: further parameters to be transferred to the original clustering functions (not required).
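The meaning of the scaling, cut and noisecut arguments for the hierarchical interfaces can be illustrated with base R alone. The following is only a minimal sketch on simulated data; it is not the fpc implementation, and the data set and the threshold values are assumptions chosen for illustration.

```r
set.seed(1)
# Simulated data (illustrative only): two tight groups of five points
# plus one distant point
x <- rbind(matrix(rnorm(10, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(10, mean = 5, sd = 0.1), ncol = 2),
           c(20, -20))

# scaling = TRUE: centre each variable and divide by its root-mean-square
sx <- scale(x, center = TRUE, scale = TRUE)

tree <- hclust(dist(sx), method = "average")

# cut = "number": k is read as the number of clusters
by_number <- cutree(tree, k = 3)
# cut = "level": k is read as a dissimilarity level (argument h of cutree)
by_level <- cutree(tree, h = 0.5)

# noisecut = 1: clusters with at most one point are declared noise
noisecut <- 1
sizes <- table(by_number)
small <- as.integer(names(sizes)[sizes <= noisecut])
noise <- by_number %in% small
```

In this sketch both cuts give the same three groups, and only the distant point ends up flagged in noise.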
All interface functions return a list with the following components
  (there may be some more, see summary.out and points.out
  above):
result: clustering result, usually a list with the full output of the clustering method (the precise format doesn't matter); whatever you want to use later.
nc: number of clusters. If some points don't belong to any
	cluster but are declared as "noise", nc includes the
	noise component, and there should be another component
	nccl, being the number of clusters not including the
	noise component.
clusterlist: this is a list consisting of logical vectors
	of length of the number of data points (n) for each cluster,
	indicating whether a point is a member of this cluster
	(TRUE) or not. If a noise component is included, it
	should always be the last vector in this list.
partition: an integer vector of length n,
	partitioning the data. If the method produces a partition, it
	should be the clustering. This component is only used for plots,
	so you could do something like rep(1,n) for
	non-partitioning methods.
clustermethod: a string indicating the clustering method.
nccl: see nc above.
nnk: by noisemclustCBI and distnoisemclustCBI,
    see above.
initnoise: logical vector, indicating initially estimated noise by
    NNclean, called by noisemclustCBI
    and distnoisemclustCBI.
noise: logical. TRUE if points were classified as
    noise/outliers by disthclustCBI.
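The output format described above can be illustrated by writing a small interface function by hand. The sketch below wraps stats::kmeans; the name mykmeansCBI is hypothetical and not part of fpc, and the function handles neither scaling nor a noise component.

```r
# Hypothetical wrapper (not part of fpc) that returns the components
# result, nc, clusterlist, partition and clustermethod
mykmeansCBI <- function(data, k, ...) {
  data <- as.matrix(data)
  fit <- kmeans(data, centers = k, ...)
  partition <- fit$cluster
  # One logical membership vector of length n per cluster
  clusterlist <- lapply(seq_len(k), function(i) partition == i)
  list(result = fit, nc = k, clusterlist = clusterlist,
       partition = partition, clustermethod = "kmeans (custom wrapper)")
}

set.seed(123)
# Simulated data (illustrative only): two groups of 20 points each
x <- rbind(matrix(rnorm(40), ncol = 2),
           matrix(rnorm(40, mean = 6), ncol = 2))
out <- mykmeansCBI(x, k = 2, nstart = 5)
```

A function of this shape could presumably be supplied to clusterboot as clustermethod, with k passed through its further arguments; check the clusterboot help page before relying on that.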
All these functions call clustering methods implemented in R to
  cluster data and to provide output in the format required by
  clusterboot. Here is a brief overview. For further
  details see the help pages of the involved clustering methods.
kmeansCBI: an interface to the function
      kmeansruns calling kmeans
      for k-means clustering. (kmeansruns allows the
      specification of several random initializations of the
      k-means algorithm and estimation of k by the Calinski-Harabasz
      index or the average silhouette width.)
hclustCBI: an interface to the function
	hclust for agglomerative hierarchical clustering with
	noise component (see parameter noisecut above). This
	function produces a partition and assumes a cases*variables
	matrix as input.
hclusttreeCBI: an interface to the function
	hclust for agglomerative hierarchical clustering. This
	function gives out all clusters belonging to the hierarchy
	(upward from a certain level, see parameter minlevel
	above).
disthclustCBI: an interface to the function
	hclust for agglomerative hierarchical clustering with
	noise component (see parameter noisecut above). This
	function produces a partition and assumes a dissimilarity
	matrix as input.
noisemclustCBI: an interface to the function
	mclustBIC, for normal mixture model based
	clustering. Warning: mclustBIC often
	has problems with multiple
        points. In clusterboot, it is recommended to use
	this together with multipleboot=FALSE.
distnoisemclustCBI: an interface to the function
	mclustBIC for normal mixture model based
	clustering. This assumes a dissimilarity matrix as input and
	generates a data matrix by multidimensional scaling first.
	Warning: mclustBIC often has
	problems with multiple
        points. In clusterboot, it is recommended to use
	this together with multipleboot=FALSE.
claraCBI: an interface to the functions
	pam and clara
	for partitioning around medoids.
pamkCBI: an interface to the function
      pamk calling pam for
      partitioning around medoids. The number
      of clusters is estimated by the Calinski-Harabasz index or by the
      average silhouette width.
trimkmeansCBI: an interface to the function
	trimkmeans for trimmed k-means
	clustering. This assumes a cases*variables matrix as input. Note
	that for
	most applications, tclustCBI with parameter
	restr.fact=1 will do about the same but faster.
tclustCBI: an interface to the function
	tclust in the tclust package for trimmed Gaussian 
	clustering. This assumes a cases*variables matrix as input.
NOTE: The tclust package is currently available on CRAN only as
	an archived version. Therefore I cannot currently offer the
	tclustCBI function in fpc. The code for the
	function is given below in the Examples section, so if you need it,
	run that code first.
disttrimkmeansCBI: an interface to the function
	trimkmeans for trimmed k-means
	clustering. This assumes a dissimilarity matrix as input and
	generates a data matrix by multidimensional scaling first.
dbscanCBI: an interface to the function
	dbscan for density based 
	clustering.
mahalCBI: an interface to the function
	fixmahal for fixed point
	clustering. This assumes a cases*variables matrix as input.
mergenormCBI: an interface to the function
      mergenormals for clustering by merging Gaussian
      mixture components. Unlike mergenormals, mergenormCBI
      includes the computation of the initial Gaussian mixture.
      This assumes a cases*variables matrix as input.
speccCBI: an interface to the function
      specc for spectral clustering. See
      the specc help page for additional tuning
      parameters. This assumes a cases*variables matrix as input.
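The dissimilarity-based variants described above (distnoisemclustCBI and disttrimkmeansCBI) first turn the dissimilarity matrix into a data matrix by multidimensional scaling. The following base R sketch shows that idea only. It assumes that mdsmethod="classical" corresponds to classical MDS as computed by stats::cmdscale, and it uses kmeans as a stand-in for the actual clustering method.

```r
set.seed(2)
# Simulated data (illustrative only): two groups of 20 points each
x <- rbind(matrix(rnorm(40, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 4, sd = 0.3), ncol = 2))
d <- dist(x)  # pretend that only the dissimilarities are available

# Classical MDS into two dimensions (the CBI functions default to mdsdim=4)
pts <- cmdscale(d, k = 2)

# Any method for cases*variables data can now be applied to the MDS points;
# kmeans is only a stand-in here
fit <- kmeans(pts, centers = 2, nstart = 10)
```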
clusterboot, dist,
  kmeans, kmeansruns, hclust,
  mclustBIC, 
  pam,  pamk,
  clara,
  trimkmeans, dbscan,
  fixmahal
  options(digits=3)
  set.seed(20000)
  face <- rFace(50,dMoNo=2,dNoEy=0,p=2)
  dbs <- dbscanCBI(face,eps=1.5,MinPts=4)
  dhc <- disthclustCBI(dist(face),method="average",cut="level",k=1.5,noisecut=2)
  table(dbs$partition,dhc$partition)
  dm <- mergenormCBI(face,G=10,emModelNames="EEE",nnk=2)
# Not run:
# Here is the tclustCBI-code:
# tclustCBI <- function(data,k,trim=0.05,...){
#   if(require(tclust)){
#     data <- as.matrix(data)
#     c1 <- tclust(data,k=k,alpha=trim,...)
#     sc1c <- c1$cluster
#     cl <- list()
#     nc <- nccl <- max(sc1c)
#     if (sum(sc1c==0)>0){
#       nc <- nccl+1
#       sc1c[sc1c==0] <- nc
#     }
#     for (i in 1:nc)
#       cl[[i]] <- sc1c == i
#     out <- list(result=c1,nc=nc,nccl=nccl,clusterlist=cl,partition=sc1c,
#               clustermethod="tclust")
#     out
#   }
#   else
#     warning("tclust could not be loaded")    
# }