These functions are interfaces to several clustering methods, meant to be
called by clusterboot (as parameter clustermethod; "CBI" stands for
"clusterboot interface"). In some situations it could make sense to use them
to compute a clustering even if you don't want to run clusterboot, because
some of the functions contain additional features (e.g., normal mixture model
based clustering of dissimilarity matrices projected into Euclidean space by
MDS, partitioning around medoids with an estimated number of clusters, or
noise/outlier identification in hierarchical clustering). A sketch of how a
CBI function is handed to clusterboot follows the usage listing below.

Usage:

  kmeansCBI(data,k,scaling=TRUE,runs=1,...)
  hclustCBI(data,k,cut="level",method,scaling=TRUE,noisecut=0,...)
  hclusttreeCBI(data,minlevel=2,method,scaling=TRUE,...)
  disthclustCBI(dmatrix,k,cut="level",method,noisecut=0,...)
  noisemclustCBI(data,G,emModelNames,nnk,hcmodel=NULL,Vinv=NULL)
  distnoisemclustCBI(dmatrix,G,emModelNames,nnk,
                     hcmodel=NULL,Vinv=NULL,mdsmethod="classical",
                     mdsdim=4)
  claraCBI(data,k,usepam=TRUE,diss=FALSE,...)
  pamkCBI(data,krange=2:10,scaling=TRUE,diss=FALSE,...)
  trimkmeansCBI(data,k,scaling=TRUE,trim=0.1,...)
  disttrimkmeansCBI(dmatrix,k,scaling=TRUE,trim=0.1,
                    mdsmethod="classical",mdsdim=4,...)
  dbscanCBI(data,eps,MinPts,diss=FALSE,...)
  mahalCBI(data,clustercut=0.5,...)
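To illustrate the intended use: a CBI function is passed to clusterboot as
clustermethod, and its tuning parameters are forwarded through clusterboot's
further arguments. This sketch is not part of the original page; the simulated
data, the choice of B and the bootmean component come from clusterboot's own
interface and are assumptions here.

## Hedged sketch: pass a CBI function to clusterboot via clustermethod;
## k is forwarded to kmeansCBI. Data and parameter values are illustrative.
library(fpc)
set.seed(123)
x <- rbind(matrix(rnorm(100, mean=0), ncol=2),
           matrix(rnorm(100, mean=4), ncol=2))
cb <- clusterboot(x, B=20, clustermethod=kmeansCBI, k=2)
cb$bootmean   # clusterwise mean Jaccard stabilities (clusterboot output)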
Arguments:

  data: the data matrix, usually a cases*variables data matrix. claraCBI,
      pamkCBI and dbscanCBI work with an n*n-dissimilarity matrix as well,
      see parameter diss.

  dmatrix: a dissimilarity matrix or a dist-object.

  k: number of clusters, for methods where this number is fixed in advance;
      for hclustCBI and disthclustCBI see parameter cut below.

  scaling: either a logical value or a numeric vector. If scaling is a
      numeric vector with length equal to the number of variables, then each
      variable is divided by the corresponding value from scaling.

  runs: number of random initializations of the k-means algorithm.

  cut: either "level" or "number"; determines how cutree is used to obtain a
      partition from a hierarchy tree. cut="level" means that the tree is
      cut at a particular dissimilarity level, cut="number" means that the
      tree is cut so that a fixed number of clusters is obtained (in both
      cases specified by k).

  method: clustering method, see the documentation of hclust.

  noisecut: numeric. All clusters of size <= noisecut in the
      disthclustCBI/hclustCBI-partition are joined and declared as
      noise/outliers.

  minlevel: integer. minlevel=1 means that all clusters in the tree are
      given out by hclusttreeCBI, including one-point clusters (but
      excluding the cluster with all points); minlevel=2 excludes the
      one-point clusters.

  nnk: number of nearest neighbours used by NNclean, which is used to
      estimate the initial noise for noisemclustCBI and distnoisemclustCBI.
      See parameter k in the documentation of NNclean.

  G, emModelNames, hcmodel, Vinv: number of clusters and normal mixture
      model specifications, passed on to EMclust/EMclustN (see their
      documentation).

  mdsmethod, mdsdim: multidimensional scaling method and dimension used to
      project a dissimilarity matrix into Euclidean space.

  usepam, diss: logical. If diss=TRUE, data will be considered as a
      dissimilarity matrix; in claraCBI, this requires usepam=TRUE.

  krange: numbers of clusters to be compared by pamkCBI.

  trim: trimming proportion, see trimkmeans.

  eps, MinPts: neighbourhood radius and minimum number of points required
      in a neighbourhood, see dbscan.

  clustercut: numeric. If fixmahal is used for fuzzy clustering, a crisp
      partition is generated and points with cluster membership values
      above clustercut are considered as members of the corresponding
      cluster.

  ...: further arguments passed on to the underlying clustering functions.
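The following small sketch (not from the original page; data and parameter
values are invented for illustration) shows how k, cut and noisecut interact
in disthclustCBI:

## Hedged sketch of cut="level" vs. cut="number" and of noisecut.
library(fpc)
set.seed(1)
x <- rbind(matrix(rnorm(60, mean=0), ncol=2),
           matrix(rnorm(60, mean=5), ncol=2))
d <- dist(x)
## cut the average-linkage tree at dissimilarity level 2
dl <- disthclustCBI(d, k=2, cut="level", method="average", noisecut=0)
## cut the same tree into exactly 3 clusters; clusters of size <= 2
## are then joined and declared noise
dn <- disthclustCBI(d, k=3, cut="number", method="average", noisecut=2)
dl$nc
table(dn$partition)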
Value:

  All these functions return a list with components including the following:

  nc: the number of clusters. If some points do not belong to any cluster
      but are declared as noise, nc includes the noise component, and there
      should be another component nccl, being the number of clusters not
      including the noise component.

  clusterlist: a list of logical vectors of length equal to the number of
      data points (n), one for each cluster, indicating whether a point is
      a member of this cluster (TRUE) or not. If a noise component is
      included, it should always be the last vector in this list.

  partition: an integer vector of length n, partitioning the data. If the
      method produces a partition, it should be the clustering. This
      component is only used for plots, so you could do something like
      rep(1,n) for non-partitioning methods.

  nccl: see nc above.

  nnk: given out by noisemclustCBI and distnoisemclustCBI, see above.

  initnoise: logical vector indicating the initial noise estimated by
      NNclean, called by noisemclustCBI and distnoisemclustCBI.

  noise: logical. TRUE if points were classified as noise/outliers by
      disthclustCBI.
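As a hedged illustration of this return structure (kmeansCBI and the toy data
below are chosen purely for the example), the components can be inspected
directly:

## Hedged sketch: inspect the list returned by a CBI function.
library(fpc)
set.seed(2)
x <- rbind(matrix(rnorm(80, mean=0), ncol=2),
           matrix(rnorm(80, mean=6), ncol=2))
cl <- kmeansCBI(x, k=2)
cl$nc                        # number of clusters
table(cl$partition)          # partition vector of length n
sapply(cl$clusterlist, sum)  # cluster sizes from the logical membership vectors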
Details:

  All these functions are interfaces to clustering methods implemented in R,
  for use with the function clusterboot. Here is a brief overview:
  kmeansCBI: an interface to the function kmeansruns, calling kmeans for
      k-means clustering. (kmeansruns allows the specification of several
      random initializations of the k-means algorithm.)
  hclustCBI: an interface to the function hclust for agglomerative
      hierarchical clustering with noise component (see parameter noisecut
      above). This function produces a partition and assumes a
      cases*variables matrix as input.
  hclusttreeCBI: an interface to the function hclust for agglomerative
      hierarchical clustering. This function gives out all clusters
      belonging to the hierarchy (upward from a certain level, see
      parameter minlevel above).
  disthclustCBI: an interface to the function hclust for agglomerative
      hierarchical clustering with noise component (see parameter noisecut
      above). This function produces a partition and assumes a
      dissimilarity matrix as input.
  noisemclustCBI: an interface to the functions EMclust and EMclustN for
      normal mixture model based clustering. Warning: EMclust and EMclustN
      often have problems with multiple points. In clusterboot, it is
      recommended to use this only together with multipleboot=FALSE.
      NOTE: the mclust package has recently been updated to 3.0.0.
      noisemclustCBI at the moment requires one of the previous versions of
      mclust. The newest one is now available as mclust02 on CRAN. If you
      have an older version of mclust installed, which is still named
      "mclust", you have to change require(mclust02) in the function to
      require(mclust).
  distnoisemclustCBI: an interface to the functions EMclust and EMclustN
      for normal mixture model based clustering. This assumes a
      dissimilarity matrix as input and generates a data matrix by
      multidimensional scaling first. Warning: EMclust and EMclustN often
      have problems with multiple points. In clusterboot, it is recommended
      to use this only together with multipleboot=FALSE.
      NOTE: the mclust package has recently been updated to 3.0.0.
      distnoisemclustCBI at the moment requires one of the previous
      versions of mclust. The newest one is now available as mclust02 on
      CRAN. If you have an older version of mclust installed, which is
      still named "mclust", you have to change require(mclust02) in the
      function to require(mclust).
  claraCBI: an interface to the functions pam and clara for partitioning
      around medoids.
  pamkCBI: an interface to the function pamk, calling pam for partitioning
      around medoids. The number of clusters is estimated by the average
      silhouette width.
  trimkmeansCBI: an interface to the function trimkmeans for trimmed
      k-means clustering. This assumes a cases*variables matrix as input.
  disttrimkmeansCBI: an interface to the function trimkmeans for trimmed
      k-means clustering. This assumes a dissimilarity matrix as input and
      generates a data matrix by multidimensional scaling first.
  dbscanCBI: an interface to the function dbscan for density based
      clustering.
  mahalCBI: an interface to the function fixmahal for fixed point
      clustering.

See Also:

  clusterboot, dist, kmeans, kmeansruns, hclust, EMclust, EMclustN, pam,
  pamk, clara, trimkmeans, dbscan, fixmahal

Examples:

set.seed(20000)
face <- rFace(50,dMoNo=2,dNoEy=0,p=2)
dbs <- dbscanCBI(face,eps=1.5,MinPts=4)
dhc <- disthclustCBI(dist(face),method="average",k=1.5,noisecut=2)
table(dbs$partition,dhc$partition)
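A further hedged example (not from the original page; the krange value is
illustrative) shows pamkCBI, which estimates the number of clusters by the
average silhouette width, on the same simulated data:

## Hedged sketch: pamkCBI selects the number of clusters itself.
pk <- pamkCBI(face, krange=2:5)
pk$nc               # estimated number of clusters
table(pk$partition)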