clues: Clustering Method Based on Local Shrinking

Description

Automatically estimate the number of clusters for a given data set and get a partition.

Usage

clues(y, n0 = 5, alpha = 0.05, eps = 1.0e-4, itmax = 20, 
      K2.vec = n0, strengthMethod = "sil", strengthIni = -3, 
      disMethod ="Euclidean", plotFlag = FALSE, 
      plot.dim = c(1, 2), quiet = FALSE)

Arguments

y

data matrix: an R matrix object (for dimension > 1) or a vector object (for dimension = 1), with rows being observations and columns being variables.

n0

a guess for the number of clusters.

alpha

speed factor.

eps

a small positive number. A value is regarded as zero if it is less than eps.

itmax

maximum number of iterations allowed.

K2.vec

range for the number of nearest neighbors for the second pass of the iteration.

strengthMethod

specifies the preferred measure of the strength of the clusters (i.e., compactness of the clusters). The two available methods are "sil" (silhouette index) and "CH" (CH index).

strengthIni

initial value for the lower bound of the measure of the strength of the clusters. Any negative value will do.

disMethod

specification of the dissimilarity measure. The available measures are Euclidean and 1-corr.

plotFlag

logical. Indicates if a scatter plot of clusters should be output.

plot.dim

specifies the two dimensions to be plotted.

quiet

logical. Indicates if intermediate results should be output.
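
The two dissimilarity measures named under disMethod can be illustrated in base R. The following is a minimal sketch, not the package's internal code: it assumes that "1-corr" means one minus the Pearson correlation between observations (rows), and the toy matrix and the names d_euclid and d_corr are made up for illustration.

```r
# Toy data: 4 observations (rows) on 3 variables (columns).
y <- matrix(c(1, 2, 3,
              2, 4, 6,
              3, 2, 1,
              1, 2, 4),
            ncol = 3, byrow = TRUE)

# Euclidean dissimilarity between observations.
d_euclid <- as.matrix(dist(y, method = "euclidean"))

# Assumed meaning of "1-corr": one minus the Pearson correlation
# between observations. cor() correlates columns, so transpose first.
d_corr <- 1 - cor(t(y))

print(round(d_euclid, 3))
print(round(d_corr, 3))
```

Rows 1 and 2 are proportional, so their correlation-based dissimilarity is zero even though their Euclidean distance is not. This is why the choice of disMethod can change which observations count as neighbors.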

Value

K

number of nearest neighbors that can be used to get the final clustering.

size

vector of the number of data points in each cluster.

mem

vector of the cluster membership of the data points. The cluster membership takes values $1$, $2$, $\ldots$, $g$, where $g$ is the estimated number of clusters.

g

an estimate of the number of clusters.

CH

CH index value for the final partition if strengthMethod is CH.

avg.s

average of the Silhouette index values for the final partition if strengthMethod is sil.

s

vector of Silhouette indices for the data points if strengthMethod is sil.

neighbor

nearest neighbor clusters for the data points if strengthMethod is sil.

K.vec

number of nearest neighbors used for each iteration.

g.vec

number of clusters obtained in each iteration.

myupdate

logical. Indicates if the partition obtained in the first pass is the same as that obtained in the second pass.

y.old1

data used for shrinking and clustering.

y.old2

data returned after shrinking and clustering.
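
To make the s, avg.s and neighbor components more concrete, here is a base R sketch of the standard silhouette definition: for each point, a is the mean distance to the other points in its own cluster, b is the smallest mean distance to any other cluster, and the index is (b - a) / max(a, b). The function name silhouette_sketch is hypothetical, Euclidean distance is assumed, and the package's own computation may differ in details such as the handling of singleton clusters.

```r
# Sketch of the standard silhouette index; assumes at least two clusters.
silhouette_sketch <- function(y, mem) {
  d <- as.matrix(dist(y))            # pairwise Euclidean distances
  n <- nrow(d)
  labels <- sort(unique(mem))
  s <- numeric(n)
  neighbor <- numeric(n)
  for (i in seq_len(n)) {
    # Mean distance from point i to each cluster, excluding i itself.
    avg <- sapply(labels, function(k) {
      idx <- setdiff(which(mem == k), i)
      if (length(idx) == 0) NA_real_ else mean(d[i, idx])
    })
    own <- match(mem[i], labels)
    a <- avg[own]                    # within-cluster mean distance
    other <- avg
    other[own] <- Inf                # exclude the point's own cluster
    b <- min(other)                  # mean distance to nearest other cluster
    neighbor[i] <- labels[which.min(other)]
    # Convention used here: a point in a singleton cluster gets s = 0.
    s[i] <- if (is.na(a)) 0 else (b - a) / max(a, b)
  }
  list(s = s, neighbor = neighbor, avg.s = mean(s))
}

# Two well-separated toy clusters.
y <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11))
mem <- c(1, 1, 2, 2)
out <- silhouette_sketch(y, mem)
print(out)
```

Values of s close to 1 indicate points that sit well inside their cluster, while values near 0 or below suggest points lying between clusters.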

References

Wang, S., Qiu, W., and Zamar, R. H. (2007). CLUES: A non-parametric clustering method based on local shrinking. Computational Statistics & Data Analysis, Vol. 52, issue 1, pages 286-298.

Examples

library(clues)

# Ruspini data set
data(Ruspini)

# data matrix
ruspini <- Ruspini$ruspini

res <- clues(ruspini)
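
As a follow-up, the components listed under Value can be inspected directly on the returned object. The snippet below is a small sketch that only uses component names documented above; the requireNamespace guard and the has_clues variable are additions for illustration, so the code does nothing if the clues package is not installed.

```r
# Guarded so the snippet is a no-op when the clues package is unavailable.
has_clues <- requireNamespace("clues", quietly = TRUE)

if (has_clues) {
  library(clues)
  data(Ruspini)
  res <- clues(Ruspini$ruspini)

  print(res$g)           # estimated number of clusters
  print(res$size)        # number of data points in each cluster
  print(table(res$mem))  # membership counts, which should agree with res$size
}
```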
