clues (version 0.3.2)

clustering: Data Clustering (After Data Shrinking)

Description

Data clustering (after data shrinking).

Usage

clustering(y, disMethod = "Euclidean")

Arguments

y
data matrix: an R matrix object (for dimension > 1) or a vector object (for dimension = 1), with rows as observations and columns as variables.
disMethod
specification of the dissimilarity measure. The available measures are "Euclidean" and "1-corr".

Value

  • mem: vector of the cluster membership of the data points. The cluster membership takes values $1, 2, \ldots, g$, where $g$ is the estimated number of clusters.
  • size: vector of the number of data points in each cluster.
  • g: an estimate of the number of clusters.
  • db: vector of dissimilarities between consecutive data points (see Details).
  • point: vector of consecutive data points (see Details).
  • omin: the minimum value of the outlier dissimilarities (see Details).

Details

We first store the first observation (data point) in point[1]. We then find the nearest neighbor of point[1], store it in point[2], and store the dissimilarity between point[1] and point[2] in db[1]. Next, we remove point[1], find the nearest neighbor of point[2] among the remaining points, store it in point[3], and store the dissimilarity between point[2] and point[3] in db[2]. We then remove point[2] and find the nearest neighbor of point[3]. We repeat this procedure until we have found point[n] and db[n-1], where n is the total number of data points.
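The chain construction above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name nn_chain is hypothetical, the dissimilarity is fixed to Euclidean, ties are broken by taking the first minimum, and point is assumed to hold row indices rather than the observations themselves.

```r
# Hypothetical helper (not part of the clues package): builds the
# nearest-neighbor chain described in Details, using Euclidean distance.
nn_chain <- function(y) {
  y <- as.matrix(y)                 # a vector becomes a one-column matrix
  n <- nrow(y)
  d <- as.matrix(dist(y))           # full pairwise Euclidean distances
  point <- integer(n)               # row indices in visiting order (assumption)
  db <- numeric(n - 1)              # dissimilarities between consecutive points
  remaining <- rep(TRUE, n)         # points that have not been removed yet
  point[1] <- 1L                    # start from the first observation
  for (i in seq_len(n - 1)) {
    cur <- point[i]
    remaining[cur] <- FALSE         # remove the current point from the search
    cand <- which(remaining)
    nxt <- cand[which.min(d[cur, cand])]  # nearest remaining neighbor
    point[i + 1] <- nxt
    db[i] <- d[cur, nxt]
  }
  list(point = point, db = db)
}

# One-dimensional toy data: two well-separated groups
y <- c(0, 1, 0.4, 10, 11)
res <- nn_chain(y)
res$point
res$db
```

On this toy vector the chain visits rows 1, 3, 2, 4, 5, and the single large entry of db marks the jump from the first group to the second.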

Next, we calculate the interquartile range (IQR) of the vector db. We then check which elements of db are larger than avg + 1.5 × IQR, where avg is the average of the vector db. The minimum value of these outlier dissimilarities is stored in omin. The estimated number of clusters is g, where g-1 is the number of outlier dissimilarities. The position of an outlier dissimilarity marks the end of one cluster and the start of the next.
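A minimal sketch of this cut rule, assuming a made-up db vector and R's default quantile definition for IQR(); the variable names threshold, is_outlier, and chain_mem are illustrative and do not come from the package. The membership computed here is in chain order (the order of point), not in the original row order.

```r
# Hypothetical dissimilarities between 10 consecutive chain points
db <- c(0.4, 0.6, 0.5, 9, 1, 0.7, 0.5, 8, 0.6)

# Cut-off from Details: average of db plus 1.5 times its IQR
threshold <- mean(db) + 1.5 * IQR(db)

is_outlier <- db > threshold        # gaps that separate clusters
omin <- min(db[is_outlier])         # smallest outlier dissimilarity
g <- sum(is_outlier) + 1            # g - 1 outliers give g clusters

# An outlier at db[i] means point[i + 1] starts a new cluster,
# so the chain-order labels are a running count of outliers plus one.
chain_mem <- cumsum(c(FALSE, is_outlier)) + 1

omin
g
chain_mem
table(chain_mem)                    # cluster sizes
```

With these values the two large gaps (9 and 8) exceed the cut-off, so the sketch reports three clusters of sizes 4, 4, and 2. To label observations in their original order, one would assign the chain-order labels back through the visiting order, for example mem[point] <- chain_mem.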

To get a reasonable clustering result, data sharpening (shrinking) is recommended before data clustering.

References

Wang, S., Qiu, W., and Zamar, R. H. (2007). CLUES: A non-parametric clustering method based on local shrinking. Computational Statistics & Data Analysis, Vol. 52, issue 1, pages 286-298.

See Also

shrinking

Examples

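library(clues)

# Ruspini data shipped with the package
data(Ruspini)
# data matrix
ruspini <- Ruspini$ruspini

# cluster the data and plot the result, colored by membership
tt <- clustering(ruspini)
plotClusters(ruspini, tt$mem)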
