clustering: Data Clustering (After Data Shrinking)

Description

Data clustering (after data shrinking).

Usage

clustering(y, disMethod = "Euclidean")

Arguments

y

data matrix: an R matrix object (for dimension > 1) or a vector object (for dimension = 1), with rows as observations and columns as variables.

disMethod

specification of the dissimilarity measure. The available measures are "Euclidean" (the default) and "1-corr".

Value

mem

vector of the cluster membership of data points. The cluster membership takes values: $1$, $2$, $\ldots$, $g$, where $g$ is the estimated number of clusters.

size

vector of the number of data points in each cluster.

g

an estimate of the number of clusters.

db

vector of dissimilarities between consecutive data points (see Details).

point

vector of consecutive data points (see Details).

omin

the minimum value of the outlier dissimilarities (see Details).

Details

We first store the first observation (data point) in point[1]. We then find the nearest neighbor of point[1], store it in point[2], and store the dissimilarity between point[1] and point[2] in db[1]. Next, we remove point[1], find the nearest neighbor of point[2] among the remaining data points, store it in point[3], and store the dissimilarity between point[2] and point[3] in db[2]. We then remove point[2] and find the nearest neighbor of point[3]. We repeat this procedure until we have found point[n] and db[n-1], where n is the total number of data points.
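The chaining step can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's own code: the helper name nn_chain is hypothetical, point is assumed to hold row indices of the observations, and only the Euclidean dissimilarity is covered.

```r
# Hypothetical helper (not part of the package): builds the
# nearest-neighbor chain described in Details.
nn_chain <- function(y) {
  y <- as.matrix(y)              # a vector becomes a one-column matrix
  n <- nrow(y)
  d <- as.matrix(dist(y))        # pairwise Euclidean dissimilarities
  point <- integer(n)            # assumed: row indices in visiting order
  db <- numeric(n - 1)           # dissimilarities between consecutive points
  remaining <- rep(TRUE, n)

  point[1] <- 1L                 # start from the first observation
  remaining[1] <- FALSE
  for (i in seq_len(n - 1)) {
    cand <- which(remaining)     # points not yet placed in the chain
    nxt <- cand[which.min(d[point[i], cand])]
    point[i + 1] <- nxt
    db[i] <- d[point[i], nxt]
    remaining[nxt] <- FALSE      # a visited point is never revisited
  }
  list(point = point, db = db)
}

# Two well-separated groups on the real line
res <- nn_chain(c(0, 1, 2, 10, 11, 12))
res$point
res$db
```

In this toy input the one large entry of db corresponds to the jump from the first group to the second, which is what the outlier rule in the next paragraph looks for.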

Next, we calculate the interquartile range (IQR) of the vector db and check which elements of db are larger than avg + 1.5 * IQR, where avg is the average of the vector db. These elements are the outlier dissimilarities, and the minimum of them is stored in omin. The estimated number of clusters is g, where g-1 is the number of outlier dissimilarities. The position of an outlier dissimilarity marks the end of one cluster and the start of a new one.
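The outlier rule can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper name split_chain and the input values are hypothetical, clusters are assumed to be numbered in chain order, and returning NA for omin when no outlier exists is an assumption made here.

```r
# Hypothetical helper (not part of the package): cuts a nearest-neighbor
# chain into clusters at the outlier dissimilarities.
split_chain <- function(point, db) {
  thr <- mean(db) + 1.5 * IQR(db)       # avg + 1.5 * IQR
  is_out <- db > thr                    # outlier dissimilarities
  g <- sum(is_out) + 1                  # g - 1 outliers give g clusters

  # db[i] links point[i] and point[i + 1], so an outlier at position i
  # starts a new cluster at point[i + 1].
  chain_label <- cumsum(c(0, is_out)) + 1

  mem <- integer(length(point))
  mem[point] <- chain_label             # back to original observation order

  list(mem = mem,
       size = tabulate(mem, nbins = g),
       g = g,
       omin = if (any(is_out)) min(db[is_out]) else NA_real_)
}

# Illustrative values only
point <- c(1, 3, 2, 5, 4, 6)
db <- c(1.0, 1.2, 8.0, 0.9, 1.1)
out <- split_chain(point, db)
out$g
out$mem
```

Because the threshold is built from the average and the IQR of db itself, a few very large gaps raise the threshold, which is one reason shrinking the data first can make the cuts easier to detect.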

To get a reasonable clustering result, data sharpening (shrinking) is recommended before data clustering.

References

Wang, S., Qiu, W., and Zamar, R. H. (2007). CLUES: A non-parametric clustering method based on local shrinking. Computational Statistics & Data Analysis, Vol. 52, issue 1, pages 286-298.

Examples

# load the Ruspini data set
data(Ruspini)
# extract the data matrix
ruspini <- Ruspini$ruspini

# cluster the data, then plot the clusters using the estimated membership
tt <- clustering(ruspini)
plotClusters(ruspini, tt$mem)
