ksharp: sharpen a clustering

Description

Each data point in a clustering is assigned to a cluster, but some data points may lie in ambiguous zones between two or more clusters, or far from other points. Cluster sharpening assigns these border points to a separate noise group, thereby creating starker distinctions between groups.

Usage

ksharp(
  x,
  threshold = 0.1,
  data = NULL,
  method = c("silhouette", "neighbor", "medoid"),
  threshold.abs = NULL
)

Arguments

x

clustering object; several types of inputs are acceptable, including objects of class kmeans, pam, and self-made lists with a component "cluster".

threshold

numeric; the fraction of points to place in the noise group

data

matrix, raw data corresponding to clustering x; must be present when sharpening for the first time or if data is not present within x.

method

character; determines the method used for sharpening, one of "silhouette", "neighbor", or "medoid"

threshold.abs

numeric; absolute value of the threshold for sharpening. When non-NULL, this value overrides the value in argument 'threshold'.
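
The relation between 'threshold' and 'threshold.abs' can be pictured with a small sketch. This is an illustration only, not the package's implementation: it assumes that each point has a sharpness score, uses silhouette widths from the cluster package as a stand-in for that score, and assumes that points scoring below the cutoff go to the noise group. The variable names score, cutoff, and sharpened are made up for this example.

```r
# Illustration only: this does not reproduce the internals of ksharp().
library(cluster)  # provides silhouette(); a recommended package in R

set.seed(1)
iris.data <- as.matrix(iris[, 1:4])
km <- kmeans(iris.data, centers = 3)

# silhouette width per point, used here as a stand-in sharpness score
sil <- silhouette(km$cluster, dist(iris.data))
score <- sil[, "sil_width"]

# relative threshold: the cutoff is the 10% quantile of the scores
threshold <- 0.1
cutoff <- quantile(score, probs = threshold)

# an absolute threshold, when not NULL, replaces the quantile-based cutoff
threshold.abs <- NULL
if (!is.null(threshold.abs)) cutoff <- threshold.abs

# points scoring below the cutoff are moved to group 0 (noise)
sharpened <- ifelse(score < cutoff, 0L, km$cluster)
table(sharpened)
```

Setting threshold.abs to a number in this sketch would bypass the quantile step entirely, which mirrors the override described above.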

Value

clustering object based on input x, with adjusted cluster assignments and additional list components with sharpness measures. Cluster assignments are placed in $cluster and excised data points are given a cluster index of 0. Original cluster assignments are saved in $cluster.original. Sharpness measures are stored in components $silinfo, $medinfo, and $neiinfo, although these details may change in future versions of the package.
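
As a sketch of how the two assignment vectors might be compared, the code below uses a small hand-built list in place of a real result. The list, its point names, and its values are hypothetical; only the component names $cluster and $cluster.original come from the description above.

```r
# Hypothetical stand-in for a ksharp() result; only the two components
# named in the Value section are mimicked, with made-up values.
sharp <- list(
  cluster.original = c(a = 1, b = 1, c = 2, d = 2, e = 3, f = 3),
  cluster          = c(a = 1, b = 0, c = 2, d = 2, e = 0, f = 3)
)

# rows: original clusters; columns: sharpened clusters (0 = noise)
print(table(original = sharp$cluster.original, sharpened = sharp$cluster))

# names of the points that were moved into the noise group
excised <- names(sharp$cluster)[sharp$cluster == 0]
print(excised)
```

The cross-tabulation shows how many points each original cluster lost to the noise group.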

Details

Noise points are assigned to a group with cluster index 0. This is analogous to the output produced by dbscan.
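
Because index 0 marks noise rather than a real cluster, downstream summaries usually need to skip it. A minimal sketch with made-up assignments and measurements (both vectors are hypothetical):

```r
# Hypothetical sharpened assignments for six points (0 = noise)
cluster <- c(1, 1, 0, 2, 2, 0)
# made-up measurements for the same six points
values <- c(5.1, 4.9, 6.0, 7.0, 6.8, 3.0)

# keep only points that still belong to a real cluster
keep <- cluster != 0

# per-cluster mean, computed without the noise group
cluster.means <- tapply(values[keep], cluster[keep], mean)
print(cluster.means)
```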

Examples

# NOT RUN {
# prepare iris dataset for analysis
iris.data = iris[, 1:4]
rownames(iris.data) = paste0("iris_", seq_len(nrow(iris.data)))

# cluster the dataset into three groups
# (kmeans starts from random centers; the seed makes the result reproducible)
set.seed(1)
iris.clustered = kmeans(iris.data, centers=3)
table(iris.clustered$cluster)

# sharpen the clustering by excluding 10% of the data points
iris.sharp = ksharp(iris.clustered, threshold=0.1, data=iris.data)
table(iris.sharp$cluster)

# visualize: colors show species, hollow points mark the noise group
iris.pca = prcomp(iris.data)$x[,1:2]
plot(iris.pca, col=iris$Species, pch=ifelse(iris.sharp$cluster==0, 1, 19))

# }
