inckmed: Increasing number of clusters in k-medoids algorithm

Description

This function runs the k-medoids algorithm with an increasing number of clusters, proposed by Yu et al. (2018).

Usage

inckmed(distdata, ncluster, iterate = 10, alpha = 1)

Value

The function returns a list with the following components:

cluster is the cluster membership of each object.

medoid is the ids of the medoids.

minimum_distance is the distance of all objects to their cluster medoid.

Arguments

distdata: A distance matrix (n x n) or dist object.
ncluster: The number of clusters.
iterate: The number of iterations for the clustering algorithm (default 10).
alpha: A stretch factor to determine the range of initial medoid selection (see Details).

Author

Weksi Budiaji
Contact: budiaji@untirta.ac.id

Details

This algorithm is claimed to address the weakness of the simple and fast k-medoids algorithm (fastkmed). It originates from a centroid-based algorithm that uses the Euclidean distance. Because this function is medoid-based, the object mean (centroid) and variance are redefined as the medoid and deviation, respectively.

The alpha argument is a stretch factor, i.e. a constant defined by the user. It determines the set of medoid candidates, which is given by $O_c = \{O_i \mid \sigma_i \leq \alpha \sigma,\ i = 1, 2, \ldots, n\}$, where $\sigma_i$ is the average deviation of object i and $\sigma$ is the average deviation of the data set. They are computed by
$$\sigma = \sqrt{\frac{1}{n-1} \sum_{i=1}^n d(O_i, v_1)}$$
$$\sigma_i = \sqrt{\frac{1}{n-1} \sum_{j=1}^n d(O_i, O_j)}$$
where n is the number of objects, $O_i$ is the object i, and $v_1$ is the most centrally located object.
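The candidate selection can be sketched in base R as follows. This is a minimal illustration of the formulas above, not the code used inside inckmed. The function name candidate_medoids is hypothetical, and the sketch assumes that $v_1$, the most centrally located object, is the object with the smallest total distance to all other objects.

```r
# Sketch only: medoid candidates from a distance matrix, following the
# formulas in Details. This is not the kmed implementation.
candidate_medoids <- function(distdata, alpha = 1) {
  d <- as.matrix(distdata)
  n <- nrow(d)
  # sigma_i: deviation of the data set around each object i
  sigma_i <- sqrt(rowSums(d) / (n - 1))
  # Assumption: v_1 is the object with the smallest total distance
  # to all other objects, i.e. the one with the smallest sigma_i
  v1 <- unname(which.min(sigma_i))
  # sigma: the same deviation formula evaluated around v_1
  sigma <- sigma_i[[v1]]
  # Candidates are the objects whose deviation is at most alpha * sigma
  candidates <- unname(which(sigma_i <= alpha * sigma))
  list(v1 = v1, sigma = sigma, sigma_i = sigma_i, candidates = candidates)
}

d <- dist(iris[, 1:4])
cand_1 <- candidate_medoids(d, alpha = 1)
cand_15 <- candidate_medoids(d, alpha = 1.5)
length(cand_1$candidates)
length(cand_15$candidates)
```

Under this assumption, $\sigma$ is the smallest of the $\sigma_i$, so alpha = 1 keeps only $v_1$ (and any ties), while larger values of alpha stretch the candidate set to include objects farther from the center.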

References

Yu, D., Liu, G., Guo, M., Liu, X., 2018. An improved K-medoids algorithm based on step increasing and optimizing medoids. Expert Systems with Applications 92, pp. 464-473.

Examples

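library(kmed)
num <- as.matrix(iris[,1:4])
# Manhattan weighted by range (mrw) distance between all pairs of objects
mrwdist <- distNumeric(num, num, method = "mrw")
result <- inckmed(mrwdist, ncluster = 3, iterate = 50, alpha = 1.5)
# Cross-tabulate cluster membership against the known species
table(result$cluster, iris[,5])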
