
GrammR (version 1.1.0)

OptimClusts: Optimal Cluster Calculator

Description

Given the average silhouette widths obtained with the partitioning around medoids (PAM) method for a range of numbers of clusters, this function determines the optimal number of clusters. The number of clusters with the largest average silhouette width (the global maximum) is not necessarily the optimal choice. Instead, OptimClusts selects the smallest number of clusters whose average silhouette width is a local maximum and lies within a neighbourhood of the global maximum.
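The selection rule can be sketched in a few lines of R. This is a hypothetical helper, `optim_clusts_sketch`, not the GrammR source. It assumes that the neighbourhood is absolute (a width qualifies if it is at least `max(P) - Eps`) and that an endpoint counts as a local maximum when it is at least as large as its single neighbour. The package's actual definitions may differ.

```r
# Hypothetical sketch of the rule described above; NOT the GrammR code.
# Assumption: the neighbourhood is absolute, i.e. P[k] >= max(P) - Eps.
optim_clusts_sketch <- function(P, Eps) {
  K <- length(P)
  # Pad with -Inf so that the first and last entries are compared
  # only with their single existing neighbour.
  left  <- c(-Inf, P[-K])
  right <- c(P[-1], -Inf)
  is_local_max <- P >= left & P >= right
  # Keep only local maxima that lie within Eps of the global maximum.
  near_global <- P >= max(P) - Eps
  # Smallest qualifying index; the global maximum always qualifies,
  # so the result is well defined.
  min(which(is_local_max & near_global))
}
```

For example, with `x <- c(0.5, 0.1, 0.6, 0.7, 0.8, 0.75, 0.77, 0.79, 0.81, 0.9)`, `optim_clusts_sketch(x, 0.15)` returns 5 and `optim_clusts_sketch(x, 0.05)` returns 10. These are positions in `x`, not numbers of clusters.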

Usage

OptimClusts(P, Eps)

Arguments

P
Vector of average silhouette widths, one for each number of clusters in a specified range.
Eps
A numerical value between 0 and 1 that determines the neighbourhood of the global maximum within which to search for a local maximum. Values smaller than 0.1 (10%) are recommended.

Value

An integer between 1 and $K$, where $K$ is the length of the silhouette vector $P$. If the silhouette widths were computed for $m, m+1, \ldots, M$ clusters, the value is the index of the optimal number of clusters in this sequence, so the optimal number of clusters is $m + \text{value} - 1$. See Details for the recommended maximum number of clusters.
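Converting the returned index into a number of clusters is a single lookup. The range `m = 2`, `M = 11` and the index `idx = 5` below are hypothetical values chosen for illustration.

```r
# Hypothetical range: silhouette widths computed for m, ..., M clusters.
m <- 2
M <- 11
idx <- 5                    # hypothetical index returned for P

# Position idx in the sequence m:M, equivalently m + idx - 1.
n_clusters <- (m:M)[idx]
n_clusters                  # 6
```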

Details

The function OptimClusts is part of the mPAM (modified PAM) algorithm described in the first reference below. For a data set with $N$ samples (or taxa/OTUs when clustering taxa/OTUs), the maximum number of clusters that should be considered to avoid overestimating the number of clusters is $K = \lfloor 2\sqrt{N} \rfloor$, where $\lfloor x \rfloor$ is the largest integer less than or equal to $x$.
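A silhouette vector for OptimClusts can be built by running PAM for each candidate number of clusters up to this limit. The sketch below uses ordinary `pam()` from the cluster package, not the mPAM variant of the GrammR paper, on hypothetical random data with $N = 90$ samples, so $K = \lfloor 2\sqrt{90} \rfloor = 18$.

```r
library(cluster)  # provides pam(); a recommended package shipped with R

set.seed(42)
# Hypothetical data: N = 90 samples in two dimensions.
dat <- matrix(rnorm(180), ncol = 2)
N <- nrow(dat)

# Maximum number of clusters from the rule above: floor(2 * sqrt(90)) = 18.
K <- floor(2 * sqrt(N))

# Average silhouette width for k = 2, ..., K with ordinary PAM
# (an assumption standing in for mPAM).
P <- sapply(2:K, function(k) pam(dat, k)$silinfo$avg.width)
```

Because the range starts at 2 clusters, an index `i` chosen from `P` corresponds to `i + 1` clusters.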

References

Ayyala, D. N. and Lin, S. (2015) GrammR: graphical representation and modeling of count data with application in metagenomics. Bioinformatics, 31(10).

Rousseeuw, P. J. (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20.

Examples

x <- c(0.5, 0.1, 0.6, 0.7, 0.8, 0.75, 0.77, 0.79, 0.81, 0.9)
## Not run: plot(seq_along(x), x, xlab = "Index", ylab = "Average silhouette width")
OptimClusts(x, 0.1) ## The optimal number selected is 6.
OptimClusts(x, 0.05) ## The optimal number selected is 10.
