SillyPutty (version 0.4.2)

findClusterNumber: Using SillyPutty to find the number of clusters

Description

A function designed to approximate the true number, K, of clusters in a dataset. The findClusterNumber function calls RandomSillyPutty for each value of K in the range from start to end, performing N random starts each time.

NOTE: start must be greater than 1. The function can be slow, depending on the complexity of the dataset and the value of N.

Usage

findClusterNumber(distobj, start, end, N = 100,
                    method = c("SillyPutty", "HCSP"), ...)

Value

A list containing the maximum silhouette width obtained for each number of clusters, K, in the range from start to end.
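
One common way to use such a result is to pick the K with the largest silhouette width. The sketch below assumes the list is named by K, as the plotting call in the Examples section suggests. The values in y are made up for illustration and are not output from the package.

```r
# Made-up values standing in for a findClusterNumber() result;
# the names are assumed to be the candidate cluster numbers.
y <- list("3" = 0.41, "4" = 0.52, "5" = 0.47)

# Flatten the list to a named numeric vector.
widths <- unlist(y)

# Select the K whose maximum silhouette width is largest.
bestK <- as.integer(names(widths)[which.max(widths)])
bestK
```

A larger silhouette width indicates better-separated clusters, so the K found this way is only an estimate and is worth checking against the plot of all values.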

Arguments

distobj

An object of class dist representing a distance matrix.

start

The smallest number of clusters in the range to search. Must be greater than 1.

end

The largest number of clusters in the range to search.

N

Number of random starts to perform for each value of K.

method

Whether to use the full RandomSillyPutty algorithm ("SillyPutty") or the hybrid method of hierarchical clustering followed by SillyPutty ("HCSP").

...

Extra arguments to the SillyPutty function.

Author

Kevin R. Coombes krc@silicovore.com, Dwayne G. Tally dtally110@hotmail.com

Details

The findClusterNumber function processes one distance matrix at a time, running N iterations for each candidate number of clusters. It returns a list of the maximum silhouette width values obtained from those iterations, each associated with its cluster number.
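
The search loop described above can be sketched roughly as follows. This is not the package's implementation: sketchClusterNumber is a hypothetical helper, and the SillyPutty refinement step is omitted, so each random start is scored as-is. It only illustrates the shape of the loop (N random starts per K, keep the maximum) and relies on silhouette() from the cluster package.

```r
library(cluster)  # provides silhouette()

# Hypothetical helper, NOT part of the SillyPutty package.
sketchClusterNumber <- function(distobj, start, end, N = 100) {
  stopifnot(inherits(distobj, "dist"), start > 1, end >= start)
  n <- attr(distobj, "Size")  # number of observations in the dist object
  result <- lapply(start:end, function(K) {
    widths <- vapply(seq_len(N), function(i) {
      # Random start: shuffle labels so every cluster is non-empty.
      labels <- sample(rep_len(seq_len(K), n))
      # A real run would refine `labels` with SillyPutty here (omitted).
      mean(silhouette(labels, distobj)[, "sil_width"])
    }, numeric(1))
    max(widths)  # keep the best of the N starts for this K
  })
  names(result) <- start:end  # name each entry by its cluster number
  result
}

set.seed(1)
toy <- dist(matrix(rnorm(60), ncol = 2))  # 30 random 2-D points
res <- sketchClusterNumber(toy, start = 2, end = 4, N = 10)
```

Because the refinement is skipped, the widths from this sketch will be much lower than those from the real function. The nested structure also shows why run time grows with both the width of the range and N.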

References

Pending.

Examples

data(eucdist)
set.seed(12)
y <- findClusterNumber(eucdist, start = 3, end = 7, method = "HCSP")
plot(names(y), y, xlab = "K", ylab = "Mean Silhouette Width",
     type = "b", lwd = 2, pch = 16)