maptree (version 1.4-7)

kgs: KGS Measure for Pruning Hierarchical Clusters

Description

Computes the Kelley-Gardner-Sutcliffe penalty function for a hierarchical cluster tree.

Usage

kgs (cluster, diss, alpha=1, maxclust=NULL)

Arguments

cluster

object of class hclust or twins.

diss

object of class dissimilarity or dist.

alpha

weight for number of clusters.

maxclust

maximum number of clusters for which to compute measure.

Value

Vector of penalty function values for trees of size 2:maxclust. The names of the vector elements are the corresponding numbers of clusters.
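Because the returned vector is named by number of clusters, the size with the smallest penalty can be read directly from the names. The following is a minimal sketch; the vector b is filled with made-up values for illustration and is not real kgs output.

```r
# Hypothetical penalty values, invented for illustration only.
# A real vector would come from a call such as kgs (a, a$diss, maxclust=6).
b <- c("2" = 9.1, "3" = 6.4, "4" = 5.2, "5" = 5.9, "6" = 7.3)

# The names hold the numbers of clusters, so convert the name of the
# smallest element back to an integer.
best <- as.integer(names(b)[which.min(b)])
best
```

The resulting value could then be used as the number of clusters at which to prune the tree.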

Details

Kelley et al. (see References) proposed a method that can help decide where to prune a hierarchical cluster tree. At each level of the tree, the mean dissimilarity within each cluster is computed, and these within-cluster means are averaged across all clusters. After this average is normalized, the number of clusters times alpha is added. The number of clusters at which the resulting function reaches its minimum is the suggested pruning size.
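The calculation described above can be sketched in base R as follows. This is an illustration only, not the maptree source. The function name kgs_sketch is hypothetical, and two details are assumptions: singleton clusters are skipped when averaging, and the average spread is rescaled linearly to the range 1 to n-1. The package may handle both differently.

```r
# Illustrative sketch of a KGS-style penalty; NOT the maptree implementation.
# Assumptions: singletons are ignored when averaging, and the average
# spread is rescaled linearly to the range [1, n - 1].
kgs_sketch <- function(tree, d, alpha = 1, maxclust = 10) {
  m <- as.matrix(d)
  n <- nrow(m)
  stopifnot(maxclust >= 3, maxclust < n)
  sizes <- 2:maxclust

  # Average within-cluster spread for each candidate number of clusters
  avg_spread <- sapply(sizes, function(k) {
    grp <- cutree(tree, k = k)
    spreads <- sapply(split(seq_len(n), grp), function(idx) {
      if (length(idx) < 2) return(NA_real_)   # singleton: no pairs
      sub <- m[idx, idx]
      mean(sub[upper.tri(sub)])               # mean pairwise dissimilarity
    })
    mean(spreads, na.rm = TRUE)
  })

  # Normalize, then add the weighted number of clusters
  rng <- range(avg_spread)
  normalized <- 1 + (n - 2) * (avg_spread - rng[1]) / (rng[2] - rng[1])
  penalty <- normalized + alpha * sizes
  names(penalty) <- sizes
  penalty
}

# Usage on simulated data: two well-separated groups of 10 points each
set.seed(1)
x <- rbind(matrix(rnorm(20, 0, 0.1), ncol = 2),
           matrix(rnorm(20, 5, 0.1), ncol = 2))
d <- dist(x)
p <- kgs_sketch(hclust(d, method = "average"), d, maxclust = 6)
```

The sketch recomputes every cluster's spread at every level, so it is no faster than the approach it illustrates.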

The current implementation has complexity O(n*n*maxclust) and is therefore very slow for large n. One possible improvement would be to calculate the spread only for the clusters that are split at each level, rather than recalculating it for all clusters.

References

Kelley, L.A., Gardner, S.P., Sutcliffe, M.J. (1996) An automated approach for clustering an ensemble of NMR-derived protein structures into conformationally-related subfamilies, Protein Engineering, 9, 1063-1065.

See Also

twins.object, dissimilarity.object, hclust, dist, clip.clust

Examples

# NOT RUN {
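  library (cluster)   # provides agnes and the votes.repub data
  library (maptree)   # provides kgs
  data (votes.repub)

  # Agglomerative clustering with Ward's method
  a <- agnes (votes.repub, method="ward")
  # Penalty function for trees of 2 to 20 clusters
  b <- kgs (a, a$diss, maxclust=20)
  plot (names (b), b, xlab="# clusters", ylab="penalty")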
# }
