fpc (version 2.1-8)

pamk: Partitioning around medoids with estimation of number of clusters

Description

This calls the function pam or clara to perform a partitioning around medoids clustering with the number of clusters estimated by optimum average silhouette width (see pam.object) or Calinski-Harabasz index (calinhara). The Duda-Hart test (dudahart2) is applied to decide whether there should be more than one cluster (unless 1 is excluded as number of clusters or data are dissimilarities).
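The silhouette-based selection described above can be sketched by hand with pam from the cluster package. This is a minimal illustration and not pamk's actual implementation: the toy data and the names x, krange, asw and best_k are assumptions made for the example, and the Duda-Hart step for the one-cluster case is left out.

```r
library(cluster)  # provides pam()

set.seed(1)
# Hypothetical toy data: two well-separated Gaussian groups in 2 dimensions
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 6), ncol = 2))

krange <- 2:5
# Average silhouette width of the pam solution for each candidate k
asw <- sapply(krange, function(k) pam(x, k)$silinfo$avg.width)
names(asw) <- krange

# Pick the k with the largest average silhouette width and refit
best_k <- krange[which.max(asw)]
fit <- pam(x, best_k)
print(asw)
print(best_k)
```

Because average silhouette width is undefined for a single cluster, a loop like this can only compare k >= 2, which is why a separate test is needed when 1 is a candidate.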

Usage

pamk(data,krange=2:10,criterion="asw", usepam=TRUE,
     scaling=FALSE, alpha=0.001, diss=inherits(data, "dist"),
     critout=FALSE, ns=10, seed=NULL, ...)

Arguments

data
a data matrix or data frame or something that can be coerced into a matrix, or dissimilarity matrix or object. See pam for more information.
krange
integer vector. Numbers of clusters which are to be compared by the average silhouette width criterion. Note: average silhouette width and Calinski-Harabasz can't estimate number of clusters nc=1. If 1 is included, a Duda-Hart test (dudahart2) is applied to decide whether there should be more than one cluster.
criterion
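one of "asw", "multiasw" or "ch". Determines whether average silhouette width (as given out by pam/clara, or as computed by distcritmulti if "multiasw" is specified) or the Calinski-Harabasz index is applied.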
usepam
logical. If TRUE, pam is used, otherwise clara (recommended for large datasets with 2,000 or more observations; dissimilarity ma
scaling
either a logical value or a numeric vector of length equal to the number of variables. If scaling is a numeric vector with length equal to the number of variables, then each variable is divided by the corresponding value from scaling.
alpha
numeric between 0 and 1, tuning constant for dudahart2 (only used for 1-cluster test).
diss
logical flag: if TRUE (default for dist or dissimilarity-objects), then data will be considered as a dissimilarity matrix (and the potential number of clusters 1 will be ignored). If F
critout
logical. If TRUE, the criterion value is printed out for every number of clusters.
ns
passed on to distcritmulti if criterion="multiasw".
seed
passed on to distcritmulti if criterion="multiasw".
...
further arguments to be transferred to pam or clara.
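As a rough illustration of how several of these arguments combine, the sketch below runs pamk with clara instead of pam on a larger data set, rescales the variables with a numeric scaling vector, and passes samples through ... to clara. The simulated data, the object names big and pk, and the particular scaling values are assumptions chosen for the example.

```r
library(fpc)

set.seed(42)
# Hypothetical data: 3000 observations in three groups; the second
# variable is deliberately recorded on a scale 1000 times larger.
n <- 1000
big <- cbind(c(rnorm(n, 0), rnorm(n, 5), rnorm(n, 10)),
             1000 * c(rnorm(n, 0), rnorm(n, 5), rnorm(n, 0)))

pk <- pamk(big,
           krange = 2:5,
           criterion = "asw",
           usepam = FALSE,          # use clara rather than pam
           scaling = c(1, 1000),    # divide each variable by these values
           samples = 20)            # forwarded to clara via ...

print(pk$nc)                 # estimated number of clusters
print(class(pk$pamobject))   # the selected clara fit
```

With usepam=FALSE the pamobject component holds the clara result, so its elements should be read against the clara documentation rather than that of pam.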

Value

A list with components:

  • pamobject: the output of the optimal run of the pam function.
  • nc: the optimal number of clusters.
  • crit: vector of criterion values for numbers of clusters. crit[1] is the p-value of the Duda-Hart test if 1 is in krange and diss=FALSE.
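The components of the returned list can be inspected as in the following sketch. The use of the built-in iris measurements and the object name pk are assumptions made for illustration; medoids and clustering are components of the pam fit stored in pamobject.

```r
library(fpc)

# Cluster the four numeric iris measurements, allowing 1 to 5 clusters
pk <- pamk(iris[, 1:4], krange = 1:5, criterion = "asw")

print(pk$nc)                   # estimated number of clusters
print(pk$crit)                 # crit[1]: Duda-Hart p-value; crit[2..5]: criterion values
print(pk$pamobject$medoids)    # medoids of the selected pam solution
print(table(pk$pamobject$clustering))  # cluster sizes
```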

References

Calinski, T., and Harabasz, J. (1974) A Dendrite Method for Cluster Analysis, Communications in Statistics, 3, 1-27.

Duda, R. O. and Hart, P. E. (1973) Pattern Classification and Scene Analysis. Wiley, New York.

Kaufman, L. and Rousseeuw, P. J. (1990) Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.

See Also

pam, clara, distcritmulti

Examples

library(fpc)
options(digits=3)
set.seed(20000)
face <- rFace(50,dMoNo=2,dNoEy=0,p=2)
pk1 <- pamk(face,krange=1:5,criterion="asw",critout=TRUE)
pk2 <- pamk(face,krange=1:5,criterion="multiasw",ns=2,critout=TRUE)
# "multiasw" is better for larger data sets, use larger ns then.
pk3 <- pamk(face,krange=1:5,criterion="ch",critout=TRUE)
