This calls the function pam or
  clara to perform a
  partitioning around medoids clustering with the number of clusters
  estimated by optimum average silhouette width (see
  pam.object) or Calinski-Harabasz
  index (calinhara). The Duda-Hart test
  (dudahart2) is applied to decide whether there should be
  more than one cluster (unless 1 is excluded as number of clusters or
  data are dissimilarities).
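The selection by average silhouette width described above can be sketched directly with pam from the cluster package. This is a minimal illustration under stated assumptions, not the pamk implementation: the toy data and the names x, krange, asw and k_best are made up for the example, and the Duda-Hart test for a single cluster is left out.

```r
library(cluster)  # provides pam(); a recommended package shipped with R

set.seed(1)
# Toy data (an assumption for illustration): two well-separated groups
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean = 3, sd = 0.3), ncol = 2))

krange <- 2:5
# Average silhouette width of the PAM solution for each candidate k
asw <- sapply(krange, function(k) pam(x, k)$silinfo$avg.width)
names(asw) <- krange

# The estimated number of clusters is the k with the largest width
k_best <- krange[which.max(asw)]
k_best
```

Silhouette widths need at least two clusters, which is why the candidate range in this sketch starts at 2.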
pamk(data, krange=2:10, criterion="asw", usepam=TRUE,
     scaling=FALSE, alpha=0.001, diss=inherits(data, "dist"),
     critout=FALSE, ns=10, seed=NULL, ...)
data: a data matrix or data frame or something that can be
    coerced into a matrix, or dissimilarity matrix or
    object. See pam for more information.
krange: integer vector. Numbers of clusters which are to be
    compared by the chosen criterion. Note: neither the average
    silhouette width nor the Calinski-Harabasz index is defined for a
    single cluster, so they cannot estimate nc=1. If 1 is included, a
    Duda-Hart test is applied and 1 is estimated if the test is not
    significant.
criterion: one of "asw", "multiasw" or
    "ch". Determines whether average silhouette width (as returned
    by pam/clara, or
    as computed by distcritmulti if "multiasw" is
    specified; recommended for large data sets with usepam=FALSE)
    or Calinski-Harabasz is applied. Note that the original
    Calinski-Harabasz index is not defined for dissimilarities; if
    dissimilarity data are run with criterion="ch", the
    dissimilarity-based generalisation in Hennig and Liao (2013) is
    used.
scaling: either a logical value or a numeric vector of length
    equal to the number of variables. If scaling is a numeric
    vector with length equal to the number of variables, then each
    variable is divided by the corresponding value from scaling.
    If scaling is TRUE then scaling is done by dividing
    the (centered) variables by their root-mean-square, and if
    scaling is FALSE, no scaling is done.
alpha: numeric between 0 and 1, tuning constant for
    dudahart2 (only used for 1-cluster test).
diss: logical flag: if TRUE (default for dist or
    dissimilarity-objects), then data will be considered
    as a dissimilarity matrix (and the potential number of clusters 1
    will be ignored). If FALSE, then data will
    be considered as a matrix of observations by variables.
critout: logical. If TRUE, the criterion value is printed
    out for every number of clusters.
ns: passed on to distcritmulti if
    criterion="multiasw".
seed: passed on to distcritmulti if
    criterion="multiasw".
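The two scaling modes described above can be illustrated with base R's scale(). This is a sketch that assumes pamk's scaling behaves like scale() applied to the data before clustering; the data and the names x, x_std, x_div and divisors are hypothetical.

```r
set.seed(1)
# Hypothetical two-variable data set on very different scales
x <- cbind(height_cm = rnorm(20, 170, 10), weight_kg = rnorm(20, 70, 15))

# scaling = TRUE: centre each column, then divide by its root-mean-square
# (for centred columns this equals the sample standard deviation)
x_std <- scale(x, center = TRUE, scale = TRUE)

# scaling = numeric vector: divide each column by a user-supplied value
divisors <- c(10, 15)   # hypothetical per-variable divisors
x_div <- scale(x, center = TRUE, scale = divisors)

apply(x_std, 2, sd)     # standard deviation of each standardised column
```

Centring shifts every observation by the same amount, so it does not change Euclidean distances between observations; only the divisors affect the clustering.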
A list with components
pamobject: the output of the optimal run of the
    pam function.
nc: the optimal number of clusters.
crit: vector of criterion values for numbers of
    clusters. crit[1] is the p-value of the Duda-Hart test
    if 1 is in krange and diss=FALSE.
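The returned list can be inspected as in the following sketch. The component names pamobject, nc and crit are those used by the fpc package; the toy data and the variable names x and pk are assumptions for illustration.

```r
library(fpc)  # provides pamk()

set.seed(1)
# Toy data (an assumption for illustration): two well-separated groups
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean = 3, sd = 0.3), ncol = 2))

pk <- pamk(x, krange = 2:5)

pk$nc                           # estimated number of clusters
pk$crit                         # criterion values, indexed by number of clusters
pk$pamobject$medoids            # medoids of the selected pam() fit
table(pk$pamobject$clustering)  # cluster sizes
```

Because pamobject is an ordinary pam result, the usual accessors from the cluster package (medoids, clustering, silinfo) apply to it.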
Calinski, T., and Harabasz, J. (1974) A Dendrite Method for Cluster Analysis, Communications in Statistics, 3, 1-27.
Duda, R. O. and Hart, P. E. (1973) Pattern Classification and Scene Analysis. Wiley, New York.
Hennig, C. and Liao, T. (2013) How to find an appropriate clustering for mixed-type variables with application to socio-economic stratification, Journal of the Royal Statistical Society, Series C Applied Statistics, 62, 309-369.
Kaufman, L. and Rousseeuw, P. J. (1990) Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.
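  library(fpc)  # provides pamk() and rFace()
  options(digits=3)
  set.seed(20000)
  # Simulate a small "face"-shaped benchmark data set
  face <- rFace(50,dMoNo=2,dNoEy=0,p=2)
  # Average silhouette width as given by pam
  pk1 <- pamk(face,krange=1:5,criterion="asw",critout=TRUE)
  # Average silhouette width computed by distcritmulti; "multiasw" is
  # better for larger data sets, use larger ns then.
  pk2 <- pamk(face,krange=1:5,criterion="multiasw",ns=2,critout=TRUE)
  # Calinski-Harabasz index
  pk3 <- pamk(face,krange=1:5,criterion="ch",critout=TRUE)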