pamk
Partitioning around medoids with estimation of number of clusters
This calls the function pam or clara to perform a partitioning around medoids clustering with the number of clusters estimated by optimum average silhouette width (see pam.object) or Calinski-Harabasz index (calinhara). The Duda-Hart test (dudahart2) is applied to decide whether there should be more than one cluster (unless 1 is excluded as number of clusters or data are dissimilarities).
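The average-silhouette-width rule described above can be sketched with cluster::pam alone. This is only an illustration of the selection idea on made-up data, not the code pamk itself runs; the names x, krange, asw and k_best are hypothetical.

```r
library(cluster)  # provides pam(); a recommended package shipped with R

set.seed(1)
# Hypothetical toy data: two well-separated Gaussian groups in 2 dimensions
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean = 3, sd = 0.3), ncol = 2))

krange <- 2:5  # the silhouette width is not defined for a single cluster

# Average silhouette width of the pam solution for each candidate k
asw <- sapply(krange, function(k) pam(x, k)$silinfo$avg.width)
names(asw) <- krange

# Estimated number of clusters: the k with the largest average silhouette width
k_best <- krange[which.max(asw)]
print(asw)
print(k_best)
```

Because the criterion needs at least two clusters, deciding between one cluster and several requires a separate test, which is the role of the Duda-Hart test mentioned above.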
 Keywords
 multivariate, cluster
Usage
pamk(data,krange=2:10,criterion="asw", usepam=TRUE, scaling=FALSE, alpha=0.001, diss=inherits(data, "dist"), critout=FALSE, ns=10, seed=NULL, ...)
Arguments
 data
 a data matrix or data frame or something that can be coerced into a matrix, or dissimilarity matrix or object. See pam for more information.
 krange
 integer vector. Numbers of clusters which are to be compared by the average silhouette width criterion. Note: average silhouette width and Calinski-Harabasz can't estimate number of clusters nc=1. If 1 is included, a Duda-Hart test is applied and 1 is estimated if this is not significant.
 criterion
 one of "asw", "multiasw" or "ch". Determines whether average silhouette width (as given out by pam/clara, or as computed by distcritmulti if "multiasw" is specified; recommended for large data sets with usepam=FALSE) or Calinski-Harabasz is applied. Note that the original Calinski-Harabasz index is not defined for dissimilarities; if dissimilarity data is run with criterion="ch", the dissimilarity-based generalisation in Hennig and Liao (2013) is used.
 usepam
 logical. If TRUE, pam is used, otherwise clara (recommended for large datasets with 2,000 or more observations; dissimilarity matrices cannot be used with clara).
 scaling
 either a logical value or a numeric vector of length equal to the number of variables. If scaling is a numeric vector with length equal to the number of variables, then each variable is divided by the corresponding value from scaling. If scaling is TRUE then scaling is done by dividing the (centered) variables by their root-mean-square, and if scaling is FALSE, no scaling is done.
 alpha
 numeric between 0 and 1, tuning constant for dudahart2 (only used for 1-cluster test).
 diss
 logical flag: if TRUE (default for dist or dissimilarity objects), then data will be considered as a dissimilarity matrix (and the potential number of clusters 1 will be ignored). If FALSE, then data will be considered as a matrix of observations by variables.
 critout
 logical. If TRUE, the criterion value is printed out for every number of clusters.
 ns
 passed on to distcritmulti if criterion="multiasw".
 seed
 passed on to distcritmulti if criterion="multiasw".
 ...
 further arguments to be transferred to pam or clara.
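The two non-trivial settings of scaling can be illustrated in base R. This sketch assumes that the behaviour described for scaling corresponds to what base R's scale() and a column-wise division do; it does not reproduce pamk's internals, and the data and the names x, divisors, x_std and x_div are hypothetical.

```r
# Hypothetical toy data: two variables on very different scales
x <- cbind(height_m = c(1.60, 1.75, 1.82, 1.68),
           weight_g = c(55000, 72000, 90000, 61000))

# Analogue of scaling = TRUE: center each column, then divide it by its
# root-mean-square (for centered columns this equals the standard deviation)
x_std <- scale(x, center = TRUE, scale = TRUE)

# Analogue of a numeric scaling vector: divide each column by a chosen value
divisors <- c(1, 1000)  # hypothetical choice: converts grams to kilograms
x_div <- sweep(x, 2, divisors, "/")

print(apply(x_std, 2, sd))  # spread of each standardized column
print(x_div)
```

Without any scaling, the distances used by the clustering would be dominated by the variable with the largest numeric range, here weight_g.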
Value

A list with components
 pamobject
 The output of the optimal run of the pam function.
 nc
 the optimal number of clusters.
 crit
 vector of criterion values for numbers of clusters. crit[1] is the p-value of the Duda-Hart test if 1 is in krange and diss=FALSE.
Note
clara and pam can handle NA entries (see their documentation) but dudahart2 cannot. Therefore NA should not occur if 1 is in krange.
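A pre-check along the following lines can guard against this situation before the clustering is run. It is a base R sketch on made-up data, and the names x, krange, x_complete and krange_no1 are hypothetical; which of the two remedies is appropriate depends on the analysis.

```r
# Hypothetical data with missing entries
x <- cbind(a = c(1.2, NA, 0.7, 3.1),
           b = c(0.4, 2.2, 1.9, NA))
krange <- 1:3

if (1 %in% krange && anyNA(x)) {
  # Remedy 1: keep only the complete rows so the one-cluster test can be run
  x_complete <- x[complete.cases(x), , drop = FALSE]
  # Remedy 2: keep the NA entries but exclude 1 from the candidate numbers
  krange_no1 <- setdiff(krange, 1)
}
```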
References
Calinski, R. B., and Harabasz, J. (1974) A Dendrite Method for Cluster Analysis, Communications in Statistics, 3, 1-27.
Duda, R. O. and Hart, P. E. (1973) Pattern Classification and Scene Analysis. Wiley, New York.
Hennig, C. and Liao, T. (2013) How to find an appropriate clustering for mixed-type variables with application to socio-economic stratification, Journal of the Royal Statistical Society, Series C Applied Statistics, 62, 309-369.
Kaufman, L. and Rousseeuw, P.J. (1990). "Finding Groups in Data: An Introduction to Cluster Analysis". Wiley, New York.
See Also
Examples
library(fpc)  # provides pamk() and rFace()
options(digits=3)
set.seed(20000)
face <- rFace(50,dMoNo=2,dNoEy=0,p=2)
pk1 <- pamk(face,krange=1:5,criterion="asw",critout=TRUE)
pk2 <- pamk(face,krange=1:5,criterion="multiasw",ns=2,critout=TRUE)
# "multiasw" is better for larger data sets, use larger ns then.
pk3 <- pamk(face,krange=1:5,criterion="ch",critout=TRUE)