This calls the function `pam` or `clara` to perform a
partitioning around medoids clustering with the number of clusters
estimated by optimum average silhouette width (see `pam.object`) or
Calinski-Harabasz index (`calinhara`). The Duda-Hart test
(`dudahart2`) is applied to decide whether there should be
more than one cluster (unless 1 is excluded as number of clusters or
data are dissimilarities).

```
pamk(data,krange=2:10,criterion="asw", usepam=TRUE,
scaling=FALSE, alpha=0.001, diss=inherits(data, "dist"),
critout=FALSE, ns=10, seed=NULL, ...)
```

A list with components

- pamobject
The output of the optimal run of the `pam`-function.

- nc
the optimal number of clusters.

- crit
vector of criterion values for numbers of clusters. `crit[1]` is the p-value of the Duda-Hart test if 1 is in `krange` and `diss=FALSE`.
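The components can be read directly off the returned list. The following is a minimal sketch, assuming the `fpc` package is installed; it reuses the simulated `rFace` data from this page's examples, and the object name `pk` is only illustrative.

```r
library(fpc)  # provides pamk() and rFace(); assumed to be installed

set.seed(20000)
face <- rFace(50, dMoNo = 2, dNoEy = 0, p = 2)

# Compare 1 to 5 clusters; including 1 in krange triggers the Duda-Hart test.
pk <- pamk(face, krange = 1:5, criterion = "asw")

pk$nc                           # estimated number of clusters
pk$crit                         # criterion values for the numbers of clusters
pk$crit[1]                      # Duda-Hart p-value (1 is in krange, diss = FALSE)
table(pk$pamobject$clustering)  # cluster sizes from the selected run
pk$pamobject$medoids            # medoids of the selected run
```

The `pamobject` component is an ordinary clustering object, so anything documented in `pam.object` can be extracted from it in the same way.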

- data
a data matrix or data frame or something that can be coerced into a matrix, or dissimilarity matrix or object. See `pam` for more information.

- krange
integer vector. Numbers of clusters which are to be compared by the average silhouette width criterion. Note: average silhouette width and Calinski-Harabasz can't estimate number of clusters `nc=1`. If 1 is included, a Duda-Hart test is applied and 1 is estimated if this is not significant.

- criterion
one of `"asw"`, `"multiasw"` or `"ch"`. Determines whether average silhouette width (as given out by `pam`/`clara`, or as computed by `distcritmulti` if `"multiasw"` is specified; recommended for large data sets with `usepam=FALSE`) or Calinski-Harabasz is applied. Note that the original Calinski-Harabasz index is not defined for dissimilarities; if dissimilarity data is run with `criterion="ch"`, the dissimilarity-based generalisation in Hennig and Liao (2013) is used.

- usepam
logical. If `TRUE`, `pam` is used, otherwise `clara` (recommended for large datasets with 2,000 or more observations; dissimilarity matrices can not be used with `clara`).

- scaling
either a logical value or a numeric vector of length equal to the number of variables. If `scaling` is a numeric vector with length equal to the number of variables, then each variable is divided by the corresponding value from `scaling`. If `scaling` is `TRUE` then scaling is done by dividing the (centered) variables by their root-mean-square, and if `scaling` is `FALSE`, no scaling is done.

- alpha
numeric between 0 and 1, tuning constant for `dudahart2` (only used for 1-cluster test).

- diss
logical flag: if `TRUE` (default for `dist` or `dissimilarity`-objects), then `data` will be considered as a dissimilarity matrix (and the potential number of clusters 1 will be ignored). If `FALSE`, then `data` will be considered as a matrix of observations by variables.

- critout
logical. If `TRUE`, the criterion value is printed out for every number of clusters.

- ns
passed on to `distcritmulti` if `criterion="multiasw"`.

- seed
passed on to `distcritmulti` if `criterion="multiasw"`.

- ...
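The arithmetic described for `scaling` can be reproduced in base R. The sketch below does not call `pamk`; it only illustrates the two documented cases on toy data. The names `x`, `x_std`, `x_div` and `divisors` are hypothetical, and whether `pamk` delegates to base R's `scale()` internally is an assumption not stated on this page.

```r
# Toy data with two variables on very different scales.
set.seed(1)
x <- cbind(a = rnorm(20), b = rnorm(20, mean = 10, sd = 100))

# scaling = TRUE: center each variable, then divide by its root-mean-square.
# With the n - 1 denominator used by scale(), this is the standard deviation
# of the centered column.
x_std <- scale(x, center = TRUE, scale = TRUE)
apply(x_std, 2, sd)  # each scaled variable has standard deviation 1 (up to rounding)

# scaling = numeric vector: variable j is divided by divisors[j].
divisors <- c(1, 100)  # hypothetical divisors, one per variable
x_div <- sweep(x, 2, divisors, "/")

# Centering shifts every point by the same vector, so pairwise distances
# are the same with or without it.
all.equal(as.vector(dist(x_div)),
          as.vector(dist(scale(x, center = TRUE, scale = divisors))))
```

Because medoid-based clustering depends on the data only through pairwise dissimilarities, the divisors are what changes the result; centering by itself does not.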

Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en/

Calinski, T., and Harabasz, J. (1974) A Dendrite Method for Cluster
Analysis, *Communications in Statistics*, 3, 1-27.

Duda, R. O. and Hart, P. E. (1973) *Pattern Classification and
Scene Analysis*. Wiley, New York.

Hennig, C. and Liao, T. (2013) How to find an appropriate clustering
for mixed-type variables with application to socio-economic
stratification, *Journal of the Royal Statistical Society, Series
C Applied Statistics*, 62, 309-369.

Kaufman, L. and Rousseeuw, P. J. (1990) *Finding Groups in Data: An
Introduction to Cluster Analysis*. Wiley, New York.

```
options(digits=3)
set.seed(20000)
face <- rFace(50,dMoNo=2,dNoEy=0,p=2)
pk1 <- pamk(face,krange=1:5,criterion="asw",critout=TRUE)
pk2 <- pamk(face,krange=1:5,criterion="multiasw",ns=2,critout=TRUE)
# "multiasw" is better for larger data sets, use larger ns then.
pk3 <- pamk(face,krange=1:5,criterion="ch",critout=TRUE)
```
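The examples above all pass a raw data matrix and use `pam`. Two further call patterns follow from the argument descriptions: dissimilarity input with `criterion="ch"`, and `clara` with `"multiasw"` for larger data. The sketch below assumes `fpc` is installed; the simulated two-group data, the object names, and the values chosen for `ns` and `seed` are illustrative only.

```r
library(fpc)  # assumed to be installed, together with its dependency cluster

set.seed(20000)
face <- rFace(50, dMoNo = 2, dNoEy = 0, p = 2)

# Dissimilarity input: diss defaults to inherits(data, "dist"), so it is TRUE
# here and the one-cluster option is ignored. With criterion = "ch" the
# dissimilarity-based generalisation of the index is used.
d <- dist(face)
pk_diss <- pamk(d, krange = 2:5, criterion = "ch")
pk_diss$nc

# Larger data: clara instead of pam, with the subset-based "multiasw" criterion.
set.seed(1)
big <- rbind(matrix(rnorm(2000, mean = 0), ncol = 2),
             matrix(rnorm(2000, mean = 5), ncol = 2))  # 2,000 observations
pk_big <- pamk(big, krange = 2:4, criterion = "multiasw",
               usepam = FALSE, ns = 4, seed = 1)
pk_big$nc
```

Note that `usepam=FALSE` requires a data matrix, since dissimilarity matrices cannot be used with `clara`.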
