plot.partition: Plot of a Partition of the Data Set

Description

Creates plots for visualizing a partition object.

Usage

plot.partition(x, ask=F, ...)

Arguments

an object of class "partition", e.g. created by the functions pam, clara, and fanny.

ask

if TRUE, plot.partition operates in interactive mode.

All optional arguments available to the function clusplot.default (except for the diss option) may also be supplied to this function. Graphical parameters (see

Value

a NULL value is returned.

Side Effects

An appropriate plot is produced on the current graphics device. This can be one or both of the following choices: Clusplot Silhouette plot

NOTE

In the silhouette plot, observation labels are only printed when the number of observations is limited to less than 40, for readability. Moreover, observation labels are truncated to at most 5 characters.

Details

When ask=T, rather than producing each plot sequentially, plot.partition displays a menu listing all the plots that can be produced. If the menu is not desired but a pause between plots is still wanted one must set par(ask=T) before invoking the plot command.

The clusplot of a cluster partition consists of a two-dimensional representation of the observations, in which the clusters are indicated by ellipses. (See clusplot.partition for more details.)

The silhouette plot of a nonhierarchical clustering is fully described in Rousseeuw (1987) and in chapter 2 of Kaufman and Rousseeuw (1990). For each observation i, a bar is drawn, representing the silhouette width s(i) of the observation. Observations are grouped per cluster, starting with cluster 1 at the top. Observations with a large s(i) (almost 1) are very well clustered, a small s(i) (around 0) means that the observation lies between two clusters, and observations with a negative s(i) are probably placed in the wrong cluster. A clustering can be performed for several values of k (the number of clusters). Finally, choose the value of k with the largest overall average silhouette width.

The silhouette width is computed as follows: Put a(i) = average dissimilarity between i and all other points of the cluster to which i belongs. For all clusters C, put d(i,C) = average dissimilarity of i to all observations of C. The smallest of these d(i,C) is denoted as b(i), and can be seen as the dissimilarity between i and its neighbor cluster. Finally, put s(i) = ( b(i) - a(i) ) / max( a(i), b(i) ). The overall average silhouette width is then simply the average of s(i) over all observations i.

References

Kaufman, L. and Rousseeuw, P.J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.

Rousseeuw, P.J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 20, 53-65.

Struyf, A., Hubert, M. and Rousseeuw, P.J. (1997). Integrating Robust Clustering Techniques in S-PLUS, Computational Statistics and Data Analysis, 26, 17-37.

Examples

Run this code

## generate 25 objects, divided into 2 clusters.
x <- rbind(cbind(rnorm(10,0,0.5), rnorm(10,0,0.5)),
           cbind(rnorm(15,5,0.5), rnorm(15,5,0.5)))
plot(pam(x, 2))

Run the code above in your browser using DataLab