clusplot.default: bivariate clusplot

Description

Creates a bivariate plot visualizing a partition (clustering) of the data. All observations are represented by points in the plot, using principal components or multidimensional scaling. Around each cluster an ellipse is drawn.

Usage

clusplot.default(x, clus, diss=F, cor=T, stand=F, lines=2, shade=F, 
         color=F, labels=0, plotchar=T, span=T, ...)

Arguments

x

data matrix or dataframe, or dissimilarity matrix, depending on the value of the diss argument.

In case of a data matrix or dataframe, each row corresponds to an observation, and each column corresponds to a variable. All variables must be

clus

a vector of length n representing a clustering of x. For each observation the vector lists the number or name of the cluster to which it has been assigned. clus is often the clustering component of the output of pam, fanny<

diss

logical flag: if TRUE, then x will be considered as a dissimilarity matrix. If FALSE, then x will be considered as a matrix of observations by variables.

cor

logical flag: this is only important when working with a data matrix or dataframe. If TRUE, then the variables are scaled to have unit variance.

stand

logical flag: if TRUE, then the representations of the n observations in the 2-dimensional plot are standardized.

lines

integer: the currently available options are 0, 1 and 2. This option is used to obtain an idea of the distances between ellipses. The distance between two ellipses E1 and E2 is measured along the line connecting the centers m1 and m2 of the two ellipses.
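As a rough illustration of measuring a distance along the line connecting two centers, the sketch below computes both the center-to-center distance and the part of that line lying outside both ellipses. It is not taken from the clusplot source: the helper names ellipse_radius and ellipse_gap are hypothetical, each ellipse is assumed to be given as the set of x with (x - m)' S^{-1} (x - m) <= d2, and which of the two quantities each value of lines reports is not stated in the text above.

```r
# Half-length of the ellipse {x : (x - m)' S^{-1} (x - m) <= d2},
# measured from its center along the unit direction u.
# (Hypothetical helper, not part of the cluster package.)
ellipse_radius <- function(S, d2, u) {
  sqrt(d2 / drop(t(u) %*% solve(S) %*% u))
}

# Distance between the centers, and the length of the connecting
# segment that lies outside both ellipses (0 if the ellipses overlap
# along that line).
ellipse_gap <- function(m1, S1, d2_1, m2, S2, d2_2) {
  center_dist <- sqrt(sum((m2 - m1)^2))
  u <- (m2 - m1) / center_dist
  gap <- center_dist -
    ellipse_radius(S1, d2_1, u) -
    ellipse_radius(S2, d2_2, u)
  c(centers = center_dist, gap = max(gap, 0))
}

# Two unit circles whose centers are 5 apart
d <- ellipse_gap(c(0, 0), diag(2), 1, c(5, 0), diag(2), 1)
print(d)
```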

shade

logical flag: if TRUE, then the ellipses are shaded in relation to their density. The density is the number of points in the cluster divided by the area of the ellipse.
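The density defined above can be reproduced by hand. The following sketch assumes an ellipse written as the set of x with (x - m)' S^{-1} (x - m) <= d2, whose area is pi * d2 * sqrt(det(S)); the function names are hypothetical and the parameterization is an assumption, not necessarily the one clusplot uses internally.

```r
# Area of the ellipse {x : (x - m)' S^{-1} (x - m) <= d2}
# (hypothetical helper, not part of the cluster package)
ellipse_area <- function(S, d2) {
  pi * d2 * sqrt(det(S))
}

# Density as described above: number of points divided by ellipse area
cluster_density <- function(n_points, S, d2) {
  n_points / ellipse_area(S, d2)
}

# 10 points inside an ellipse with semi-axes 2 and 1
dens <- cluster_density(10, diag(c(4, 1)), 1)
print(dens)
```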

color

logical flag: if TRUE, then the ellipses are colored with respect to their density. With increasing density, the colors are light blue, light green, red and purple. To see these colors on the graphics device, an appropriate color scheme should be selected

labels

integer: the currently available options are 0, 1, 2, 3 and 4.

If labels=0, then no labels are placed in the plot.

Using labels=1, points and ellipses can be identified in the plot (see identify).

If labels=2, then all points and ellipses

plotchar

logical flag: if TRUE, then the plotting symbols differ for points belonging to different clusters.

span

logical flag: if TRUE, then each cluster is represented by the ellipse with smallest area containing all its points. (This is a special case of the minimum volume ellipsoid.) If FALSE, the ellipse is based on the average and covariance matrix of the sam

Value

Distances

When option lines is 1 or 2 we obtain a k by k matrix (k is the number of clusters). The element at row j and column s is the distance between ellipse j and ellipse s. If lines=0, then the value of this component is NA.
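A minimal sketch of inspecting this component, assuming the cluster package is installed and that the function returns a list whose component is named Distances as described here. The built-in iris data and the choice of 3 clusters are only for illustration.

```r
library(cluster)

# Send the plot to a null device so the sketch also runs
# non-interactively; drop this line to see the plot on screen.
pdf(NULL)

x  <- iris[, 1:4]
cl <- pam(x, 3)$clustering

# Keep the return value in order to look at its components
res <- clusplot(x, cl, lines = 2)
invisible(dev.off())

# k by k matrix of distances between the ellipses (here k = 3)
print(round(res$Distances, 2))
```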

Shading

A vector of length k (where k is the number of clusters), containing the amount of shading per cluster. Let y be a vector where element i is the ratio between the number of points in cluster i and the area of ellipse i. When the cluster i is a line segme

Side Effects

a visual display of the clustering is plotted on the current graphics device.

NOTE

When we have 4 or fewer clusters, then the option color=T gives every cluster a different color. When there are more than 4 clusters, clusplot uses the function pam from library(cluster) to cluster the densities into 4 groups, such that ellipses with nearly the same density get the same color.
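The grouping step described in this note can be imitated directly with pam. This is only a sketch: the density values below are made up for illustration, and the exact call used inside clusplot may differ.

```r
library(cluster)

# Made-up densities for 8 ellipses (illustrative values only)
dens <- c(0.5, 0.6, 2.0, 2.1, 5.0, 5.2, 9.8, 10.0)

# Cluster the densities into 4 groups; ellipses in the same group
# would receive the same color.
grp <- pam(matrix(dens, ncol = 1), 4)$clustering
print(grp)
```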

Details

clusplot uses the functions princomp and cmdscale. These functions are data reduction techniques. They will represent the data in a bivariate plot. Ellipses are then drawn to indicate the clusters. The further layout of the plot is determined by the optional arguments.
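The data reduction step can be tried on its own. The sketch below assumes that principal components are applied to a data matrix and classical scaling to a dissimilarity matrix; it uses the built-in iris data for illustration and is not the code clusplot runs internally. For Euclidean distances the two sets of coordinates agree up to the sign of each axis, which is a general property of classical scaling rather than something stated on this page.

```r
x <- as.matrix(iris[, 1:4])

# Data matrix: scores on the first two principal components
# (cor = FALSE works on the covariance matrix; cor = TRUE would
# first scale the variables to unit variance)
pc <- princomp(x, cor = FALSE)$scores[, 1:2]

# Dissimilarity matrix: classical multidimensional scaling in 2 dimensions
mds <- cmdscale(dist(x), k = 2)

# Largest difference between the two configurations, ignoring axis signs
max_diff <- max(abs(abs(unname(pc)) - abs(unname(mds))))
print(max_diff)

# Either pc or mds can be plotted as the bivariate representation
plot(mds, xlab = "Component 1", ylab = "Component 2")
```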

References

Kaufman, L. and Rousseeuw, P.J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.

Pison, G., Struyf, A. and Rousseeuw, P.J. (1997). Displaying a Clustering with CLUSPLOT, Technical Report, University of Antwerp, submitted.

Struyf, A., Hubert, M. and Rousseeuw, P.J. (1997). Integrating Robust Clustering Techniques in S-PLUS, Computational Statistics and Data Analysis, 26, 17-37.

Examples

## plotting votes.diss (a dissimilarity) in a bivariate plot and
## partitioning into 2 clusters
library(cluster)  # provides clusplot, pam, daisy and the votes.repub data
data(votes.repub)
votes.diss <- daisy(votes.repub)
clusplot(votes.diss, pam(votes.diss, 2, diss = TRUE)$clustering,
         diss = TRUE, shade = TRUE, plotchar = TRUE, labels = 1)

## plotting iris (dataframe) in a 2-dimensional plot and partitioning
## into 3 clusters.
data(iris3)
iris.x <- rbind(iris3[,,1], iris3[,,2], iris3[,,3])
clusplot(iris.x, pam(iris.x, 3)$clustering, diss = FALSE,
         plotchar = TRUE, color = TRUE)
