cluproxplot: Cluster Proximity Plot

Description

Visualizes cluster quality using an image plot of proximities (a dissimilarity matrix, e.g., a dist object) with reordered rows and columns. Reordering is done by seriation algorithms, which try to place small dissimilarities (large similarities) close to the diagonal. Good clusters are visible as dark squares (high similarity) on the diagonal of the plot. The visualization was inspired by CLUSION (see Strehl and Ghosh, 2003).
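The core idea can be sketched in base R without the package. In this hedged sketch the toy data are made up, and the leaf order of a hierarchical clustering (hclust()) stands in for a proper seriation algorithm; it is not how cluproxplot itself computes its order.

```r
# Sketch only: hclust() leaf order is used as a stand-in seriation method.
set.seed(42)

# Made-up toy data: two well-separated groups of 15 points each in 2-D
pts <- rbind(matrix(rnorm(30, mean = 0), ncol = 2),
             matrix(rnorm(30, mean = 6), ncol = 2))
pts <- pts[sample(nrow(pts)), ]  # shuffle rows so the groups are hidden
d <- dist(pts)

# Reorder rows and columns so that small dissimilarities move to the diagonal
ord <- hclust(d, method = "average")$order
m <- as.matrix(d)[ord, ord]

# image() puts z[1, 1] in the bottom-left corner; reversing the columns of a
# symmetric matrix puts m[1, 1] in the top-left corner instead.
# With gray.colors(64, 0, 1), small dissimilarities are drawn dark.
n <- nrow(m)
image(seq_len(n), seq_len(n), m[, n:1],
      col = gray.colors(64, 0, 1), axes = FALSE,
      xlab = "", ylab = "", main = "Reordered dissimilarity matrix")
```

With groups this well separated, one would expect two dark 15 x 15 squares on the diagonal; plotting the shuffled matrix as.matrix(d) instead hides that structure.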

Usage

cluproxplot(x, labels = NULL, method = NULL, args = NULL, 
  clusterLabels = TRUE, averages = TRUE, lines = TRUE, silhouettes = TRUE,
  main = "Cluster proximity plot", 
  col = gray.colors(64, 0, 1), colorkey = TRUE, linesCol = "black",
  newpage = TRUE, pop = TRUE, ...)

Arguments

x

an object of class dist (distance matrix) or a matrix.

labels

NULL or an integer vector with one element per row/column of x, giving the cluster membership of each object as consecutive integers starting with one. The labels are used to reorder the matrix so that members of the same cluster are placed next to each other.
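The requirement on labels can be made concrete with a small base R sketch. The helper check_labels() is hypothetical (it is not part of the package): it checks that there is one label per object and that the labels are consecutive integers starting with one; order() then groups the objects cluster by cluster, leaving the order inside each cluster unchanged.

```r
# Hypothetical helper, not part of the package: validate a label vector
check_labels <- function(labels, n) {
  length(labels) == n &&
    all(labels == round(labels)) &&
    setequal(unique(labels), seq_len(max(labels)))
}

d <- dist(c(1.0, 9.0, 1.2, 8.8, 5.0))   # five objects on a line
labels <- c(1L, 2L, 1L, 2L, 3L)         # cluster membership of each object

stopifnot(check_labels(labels, attr(d, "Size")))

# Group the objects cluster by cluster (order() is stable, so the original
# order inside each cluster is kept)
ord <- order(labels)
m <- as.matrix(d)[ord, ord]
print(ord)   # 1 3 2 4 5
```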

method

a vector of character strings giving the seriation algorithms to use. The first element specifies the inter-cluster and the second element the intra-cluster seriation method. See seriation.
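How a two-element method vector splits the work can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the hypothetical function two_level_order() first orders the clusters by their average between-cluster dissimilarities and then orders the objects inside each cluster, using the hclust() leaf order as a stand-in for both seriation methods.

```r
# Sketch: two-level reordering. hclust() leaf order stands in for the
# inter- and intra-cluster seriation methods (an assumption for illustration).
two_level_order <- function(d, labels) {
  m <- as.matrix(d)
  k <- max(labels)

  # Inter-cluster level: average dissimilarity between each pair of clusters
  between <- matrix(0, k, k)
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      between[i, j] <- mean(m[labels == i, labels == j])
    }
  }
  cluster_order <- if (k >= 2) hclust(as.dist(between), "average")$order else 1L

  # Intra-cluster level: reorder the objects inside each cluster
  unlist(lapply(cluster_order, function(cl) {
    idx <- which(labels == cl)
    if (length(idx) >= 2) idx[hclust(as.dist(m[idx, idx]), "average")$order] else idx
  }))
}

set.seed(7)
x <- c(rnorm(5, mean = 0), rnorm(5, mean = 10), rnorm(5, mean = 5))
labels <- rep(1:3, each = 5)
ord <- two_level_order(dist(x), labels)
print(ord)
```

Because clusters are ordered first and objects second, every cluster stays a contiguous block; only the order of the blocks and the order within each block depend on the chosen methods.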

args

"list"; arguments passed on to the seriation algorithms.

clusterLabels

"logical"; display cluster labels in the plot.

averages

"logical"; in the lower triangle of the plot, display the average pairwise dissimilarity between clusters instead of the individual dissimilarities.
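A sketch of what averages = TRUE displays, assuming that each lower-triangle cell is replaced by the mean dissimilarity of the cluster pair it belongs to and that self-dissimilarities (the zero diagonal) are left out of within-cluster means. block_averages() is a hypothetical helper; the package's exact computation may differ.

```r
# Sketch (assumption: within a cluster, the zero self-dissimilarities on the
# diagonal are excluded from the average; the package may differ).
block_averages <- function(d, labels) {
  m <- as.matrix(d)
  avg <- m
  for (i in unique(labels)) {
    for (j in unique(labels)) {
      rows <- labels == i
      cols <- labels == j
      block <- m[rows, cols, drop = FALSE]
      if (i == j) {
        n <- sum(rows)
        avg[rows, cols] <- if (n > 1) sum(block) / (n * (n - 1)) else 0
      } else {
        avg[rows, cols] <- mean(block)
      }
    }
  }
  avg
}

labels <- c(1, 1, 2, 2)
d <- dist(c(0, 2, 10, 14))     # four objects on a line
m <- as.matrix(d)

# Lower triangle: cluster averages; upper triangle: individual values
shown <- m
lower <- lower.tri(m)
shown[lower] <- block_averages(d, labels)[lower]
print(shown)
```

Here the between-cluster cells of the lower triangle all show 11, the mean of 10, 14, 8 and 12, while the upper triangle keeps the individual values.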

lines

"logical"; draw lines to separate clusters.

silhouettes

"logical"; include a silhouette plot (see Rousseeuw, 1987).
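The silhouette width of an object compares its average dissimilarity to its own cluster with that to the nearest other cluster (Rousseeuw, 1987). The sketch below implements that definition directly in the hypothetical function manual_silhouette() and checks it against silhouette() from package cluster; the toy data are made up.

```r
library(cluster)  # recommended package shipped with R; provides silhouette()

# Silhouette width of object i (Rousseeuw, 1987):
#   a = average dissimilarity of i to the other members of its own cluster
#   b = smallest average dissimilarity of i to the members of another cluster
#   s(i) = (b - a) / max(a, b), with s(i) = 0 for singleton clusters
manual_silhouette <- function(d, labels) {
  m <- as.matrix(d)
  vapply(seq_along(labels), function(i) {
    own <- labels == labels[i]
    if (sum(own) == 1) return(0)
    a <- sum(m[i, own]) / (sum(own) - 1)   # d(i, i) = 0 adds nothing
    b <- min(vapply(setdiff(unique(labels), labels[i]),
                    function(cl) mean(m[i, labels == cl]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
}

set.seed(3)
d <- dist(c(rnorm(10, mean = 0), rnorm(10, mean = 8)))
labels <- rep(1:2, each = 10)

s_manual <- manual_silhouette(d, labels)
sil <- silhouette(labels, d)
avg_per_cluster <- tapply(sil[, "sil_width"], sil[, "cluster"], mean)
print(round(avg_per_cluster, 3))
```

Widths close to 1 indicate well-separated clusters, values near 0 objects between clusters, and negative values objects that may be in the wrong cluster.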

main

title for the plot.

col

colors used for the image plot (default: 64 shades of gray).
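Why dark means similar: the default palette runs from black to white, and image() assigns the first color to the smallest values. A short check of the default palette, plus one way to invert it:

```r
# The default palette: 64 gray levels running from black (start = 0)
# to white (end = 1); image() maps the smallest values to the first color.
cols <- gray.colors(64, start = 0, end = 1)
print(c(first = cols[1], last = cols[64]))

# An alternative to the default: reverse the palette so that small
# dissimilarities are drawn light instead of dark
cols_rev <- rev(cols)
```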

colorkey

"logical"; place a color key under the plot.

linesCol

color used for the lines to separate clusters.

newpage

"logical"; start the plot on a new page.

pop

"logical"; should the created viewports be popped?

...

further arguments passed on to image.

Value

An invisible list with the following elements:

order

NULL or an integer vector giving the order used to plot x.

method

a vector of character strings indicating the seriation methods used for plotting x.

k

NULL or an integer scalar giving the number of clusters.

description

a data.frame containing information about the clusters (label, size, average intra-cluster dissimilarity and average silhouette) in the order in which they are displayed in the plot (from top/left to bottom/right).
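To illustrate the kind of per-cluster summary returned in description, the hypothetical helper cluster_description() below computes, for each cluster in a given display order, its size, its average intra-cluster dissimilarity (excluding the zero diagonal, an assumption) and its average silhouette width via cluster::silhouette(). Its column names are made up and need not match those of the returned data.frame.

```r
library(cluster)  # provides silhouette()

# Hypothetical helper: one summary row per cluster, in display order
cluster_description <- function(d, labels,
                                cluster_order = sort(unique(labels))) {
  m <- as.matrix(d)
  sil <- silhouette(labels, d)
  rows <- lapply(cluster_order, function(cl) {
    idx <- which(labels == cl)
    n <- length(idx)
    # mean over the n * (n - 1) off-diagonal pairs; undefined for singletons
    intra <- if (n > 1) sum(m[idx, idx]) / (n * (n - 1)) else NA_real_
    data.frame(label = cl, size = n,
               avg_intra_dissimilarity = intra,
               avg_silhouette = mean(sil[idx, "sil_width"]))
  })
  do.call(rbind, rows)
}

d <- dist(c(0, 2, 10, 14, 30))
labels <- c(1L, 1L, 2L, 2L, 3L)
desc <- cluster_description(d, labels, cluster_order = c(3L, 1L, 2L))
print(desc)
```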

References

Alexander Strehl and Joydeep Ghosh. Relationship-based clustering and visualization for high-dimensional data mining. INFORMS Journal on Computing, pages 208-230, Spring 2003.

Peter J. Rousseeuw. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, volume 20, pages 53-65, 1987.

See Also

dist (in package stats), package grid, seriation.

Examples

data("Votes")

### create dummy coding (with the party affiliation in column 17 removed)
x <- as.dummy(Votes[-17])

### calculate distance matrix
d <- dists(x, method = "binary")

### plot dissimilarity matrix unseriated
res <- cluproxplot(d, method = "No seriation", main = "No seriation")

### plot matrix seriated
res <- cluproxplot(d, main = "Seriation - (Murtagh, 1985)")

### cluster with pam
library("cluster")
l <- pam(d, 8, cluster.only = TRUE)
res <- cluproxplot(d, l, main = "PAM + Seriation (Murtagh)")

res <- cluproxplot(d, l, method = c("Optimal", "Optimal"),
	main = "PAM + Seriation (Optimal Leaf ordering)")

### the result 
names(res)
res$description