dissplot
Dissimilarity Plot
Visualizes a dissimilarity matrix using seriation and matrix shading, following the method developed by Hahsler and Hornik (2011). Entries with lower dissimilarities (higher similarity) are plotted darker. Such a plot can be used to uncover hidden structure in the data.
The plot can also be used to visualize cluster quality (see Ling 1973). Objects belonging to the same cluster are displayed in consecutive order. The placement of the clusters and the order within each cluster are obtained by a seriation algorithm which tries to place large similarities/small dissimilarities close to the diagonal. Compact clusters are visible as dark squares (low dissimilarity) on the diagonal of the plot. Additionally, a silhouette plot (Rousseeuw 1987) is added. This visualization is similar to CLUSION (see Strehl and Ghosh 2003), but it allows arbitrary seriation algorithms to be used.
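The idea of cluster-ordered shading can be illustrated without the package. The following is a minimal base R sketch, not the seriation implementation: the toy data, the variable names, and the use of image() are assumptions made for illustration, and only the coarse step (grouping objects by cluster label) is shown, without any seriation within or between clusters.

```r
# Base R sketch (not the seriation implementation): shade a dissimilarity
# matrix after grouping objects by cluster label.
set.seed(1)

# Toy data: two well-separated groups in two dimensions (assumed for illustration).
pts <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(20, mean = 3, sd = 0.3), ncol = 2))
labels <- rep(1:2, each = 10)

# Shuffle the objects so the raw matrix shows no obvious structure.
shuffle <- sample(nrow(pts))
pts <- pts[shuffle, ]
labels <- labels[shuffle]

m <- as.matrix(dist(pts))

# Coarse reordering: objects of the same cluster become consecutive.
o <- order(labels)
m_ordered <- m[o, o]

# Low dissimilarity is drawn dark; columns are reversed so that the
# diagonal runs from northwest to southeast.
n <- nrow(m_ordered)
image(seq_len(n), seq_len(n), m_ordered[, n:1],
      col = gray(seq(0, 1, length.out = 64)),
      axes = FALSE, xlab = "", ylab = "",
      main = "Cluster-ordered dissimilarities")
```

In this sketch the two groups should appear as two dark squares along the diagonal, which is the pattern the text describes for compact clusters.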
Usage
dissplot(x, labels = NULL, method = "Spectral",
  control = NULL, options = NULL, ...)
Arguments
 x
 an object of class dist.
 labels
 NULL or an integer vector of the same length as the rows/columns in x, indicating the cluster membership of each object in x as consecutive integers starting with one. The labels are used to reorder the matrix.
 method
 a list with up to three elements or a single character string. Use a single character string to apply the same algorithm to reorder the clusters (inter cluster seriation) as well as the objects within each cluster (intra cluster seriation).
 If separate algorithms for inter and intra cluster seriation are required, method can be a list of two named elements (inter_cluster and intra_cluster), each containing the name of the respective seriation method. See seriate.dist for available algorithms.
 Set method to NA to plot the matrix as is (no or only coarse seriation). For intra cluster reordering the special method "silhouette width" is available. Objects in clusters are then ordered by silhouette width (from silhouette plots). If no method is given, the default method of seriate.dist is used.
 The third list element (named aggregation) controls how inter cluster dissimilarities are computed from the given dissimilarity matrix. The choices are "avg" (average pairwise dissimilarities; average-link), "min" (minimal pairwise dissimilarities; single-link), "max" (maximal pairwise dissimilarities; complete-link), and "Hausdorff" (pairs up each point from one cluster with the most similar point from the other cluster and then uses the largest dissimilarity of paired up points).
 control
 a list of control options passed on to the seriation algorithm. In case of two different seriation algorithms, control can contain a list of two named elements (inter_cluster and intra_cluster), each containing a list with the control options for the respective algorithm.
 options
 a list with options for plotting the matrix. The list can contain the following elements:
   plot
   a logical indicating if a plot should be produced. If FALSE, the returned object can be plotted later using the function plot, which takes as its second argument a list of plotting options (see options).
   cluster_labels
   a logical indicating whether to display cluster labels in the plot.
   averages
   a logical vector of length two. The first element controls the upper triangle and the second element the lower triangle of the plot. FALSE displays the original dissimilarity between objects, TRUE displays clusterwise average dissimilarities, and NA leaves the triangle white (default: c(FALSE, TRUE), i.e., the lower triangle displays averages).
   lines
   a logical indicating whether to draw lines to separate clusters.
   flip
   a logical indicating if the clusters are displayed on the diagonal from northwest to southeast (FALSE; default) or from northeast to southwest (TRUE).
   silhouettes
   a logical indicating whether to include a silhouette plot (see Rousseeuw, 1987).
   threshold
   a numeric value. If used, only distances below the threshold are displayed. Consider also using zlim for this purpose.
   col
   colors used for the image plot.
   key
   a logical indicating whether to place a color key below the plot.
   zlim
   range of values to display (defaults to the range of x).
   axes
   "auto" (default; enabled for less than 25 objects), "y" or "none".
   main
   title for the plot.
   newpage
   a logical indicating whether to start the plot on a new page (see grid.newpage in package grid).
   pop
   a logical indicating whether to pop the created viewports (see package grid).
   gp, gp_lines, gp_labels
   objects of class gpar containing graphical parameters (see gpar in package grid).
 …
 further arguments are added to options.
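The four aggregation choices described for method can be made concrete with a small base R sketch. The helper inter_cluster_diss() is hypothetical and follows only the verbal descriptions given for "avg", "min", "max" and "Hausdorff"; it is not taken from the package source, and the toy data are assumed for illustration.

```r
# Hypothetical helper: dissimilarity between two clusters under each
# aggregation choice, following the verbal descriptions only.
inter_cluster_diss <- function(m, labels, a, b,
                               type = c("avg", "min", "max", "Hausdorff")) {
  type <- match.arg(type)
  # Pairwise dissimilarities between members of cluster a (rows)
  # and members of cluster b (columns).
  block <- m[labels == a, labels == b, drop = FALSE]
  switch(type,
    avg = mean(block),
    min = min(block),
    max = max(block),
    # Pair each point with its most similar point in the other cluster
    # (in both directions), then take the largest paired dissimilarity.
    Hausdorff = max(apply(block, 1, min), apply(block, 2, min))
  )
}

# Toy data on a line: cluster 1 = {0, 1}, cluster 2 = {5, 9}.
x <- c(0, 1, 5, 9)
labels <- c(1, 1, 2, 2)
m <- as.matrix(dist(x))

sapply(c("avg", "min", "max", "Hausdorff"),
       function(type) inter_cluster_diss(m, labels, 1, 2, type))
# avg = 6.5, min = 4, max = 9, Hausdorff = 8
```

For these four points the choices give clearly different values, which is why the aggregation setting can change the order in which clusters are placed along the diagonal.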
Value
An invisible object of class cluster_proximity_matrix with the following elements:
 NULL or integer vector giving the order used to plot x.
 NULL or integer vector giving the order of the clusters as plotted.
 NULL or integer scalar giving the number of clusters generated.
 data.frame containing information (label, size, average intra-cluster dissimilarity and the average silhouette) for the clusters as displayed in the plot (from top/left to bottom/right).
This object can be used for plotting via plot(x, options = NULL, ...), where x is the object and options contains a list with plotting options (see above).
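The kind of per-cluster summary held in the data.frame element (label, size, average intra-cluster dissimilarity, average silhouette) can be sketched in base R. The helper cluster_summary() and its column names are hypothetical and not taken from the package; the sketch assumes at least two clusters, averages over distinct pairs within a cluster, and assigns silhouette width 0 to singleton clusters.

```r
# Hypothetical helper: one row per cluster with size, average intra-cluster
# dissimilarity, and average silhouette width. Column names are assumptions.
# Assumes at least two clusters.
cluster_summary <- function(m, labels) {
  ids <- sort(unique(labels))
  n <- length(labels)

  # Silhouette width of object i: (b - a) / max(a, b), where a is the mean
  # dissimilarity to the other members of its own cluster and b is the
  # smallest mean dissimilarity to any other cluster.
  sil <- vapply(seq_len(n), function(i) {
    own <- labels == labels[i]
    own[i] <- FALSE
    if (!any(own)) return(0)  # singleton cluster: 0 by convention
    a <- mean(m[i, own])
    b <- min(vapply(setdiff(ids, labels[i]),
                    function(k) mean(m[i, labels == k]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))

  data.frame(
    label = ids,
    size = as.integer(table(factor(labels, levels = ids))),
    avg_dissimilarity = vapply(ids, function(k) {
      block <- m[labels == k, labels == k, drop = FALSE]
      mean(block[upper.tri(block)])  # NaN for a singleton cluster
    }, numeric(1)),
    avg_silhouette = vapply(ids, function(k) mean(sil[labels == k]),
                            numeric(1))
  )
}

# Toy data on a line: cluster 1 = {0, 1}, cluster 2 = {5, 9}.
x <- c(0, 1, 5, 9)
labels <- c(1, 1, 2, 2)
cluster_summary(as.matrix(dist(x)), labels)
```

In this toy case the tighter cluster {0, 1} has the smaller average dissimilarity and the larger average silhouette, matching the intuition that compact clusters show up as darker squares.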
References
Hahsler, M. and Hornik, K. (2011): Dissimilarity plots: A visual exploration tool for partitional clustering. Journal of Computational and Graphical Statistics, 20(2), 335-354.
Ling, R.F. (1973): A computer generated aid for cluster analysis. Communications of the ACM, 16(6), 355-361.
Rousseeuw, P.J. (1987): Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20(1), 53-65.
Strehl, A. and Ghosh, J. (2003): Relationship-based clustering and visualization for high-dimensional data mining. INFORMS Journal on Computing, 15(2), 208-230.
Examples
library("seriation")
data("iris")
## drop the Species column (column 5) before computing distances
d <- dist(iris[-5])
## plot original matrix
res <- dissplot(d, method = NA)
## plot reordered matrix using the nearest insertion algorithm (from tsp)
res <- dissplot(d, method = "TSP",
  options = list(main = "Seriation (TSP)"))
## cluster with pam (we know iris has 3 clusters)
library("cluster")
l <- pam(d, 3, cluster.only = TRUE)
## we use a grid layout to place several plots on a page
library("grid")
grid.newpage()
pushViewport(viewport(layout=grid.layout(nrow = 2, ncol = 2),
gp = gpar(fontsize = 8)))
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 1))
## visualize the clustering (using Spectral between clusters and MDS within)
res <- dissplot(d, l,
  method = list(inter_cluster = "Spectral", intra_cluster = "MDS"),
  options = list(main = "PAM + Seriation - standard",
    newpage = FALSE))
popViewport()
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 2))
## more visualization options. Note that we reuse the reordered object res!
## color: use 10 shades red-blue
plot(res, options = list(main = "PAM + Seriation",
  col = bluered(10, bias = .5), newpage = FALSE))
popViewport()
pushViewport(viewport(layout.pos.row = 2, layout.pos.col = 1))
## threshold (using zlim) and cubic scale to highlight differences
plot(res, options = list(main = "PAM + Seriation - threshold",
  zlim = c(0, 1.5), col = greys(100, power = 2), newpage = FALSE))
popViewport()
pushViewport(viewport(layout.pos.row = 2, layout.pos.col = 2))
## use custom (logistic) scale
plot(res, options = list(main = "PAM + Seriation - logistic scale",
  col = hcl(c = 0, l = (plogis(seq(10, 0, length = 100),
    location = 2, scale = 1/2, log = FALSE)) * 100),
  newpage = FALSE))
popViewport(2)
## inspect the returned object and its elements
res
names(res)