Minimum Spanning Tree
spantree finds a minimum spanning tree
connecting all points, but disregarding dissimilarities that are at or
above the threshold or NA.
spantree(d, toolong = 0)
as.hclust(x, ...)
cophenetic(x)
spandepth(x)
plot(x, ord, cex = 0.7, type = "p", labels, dlim, FUN = sammon, ...)
lines(x, ord, display = "sites", col = 1, ...)
- d: Dissimilarity data inheriting from class dist, or an object, such as a matrix, that can be converted to a dissimilarity matrix. Functions vegdist and dist are some functions producing suitable dissimilarity data.
- toolong: Shortest dissimilarity regarded as NA. The function uses a fuzz factor, so that dissimilarities close to the limit will be made NA, too. If toolong = 0 (or negative), no dissimilarity is regarded as too long.
- ord: An ordination configuration, or an ordination result known by scores.
- cex: Character expansion factor.
- type: Observations are plotted as points with type = "p" or type = "b", or as text labels with type = "t". The tree (lines) will always be plotted.
- labels: Text used with type = "t", or node names if this is missing.
- dlim: A ceiling value applied to the highest cophenetic dissimilarities.
- FUN: Ordination function to find the configuration from cophenetic dissimilarities. If the supplied FUN does not work, supply the ordination result as argument ord.
- display: Type of scores used for ord.
- col: Colour of line segments. This can be a vector which is recycled for points, and the line colour will be a mixture of the colours of the two joined points.
- ...: Other parameters passed to functions.
spantree finds a minimum spanning tree for
dissimilarities (there may be several minimum spanning trees, but the
function finds only one). Dissimilarities at or above the threshold
toolong and NAs are disregarded, and the spanning tree
is found through other dissimilarities. If the data are disconnected,
the function will return a disconnected tree (or a forest), and the
corresponding link is
NA. Connected subtrees can be identified using distconnected.
Minimum spanning tree is closely related to single linkage
clustering, a.k.a. nearest neighbour clustering, known in genetics as the
neighbour joining tree, which is available in the
hclust and agnes functions. The most important practical
difference is that minimum spanning tree has no concept of cluster
membership, but always joins individual points to each other. Function
as.hclust can change the
spantree result into a corresponding hclust object. Function
cophenetic finds distances between all points along
the tree segments. Function
spandepth returns the depth of
each node. The nodes of a tree are either leaves (with one link) or
internal nodes (more than one link). The leaves are recursively
removed from the tree, and the depth is the layer at which the leaf
was removed. In a disconnected
spantree object (a forest)
each tree is analysed separately and disconnected nodes not in any
tree have depth zero.
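The leaf-peeling rule described above can be sketched in base R. This is only an illustration of the rule under stated assumptions, not the code used by spandepth: peel_depth is a hypothetical helper, the tree is given as a two-column matrix of linked node indices (not as a spantree object), and the links are assumed to form a forest without cycles.

```r
# peel_depth: hypothetical helper, not part of any package.
# edges: two-column matrix of node indices, one row per link.
# n: total number of nodes (nodes without links are allowed).
peel_depth <- function(edges, n) {
  depth <- integer(n)                   # nodes without links keep depth 0
  deg <- tabulate(c(edges), nbins = n)  # number of links per node
  alive <- deg > 0
  layer <- 0L
  while (any(alive)) {
    layer <- layer + 1L
    # current leaves, or the last remaining node of a tree
    leaves <- which(alive & deg <= 1)
    depth[leaves] <- layer
    alive[leaves] <- FALSE
    # drop links touching a removed node and recount the degrees
    keep <- !(edges[, 1] %in% leaves | edges[, 2] %in% leaves)
    edges <- edges[keep, , drop = FALSE]
    deg <- tabulate(c(edges), nbins = n)
  }
  depth
}

# A chain 1-2-3-4-5 plus an isolated node 6
edges <- cbind(1:4, 2:5)
depths <- peel_depth(edges, n = 6)
depths  # 1 2 3 2 1 0
```

The end points of the chain are removed first, the middle node last, and the isolated node keeps depth zero.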
plot displays the tree over a
supplied ordination configuration, and
lines adds a spanning
tree to an ordination graph. If a configuration is not supplied for
plot, the function ordinates the cophenetic dissimilarities of the
spanning tree and overlays the tree on this result. The default
ordination function is
sammon (package MASS),
because Sammon scaling emphasizes structure in the neighbourhood of
nodes and may be able to beautifully represent the tree (you may need
to set dlim, and sometimes the results will remain
twisted). These ordination methods do not work with disconnected
trees, and for them you must supply the ordination configuration. Function
lines will overlay the tree in an existing plot.
spantree uses Prim's method
implemented as priority-first search for dense graphs (Sedgewick
1990). Function cophenetic uses function
stepacross with option
path = "extended". The
spantree is very fast, but
cophenetic is slow in very
large data sets.
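As a rough illustration of Prim's method on a dense graph, the following base R sketch grows the tree one point at a time from a full dissimilarity matrix. It is not the implementation used by spantree: prim_mst is a hypothetical helper, the tree always starts from the first point, NA handling and the toolong threshold are ignored, and the from/to/dist output layout is an assumption made for this example only.

```r
# prim_mst: hypothetical helper, not part of any package.
prim_mst <- function(d) {
  m <- as.matrix(d)
  n <- nrow(m)
  in_tree <- c(TRUE, rep(FALSE, n - 1))  # start the tree from point 1
  best <- m[1, ]      # shortest known link from the tree to each point
  from <- rep(1L, n)  # tree point offering that shortest link
  edges <- matrix(NA_real_, nrow = n - 1, ncol = 3,
                  dimnames = list(NULL, c("from", "to", "dist")))
  for (k in seq_len(n - 1)) {
    cand <- which(!in_tree)
    j <- cand[which.min(best[cand])]     # closest point outside the tree
    edges[k, ] <- c(from[j], j, best[j])
    in_tree[j] <- TRUE
    closer <- !in_tree & m[j, ] < best   # links shortened by the new point
    best[closer] <- m[j, closer]
    from[closer] <- j
  }
  edges
}

# Four points on a line at 0, 1, 3 and 6
pts <- cbind(x = c(0, 1, 3, 6), y = 0)
mst <- prim_mst(dist(pts))
mst               # links 1-2, 2-3 and 3-4
sum(mst[, "dist"])  # total length 6
```

Each pass scans all points outside the tree, so the work grows with the square of the number of points, which suits a full dissimilarity matrix.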
spantree returns an object of class
spantree, which is a list with two vectors, each of length $n-1$. The number of links in a tree is one less than the number of observations, and the first item is omitted. The items are
In principle, minimum spanning tree is equivalent to single linkage
clustering that can be performed using
hclust or agnes. However, these functions combine
clusters to each other and the information of the actually connected points
(the "single link") cannot be recovered from the result. The
graphical output of a single linkage clustering plotted with
ordicluster will look very different from an equivalent
spanning tree plotted with lines.
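The equivalence with single linkage can be checked numerically: the merge heights of a single linkage hclust tree are the same numbers as the sorted link lengths of a minimum spanning tree. The sketch below uses only base R and random data. mst_lengths is a hypothetical helper written for this example (Prim's method returning link lengths only); it is not a vegan function and does not return a spantree object.

```r
# mst_lengths: hypothetical helper, not part of any package.
mst_lengths <- function(d) {
  m <- as.matrix(d)
  n <- nrow(m)
  out <- 2:n      # points not yet in the tree; the tree starts from point 1
  best <- m[1, ]  # shortest link from the tree to every point
  len <- numeric(n - 1)
  for (k in seq_len(n - 1)) {
    j <- out[which.min(best[out])]  # closest point outside the tree
    len[k] <- best[j]
    out <- setdiff(out, j)
    best <- pmin(best, m[j, ])      # update shortest links via the new point
  }
  len
}

set.seed(1)
pts <- matrix(rnorm(40), ncol = 2)  # 20 random points in the plane
d <- dist(pts)
single <- hclust(d, method = "single")
# Same set of distances, but hclust does not say which two points were linked
all.equal(sort(mst_lengths(d)), single$height)
```

The heights agree, but only the spanning tree records the pair of points joined by each link.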
Sedgewick, R. (1990). Algorithms in C. Addison Wesley.
library(vegan)
data(dune)
dis <- vegdist(dune)
tr <- spantree(dis)
## Add tree to a metric scaling
plot(tr, cmdscale(dis), type = "t")
## Find a configuration to display the tree neatly
plot(tr, type = "t")
## Depths of nodes
depths <- spandepth(tr)
plot(tr, type = "t", labels = depths)
## Plot as a dendrogram
cl <- as.hclust(tr)
plot(cl)
## Cut the hclust tree into classes and show them in colours in the spantree plot
plot(tr, col = cutree(cl, 5), pch = 16)