spantree finds a minimum spanning tree
connecting all points, but disregarding dissimilarities that are at or
above the threshold or NA.
spantree(d, toolong = 0)
"as.hclust"(x, ...)
"cophenetic"(x)
spandepth(x)
"plot"(x, ord, cex = 0.7, type = "p", labels, dlim, FUN = sammon, ...)
"lines"(x, ord, display="sites", col = 1, ...)NA.
The function uses a fuzz factor, so
that dissimilarities close to the limit will be made NA, too.
If toolong = 0 (or negative), no dissimilarity is regarded
as too long.
spantree result object.scores.type="p" or type="b", or as text label with
type="t". The tree (lines) will always be plotted.type="t" or node names if this is
missing.cophenetic dissimilarity.FUN does not work,
supply ordination result as argument ord. scores used for ord.spantree
returns an object of class spantree which is a
list with two vectors, each of length $n-1$. The
number of links in a tree is one less the number of observations, and
the first item is omitted. The items arespantree finds a minimum spanning tree for
dissimilarities (there may be several minimum spanning trees, but the
function finds only one). Dissimilarities at or above the threshold
toolong and NAs are disregarded, and the spanning tree
is found through other dissimilarities. If the data are disconnected,
the function will return a disconnected tree (or a forest), and the
corresponding link is NA. Connected subtrees can be identified
using distconnected. Minimum spanning tree is closesly related to single linkage
clustering, a.k.a. nearest neighbour clustering, and in genetics as
neighbour joining tree available in hclust and
agnes functions. The most important practical
difference is that minimum spanning tree has no concept of cluster
membership, but always joins individual points to each other. Function
as.hclust can change the spantree result into a
corresponding hclust object.
Function cophenetic finds distances between all points along
the tree segments. Function spandepth returns the depth of
each node. The nodes of a tree are either leaves (with one link) or
internal nodes (more than one link). The leaves are recursively
removed from the tree, and the depth is the layer at with the leaf
was removed. In disconnected spantree object (in a forest)
each tree is analysed separately and disconnected nodes not in any
tree have depth zero.
Function plot displays the tree over a
supplied ordination configuration, and lines adds a spanning
tree to an ordination graph. If configuration is not supplied for plot,
the function ordinates the cophenetic dissimilarities of the
spanning tree and overlays the tree on this result. The default
ordination function is sammon (package MASS),
because Sammon scaling emphasizes structure in the neighbourhood of
nodes and may be able to beautifully represent the tree (you may need
to set dlim, and sometimes the results will remain
twisted). These ordination methods do not work with disconnected
trees, but you must supply the ordination configuration. Function
lines will overlay the tree in an existing plot.
Function spantree uses Prim's method
implemented as priority-first search for dense graphs (Sedgewick
1990). Function cophenetic uses function
stepacross with option path = "extended". The
spantree is very fast, but cophenetic is slow in very
large data sets.
vegdist or dist for getting
dissimilarities, and hclust or
agnes for single linkage clustering.
data(dune)
dis <- vegdist(dune)
tr <- spantree(dis)
## Add tree to a metric scaling
plot(tr, cmdscale(dis), type = "t")
## Find a configuration to display the tree neatly
plot(tr, type = "t")
## Depths of nodes
depths <- spandepth(tr)
plot(tr, type = "t", label = depths)
## Plot as a dendrogram
cl <- as.hclust(tr)
plot(cl)
## cut hclust tree to classes and show in colours in spantree
plot(tr, col = cutree(cl, 5), pch=16)
Run the code above in your browser using DataLab