
Hierarchical cluster analysis.
hcluster(x, method = "euclidean", diag = FALSE, upper = FALSE,
         link = "complete", members = NULL, nbproc = 2,
         doubleprecision = TRUE)
An object of class hclust which describes the tree produced by the clustering process. The object is a list with components:
merge: an n-1 by 2 matrix; row i of merge describes the merging of clusters at step i of the clustering. Negative entries in merge indicate agglomerations of singletons, and positive entries indicate agglomerations of non-singletons.
height: a set of n-1 non-decreasing real values, the value of the criterion associated with the clustering method for the particular agglomeration.
order: a vector giving the permutation of the original observations suitable for plotting, in the sense that a cluster plot using this ordering and matrix merge will not have crossings of the branches.
labels: labels for each of the objects being clustered.
call: the call which produced the result.
method: the cluster method that has been used.
dist.method: the distance that has been used to create d (only returned if the distance object has a "method" attribute).
There is a print and a plot method for hclust objects. The plclust() function is basically the same as the plot method, plot.hclust, provided primarily for back compatibility with S-PLUS. Its extra arguments are not yet implemented.
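A quick sketch of how these components and methods can be inspected on the returned object (assuming the amap package is installed and loaded; it uses the USArrests data set from the examples below):

library(amap)
data(USArrests)
hc <- hcluster(USArrests, link = "complete")
hc$merge[1:3, ]   ## first three agglomeration steps (negative entries are singletons)
hc$height[1:3]    ## criterion values for those steps
head(hc$order)    ## permutation of the observations used for plotting
head(hc$labels)   ## labels of the clustered objects
print(hc)         ## print method for hclust objects
plot(hc)          ## plot method (dendrogram)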
x: a numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns), or an object of class "exprSet".
method: the distance measure to be used. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary", "pearson", "abspearson", "correlation", "abscorrelation", "spearman" or "kendall". Any unambiguous substring can be given.
diag: logical value indicating whether the diagonal of the distance matrix should be printed by print.dist.
upper: logical value indicating whether the upper triangle of the distance matrix should be printed by print.dist.
link: the agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward", "single", "complete", "average", "mcquitty", "median", "centroid" or "centroid2".
members: NULL, or a vector with length the size of d.
nbproc: integer, the number of subprocesses used for parallelization (Linux and Mac only).
doubleprecision: TRUE: use double precision for the distance matrix computation; FALSE: use single precision.
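As an illustrative sketch of these arguments (the values shown are the defaults from the usage line except for link and nbproc), a fully spelled-out call might look like this:

library(amap)
data(USArrests)
hc <- hcluster(USArrests,
               method = "euclidean",    ## distance measure
               diag = FALSE,            ## printing of the diagonal by print.dist
               upper = FALSE,           ## printing of the upper triangle by print.dist
               link = "average",        ## agglomeration method
               members = NULL,
               nbproc = 1,              ## number of subprocesses (Linux and Mac only)
               doubleprecision = TRUE)  ## double precision for the distance computation
class(hc)                               ## "hclust"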
The hcluster function is based on C code adapted from a CRAN Fortran routine by Antoine Lucas. The function is a mix of hclust and Dist: hcluster(x, method = "euclidean", link = "complete") is equivalent to hclust(dist(x, method = "euclidean"), method = "complete"). It uses about half as much memory, as it does not store the distance matrix. For more details, see the documentation of hclust and Dist.
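The equivalence stated above can be checked directly; a minimal sketch (assuming the amap package is loaded) compares the two trees:

library(amap)
data(USArrests)
hc1 <- hcluster(USArrests, method = "euclidean", link = "complete")
hc2 <- hclust(dist(USArrests, method = "euclidean"), method = "complete")
all.equal(hc1$height, hc2$height)   ## merge heights should agree
all.equal(hc1$merge, hc2$merge)     ## agglomeration steps should agree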
Antoine Lucas and Sylvain Jasson, Using amap and ctc Packages for Huge Clustering, R News, 2006, vol. 6, issue 5, pages 58-60.
data(USArrests)
hc <- hcluster(USArrests, link = "ave")
plot(hc)
plot(hc, hang = -1)
## Do the same with centroid clustering and squared Euclidean distance,
## cut the tree into ten clusters and reconstruct the upper part of the
## tree from the cluster centers.
hc <- hclust(dist(USArrests)^2, "cen")
memb <- cutree(hc, k = 10)
cent <- NULL
for(k in 1:10){
  cent <- rbind(cent, colMeans(USArrests[memb == k, , drop = FALSE]))
}
hc1 <- hclust(dist(cent)^2, method = "cen", members = table(memb))
opar <- par(mfrow = c(1, 2))
plot(hc, labels = FALSE, hang = -1, main = "Original Tree")
plot(hc1, labels = FALSE, hang = -1, main = "Re-start from 10 clusters")
par(opar)
## other combinations are possible
hc <- hcluster(USArrests, method = "euc", link = "ward", nbproc = 1,
               doubleprecision = TRUE)
hc <- hcluster(USArrests, method = "max", link = "single", nbproc = 2,
               doubleprecision = TRUE)
hc <- hcluster(USArrests, method = "man", link = "complete", nbproc = 1,
               doubleprecision = TRUE)
hc <- hcluster(USArrests, method = "can", link = "average", nbproc = 2,
               doubleprecision = TRUE)
hc <- hcluster(USArrests, method = "bin", link = "mcquitty", nbproc = 1,
               doubleprecision = FALSE)
hc <- hcluster(USArrests, method = "pea", link = "median", nbproc = 2,
               doubleprecision = FALSE)
hc <- hcluster(USArrests, method = "abspea", link = "median", nbproc = 2,
               doubleprecision = FALSE)
hc <- hcluster(USArrests, method = "cor", link = "centroid", nbproc = 1,
               doubleprecision = FALSE)
hc <- hcluster(USArrests, method = "abscor", link = "centroid", nbproc = 1,
               doubleprecision = FALSE)
hc <- hcluster(USArrests, method = "spe", link = "complete", nbproc = 2,
               doubleprecision = FALSE)
hc <- hcluster(USArrests, method = "ken", link = "complete", nbproc = 2,
               doubleprecision = FALSE)