TDA (version 1.0)

clusterTree: Density clustering: the cluster tree

Description

Given a point cloud, or a matrix of distances, this function computes a density estimate and returns an object of class clusterTree (lambda tree and kappa tree; see references).

Usage

clusterTree(X, k, h = NULL, density="knn", dist="euclidean", d=NULL, 
            Nlambda = 100, printStatus = FALSE)

Arguments

X
If dist="euclidean" then X is an $n$ by $d$ matrix of coordinates, where $n$ is the number of points stored in X and $d$ is the dimension of the space. If dist="arbitrary" then X is an $
k
an integer value specifying the parameter of the underlying k-nearest neighbor similarity graph, used to determine connected components. If density="knn", then k is also used to compute the k-nearest neighbor density estimator.
h
real value: if density="kde", then h is used to compute the kernel density estimator with bandwidth h. Default is NULL.
density
string: if "knn" then the k-nearest neighbor density estimator is used to compute the cluster tree; if "kde" then the kernel density estimator is used to compute the cluster tree. Default is "knn".
dist
string: can be "euclidean", when X is a point cloud or "arbitrary", when X is a matrix of distances.
d
integer: if dist="arbitrary", then d is the dimension of the underlying space.
Nlambda
integer: size of the grid of values of the density estimator used to compute the cluster tree. A higher Nlambda (i.e. a finer grid) yields a more accurate cluster tree.
printStatus
logical: if TRUE a progress bar is printed. Default is FALSE.
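For intuition about how k, d, and the distances interact when density="knn", here is a minimal base-R sketch of the standard k-nearest-neighbor density estimator, $\hat p_k(x) = k / (n \, v_d \, r_k(x)^d)$, where $r_k(x)$ is the distance from $x$ to its $k$-th nearest neighbor and $v_d$ is the volume of the unit $d$-ball. The helper `knn_density` is hypothetical (written for this sketch, not the package's implementation), and conventions differ on whether the point itself counts as a neighbor:

```r
# Hypothetical helper: k-nearest-neighbor density estimate, base R only.
knn_density <- function(X, k) {
  X <- as.matrix(X)
  n <- nrow(X)
  d <- ncol(X)
  D <- as.matrix(dist(X))  # pairwise Euclidean distances
  # distance to the k-th nearest neighbor, excluding the point itself
  # (sorted row starts with the 0 self-distance)
  r_k <- apply(D, 1, function(row) sort(row)[k + 1])
  v_d <- pi^(d / 2) / gamma(d / 2 + 1)  # volume of the unit d-ball
  k / (n * v_d * r_k^d)                 # one density value per point
}
```

Note how k trades bias for variance: a small k makes the estimate spiky, a large k smooths it out, which in turn changes which branches survive in the cluster tree.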

Value

This function returns an object of class clusterTree, a list with the following components:
  • n: the number of points stored in the input matrix X
  • id: vector of the IDs associated to the branches of the cluster tree
  • Xbase: vector of the horizontal coordinates of the branches of the cluster tree, in the same order as id
  • Ybottom: vector of the vertical bottom coordinates of the branches of the lambda tree, in the same order as id
  • Ytop: vector of the vertical top coordinates of the branches of the lambda tree, in the same order as id
  • Kbottom: vector of the vertical bottom coordinates of the branches of the kappa tree, in the same order as id
  • Ktop: vector of the vertical top coordinates of the branches of the kappa tree, in the same order as id
  • silo: a list whose elements are the horizontal coordinates of the silo of each branch, in the same order as id
  • sons: a list whose elements are the IDs of the sons of each branch, in the same order as id
  • parent: vector of the IDs of the parents of each branch, in the same order as id
  • DataPoints: a list whose elements are the points of X corresponding to each branch, in the same order as id
  • density: vector of length n, the values of the density estimator evaluated at each of the points stored in X
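A quick way to see how these components fit together (this assumes the TDA package is installed; the two-cluster data below are illustrative only):

```r
library(TDA)  # provides clusterTree

set.seed(1)
# two well-separated Gaussian clusters, 150 points each
X <- rbind(cbind(rnorm(150, 0, 0.3), rnorm(150, 0, 0.3)),
           cbind(rnorm(150, 3, 0.3), rnorm(150, 3, 0.3)))
Tree <- clusterTree(X, k = 50, density = "knn")

Tree$n                                # number of points stored in X
Tree$id                               # IDs of the branches
Tree$parent                           # parent of each branch, same order as id
head(Tree$DataPoints[[Tree$id[1]]])   # points belonging to the first branch
length(Tree$density)                  # one density value per point of X
```

Branches are indexed by id, so vectors such as parent, Ybottom, and Ytop line up positionally with id, while silo, sons, and DataPoints are lists accessed by branch ID (as in the loop in the Examples section).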

Details

This function is an implementation of Algorithm 1 in the first reference.
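The core idea behind that algorithm can be sketched as follows: for each density level lambda, keep the points whose estimated density is at least lambda and take the connected components of the k-nearest-neighbor graph restricted to those points; as lambda rises, components split, and the branches of those splits form the tree. The sketch below is a base-R illustration of that idea, not the package's implementation; `knn_graph` and `level_components` are hypothetical helpers written here:

```r
# Hypothetical helper: symmetrized k-nearest-neighbor graph as an
# adjacency matrix (i and j are linked if either is among the other's k NNs).
knn_graph <- function(X, k) {
  D <- as.matrix(dist(X))
  n <- nrow(D)
  adj <- matrix(FALSE, n, n)
  for (i in seq_len(n)) {
    nbrs <- order(D[i, ])[2:(k + 1)]  # skip self (distance 0 sorts first)
    adj[i, nbrs] <- TRUE
  }
  adj | t(adj)
}

# Hypothetical helper: connected components (breadth-first search) of the
# subgraph induced by the logical mask `keep`; dropped points get NA.
level_components <- function(adj, keep) {
  n <- nrow(adj)
  comp <- rep(NA_integer_, n)
  cur <- 0L
  for (s in which(keep)) {
    if (!is.na(comp[s])) next
    cur <- cur + 1L
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      if (!is.na(comp[v])) next
      comp[v] <- cur
      queue <- c(queue, which(adj[v, ] & keep & is.na(comp)))
    }
  }
  comp
}

# Sweeping `keep <- density_hat >= lambda` over a grid of lambda values and
# tracking how the components split yields the branches of the lambda tree.
```

This is only a conceptual sketch; the actual implementation follows Algorithm 1 of Kent, Rinaldo, and Verstynen (2013) and also computes the kappa-tree coordinates.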

References

Brian P. Kent, Alessandro Rinaldo, and Timothy Verstynen (2013), "DeBaCl: A Python Package for Interactive DEnsity-BAsed CLustering." arXiv:1307.8136

Fabrizio Lecci, Alessandro Rinaldo, and Larry Wasserman, (2014), "Metric Embeddings for Cluster Trees"

See Also

plot.clusterTree

Examples

library(TDA)   # the example assumes the TDA package is loaded
set.seed(42)   # make the simulated data reproducible

## Generate data: 3 clusters
n <- 1200                # total sample size
Neach <- floor(n / 3)    # points per cluster, so that nrow(X) == n
X1 <- cbind(rnorm(Neach, 1, 0.8), rnorm(Neach, 5, 0.8))
X2 <- cbind(rnorm(Neach, 3.5, 0.8), rnorm(Neach, 5, 0.8))
X3 <- cbind(rnorm(Neach, 6, 1), rnorm(Neach, 1, 1))
X <- rbind(X1, X2, X3)

k <- 100                 # parameter of the knn graph and density estimator

## Density clustering using knn and kde
Tree <- clusterTree(X, k, density = "knn")
TreeKDE <- clusterTree(X, k, h = 0.3, density = "kde")

par(mfrow = c(2, 3))
plot(X, pch = 19, cex = 0.6)
# plot lambda trees
plot(Tree, type = "lambda", main = "lambda Tree (knn)")
plot(TreeKDE, type = "lambda", main = "lambda Tree (kde)")
# plot clusters
plot(X, pch = 19, cex = 0.6, main = "cluster labels")
for (i in Tree$id) {
  points(matrix(X[Tree$DataPoints[[i]], ], ncol = 2), col = i, pch = 19, cex = 0.6)
}
# plot kappa trees
plot(Tree, type = "kappa", main = "kappa Tree (knn)")
plot(TreeKDE, type = "kappa", main = "kappa Tree (kde)")
