xcluster: Hierarchical clustering

Description

Performs a hierarchical cluster analysis on a set of dissimilarities (this function launches an external program: Xcluster).

Usage

xcluster(data,distance="euclidean",clean=FALSE,tmp.in="tmp.txt",tmp.out="tmp.gtr")

Arguments

Details

Available distance measures are (written for two vectors $x$ and $y$):

Euclidean: Usual square distance between the two vectors (2 norm).
Pearson: $1 - \mbox{cor}(x,y)$
Pearson not centered: $1 - \frac{\sum_i x_i y_i}{\left(\sum_i x_i^2 \sum_i y_i^2 \right)^{1/2}}$
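
As an illustration, the three measures can be computed by hand in plain R. This is only a sketch: the vectors x and y are made-up values, and "Euclidean" is assumed here to mean the 2-norm of the difference. It does not call Xcluster itself.

```r
# Two illustrative vectors (hypothetical values, not from the package)
x <- c(1, 2, 3, 4)
y <- c(2, 1, 4, 3)

# Euclidean: 2-norm of the difference (assumed interpretation)
d_euclidean <- sqrt(sum((x - y)^2))

# Pearson: one minus the correlation between x and y
d_pearson <- 1 - cor(x, y)

# Pearson not centered: same formula without subtracting the means
d_uncentered <- 1 - sum(x * y) / sqrt(sum(x^2) * sum(y^2))

d_euclidean
d_pearson
d_uncentered
```

The uncentered variant depends only on the angle between the two vectors, so it ignores differences in overall scale but not differences in offset.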

Xcluster does not use the usual agglomerative methods (single, average, complete); instead, it takes the distance between two groups to be the distance between their barycenters.

This causes a problem for data such as:

    A   0     0
    B   0     1
    C   0.9   0.5

I.e., a triangle in $R^2$: the distance between A and B is larger than the distance between the group {A, B} and C (with Euclidean distance).

In that case it can be useful to use clean=TRUE, which means that you must not consider A and B as a group without C.
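
The arithmetic behind this case can be checked directly. The sketch below uses the points A, B and C given above; the helper euclid is a hypothetical name introduced only for this illustration.

```r
# The three points from the example
A <- c(0, 0)
B <- c(0, 1)
C <- c(0.9, 0.5)

# Hypothetical helper: Euclidean distance between two points
euclid <- function(u, v) sqrt(sum((u - v)^2))

d_AB   <- euclid(A, B)      # A and B are the closest pair, merged first
AB     <- (A + B) / 2       # barycenter of the group {A, B}
d_AB_C <- euclid(AB, C)     # distance between that group and C

d_AB            # 1
d_AB_C          # 0.9
d_AB_C < d_AB   # TRUE: the second merge is lower than the first
```

Because the second merge happens at a smaller height than the first, the resulting tree is not monotone, which is the situation clean=TRUE is meant to address.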

Value

An object of class hclust which describes the tree produced by the clustering process. The object is a list with components:

merge: an $n-1$ by 2 matrix. Row $i$ of merge describes the merging of clusters at step $i$ of the clustering. If an element $j$ in the row is negative, then observation $-j$ was merged at this stage. If $j$ is positive then the merge was with the cluster formed at the (earlier) stage $j$ of the algorithm. Thus negative entries in merge indicate agglomerations of singletons, and positive entries indicate agglomerations of non-singletons.

height: a set of $n-1$ non-decreasing real values. The clustering height: that is, the value of the criterion associated with the clustering method for the particular agglomeration.

order: a vector giving the permutation of the original observations suitable for plotting, in the sense that a cluster plot using this ordering and matrix merge will not have crossings of the branches.

labels: labels for each of the objects being clustered.

call: the call which produced the result.

method: the cluster method that has been used.

dist.method: the distance that has been used to create d (only returned if the distance object has a "method" attribute).
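
To see what these components look like in practice, the sketch below inspects an hclust object. Since xcluster needs the external Xcluster program, stats::hclust is used here as a stand-in that returns an object of the same class; the data values are made up for illustration.

```r
# Small illustrative data set (hypothetical values)
m <- matrix(c( 0, 0,
               0, 1,
               5, 5,
               5, 6,
              10, 0), ncol = 2, byrow = TRUE)
rownames(m) <- c("a", "b", "c", "d", "e")

# Stand-in for xcluster(m): base R clustering on Euclidean distances
h <- hclust(dist(m))

class(h)        # "hclust"
h$merge         # (n-1) x 2 matrix; negative entries are single observations
h$height        # n-1 merge heights
h$order         # leaf order that avoids crossing branches in a plot
h$labels        # labels of the clustered observations
h$method        # agglomeration method that was used
h$dist.method   # distance method, taken from the dist object
```

Any function that accepts an hclust object, such as plot or cutree, should work the same way on the result of xcluster.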

Xcluster is a C program written by Gavin Sherlock that performs hierarchical clustering, K-means and SOM.

Xcluster is copyrighted. To obtain Xcluster, or for more information about it, see: http://genome-www.stanford.edu/~sherlock/cluster.html

Examples

# Create data
.Random.seed <- c(1, 416884367, 1051235439)
m <- matrix(rep(1, 3*24), ncol=3)
m[9:16,3] <- 3 ; m[17:24,] <- 3   # create 3 groups
m <- m + rnorm(24*3, 0, 0.5)      # add noise
m <- floor(10*m)/10               # keep just one decimal digit

# And once you have the Xcluster program:
#
# h <- xcluster(m)
#
# plot(h)

See Also

r2xcluster, xcluster2r, hclust, hcluster