hopach(data, dmat = NULL, d = "cosangle", clusters = "best", K = 15, 
kmax = 9, khigh = 9, coll = "seq", newmed = "medsil", mss = "med", 
impr = 0, initord = "co", ord = "own", verbose = FALSE)

dmat: hdist object of pairwise distances between all genes (arrays). All values
	must be numeric, and missing values are not allowed. If NULL, this matrix is computed
	using the metric specified by d. If a matrix is provided, the user is
	responsible for ensuring that the metric used agrees with d.
[...] distancematrix() and distancevector().
[...] impr, then the collapse is not performed.
[...] TRUE then verbose output is printed.
[...] data that are the 'k'
	cluster medoids, i.e. profiles (or centroids) for each cluster.
'sizes' is a vector containing the 'k' cluster sizes.
'labels' is a vector containing the main cluster labels for every variable. Each
	label consists of one digit per level of the tree (up to the level identified as
	the main clusters). The digit (1-9) indicates which child cluster the variable
	was in at that level. For example, '124' means the first (leftmost in the tree)
	cluster in level 1, the second child of cluster '1' in level 2, and the fourth
	child of cluster '12' in level 3. These can be mapped to the numbers 1:k for
	simplicity, though the tree structure and relationships amongst the clusters are
	then lost, e.g. 1211 is closer to 1212 than to 1221.
'order' is a vector containing the ordering of variables within the main clusters.
	The clusters are ordered deterministically as the tree is built. The elements within
	each of the main clusters are ordered with the method determined by the value of
	ord: "own" (relative to own medoid), "neighbor" (relative to next medoid
	to the right), or "co" (maximize correlation ordering).
'medoids' is a matrix containing the labels and corresponding medoids for each
	internal node and leaf of the tree. The number of digits in the label indicates
	the level for that node. The medoid refers to a row of data.
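
The digit-per-level label scheme described above can be illustrated with a small base R sketch. The label values below are hypothetical (they reuse the 1211/1212/1221 example from the text, not output from a real hopach run), and the helper name shared_levels is an assumption introduced only for illustration.

```r
# Hypothetical main-cluster labels (illustrative only, not hopach output).
labels <- c(1211, 1212, 1221)

# One row per label, one column per tree level.
chars  <- strsplit(format(labels, scientific = FALSE, trim = TRUE), "")
digits <- do.call(rbind, lapply(chars, as.integer))

# Hypothetical helper: number of leading tree levels that labels i and j share.
shared_levels <- function(i, j) {
  same <- digits[i, ] == digits[j, ]
  sum(cumprod(same))
}

shared_levels(1, 2)  # 1211 vs 1212
shared_levels(1, 3)  # 1211 vs 1221

# Flattening to 1:k keeps the left-to-right order but drops the tree relationship.
flat <- match(labels, sort(unique(labels)))
```

Here 1211 and 1212 share three leading levels while 1211 and 1221 share only two, whereas the flattened labels 1, 2, 3 are all equally spaced, which is the loss of structure the text refers to.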
The Median (or Mean) Split Silhouette (MSS) criterion is used by HOPACH to (i) determine the optimal number of children at each node, (ii) decide which pairs of clusters to collapse at each level, and (iii) identify the first level of the tree with maximally homogeneous clusters. In each case, the goal is to minimize MSS, a measure of cluster heterogeneity described in http://www.bepress.com/ucbbiostat/paper107/.
In hopach versions <2.0.0, these functions returned the square root of the usual distance for d="cosangle", d="abscosangle",
d="cor", and d="abscor". Typically, this transformation makes
the dissimilarity correspond more closely with the norm. In order to
agree with the dist function, the square root is no longer used
in versions >=2.0.0. See ?distancematrix.
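
The remark about the square root and the norm can be checked numerically. The sketch below uses base R only and assumes that the "usual" cosine-angle dissimilarity is 1 - cos(angle); the function name cosangle_dist and the vectors x and y are made up for illustration and are not part of the hopach API.

```r
# Assumed definition: cosine-angle dissimilarity = 1 - cos(angle between x and y).
cosangle_dist <- function(x, y) {
  1 - sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

x <- c(1, 2, 3)
y <- c(3, 1, 2)

d_plain <- cosangle_dist(x, y)  # untransformed dissimilarity
d_sqrt  <- sqrt(d_plain)        # square-root version

# Euclidean norm of the difference between the unit-length vectors.
ux <- x / sqrt(sum(x^2))
uy <- y / sqrt(sum(y^2))
euclid_unit <- sqrt(sum((ux - uy)^2))

# For unit vectors, ||ux - uy||^2 = 2 * (1 - cos), so the two agree up to sqrt(2).
isTRUE(all.equal(d_sqrt, euclid_unit / sqrt(2)))
```

Under this assumption the square-root version is proportional to a Euclidean norm, while the untransformed value is proportional to a squared norm.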
van der Laan, M.J. and Pollard, K.S. A new algorithm for hybrid hierarchical clustering with visualization and the bootstrap. Journal of Statistical Planning and Inference, 2003, 117, pp. 275-303.
http://www.stat.berkeley.edu/~laan/Research/Research_subpages/Papers/hopach.pdf
http://www.bepress.com/ucbbiostat/paper107/
http://www.stat.berkeley.edu/~laan/Research/Research_subpages/Papers/jsmpaper.pdf
Kaufman, L. and Rousseeuw, P.J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.
distancematrix, labelstomss, boothopach, pam, makeoutput
library(hopach) #load the package that provides hopach() and distancematrix()
#25 variables from two groups with 3 observations per variable
mydata<-rbind(cbind(rnorm(10,0,0.5),rnorm(10,0,0.5),rnorm(10,0,0.5)),
              cbind(rnorm(15,5,0.5),rnorm(15,5,0.5),rnorm(15,5,0.5)))
dimnames(mydata)<-list(paste("Var",1:25,sep=""),paste("Exp",1:3,sep=""))
mydist<-distancematrix(mydata,d="cosangle") #compute the distance matrix.
#clusters and final tree
clustresult<-hopach(mydata,dmat=mydist)
clustresult$clustering$k #number of clusters.
dimnames(mydata)[[1]][clustresult$clustering$medoids] #medoids of clusters.
table(clustresult$clustering$labels) #equal to clustresult$clustering$sizes.
#faster, sometimes fewer clusters
greedyresult<-hopach(mydata,clusters="greedy",dmat=mydist) 
#only get the final ordering (no partitioning into clusters)
orderonly<-hopach(mydata,clusters="none",dmat=mydist)
#cluster the columns (rather than rows)
colresult<-hopach(t(mydata),dmat=distancematrix(t(mydata),d="euclid"))