cogena (version 1.6.2)

sota: Self-organizing tree algorithm (SOTA)

Description

Computes a Self-organizing Tree Algorithm (SOTA) clustering of a dataset and returns a SOTA object. This function is taken from sota in the clValid package with little change.

Usage

sota(data, maxCycles, maxEpochs = 1000, distance = "euclidean", wcell = 0.01, pcell = 0.005, scell = 0.001, delta = 1e-04, neighb.level = 0, maxDiversity = 0.9, unrest.growth = TRUE, ...)
print(x, ...)
plot(x, cl = 0, ...)

Arguments

data
data matrix or data frame. Cannot have a profile ID as the first column.
maxCycles
integer value representing the maximum number of iterations allowed. The number of clusters returned by sota is maxCycles+1, unless unrest.growth is set to FALSE and the maxDiversity criterion is satisfied before the maximum number of iterations is reached.
maxEpochs
integer value indicating the maximum number of training epochs allowed per cycle. By default, maxEpochs is set to 1000.
distance
character string used to represent the metric to be used for calculating dissimilarities between profiles. 'euclidean' is the default, with 'correlation' being another option.
wcell
value specifying the winning cell migration weight. The default is 0.01.
pcell
value specifying the parent cell migration weight. The default is 0.005.
scell
value specifying the sister cell migration weight. The default is 0.001.
delta
value specifying the minimum epoch error improvement. This value is used as a threshold for signaling the start of a new cycle. It is set to 1e-04 by default.
neighb.level
integer value used to indicate which cells are candidates to accept new profiles. This number specifies the number of levels up the tree the algorithm moves in the search of candidate cells for the redistribution of profiles. The default is 0.
maxDiversity
value representing a maximum variability allowed within a cluster. 0.9 is the default value.
unrest.growth
logical flag: if TRUE, the algorithm runs maxCycles iterations regardless of whether the maxDiversity criterion is satisfied, and maxCycles+1 clusters are produced; if FALSE, the algorithm can stop before reaching maxCycles based on the current state of cluster diversities, in which case fewer than maxCycles+1 clusters are obtained. The default value is TRUE.
...
Any other arguments.
x
an object of class sota
cl
cl specifies which cluster is to be plotted by setting it to the cluster ID. By default, cl is equal to 0 and the function plots all clusters side by side.
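
As a rough illustration of how the three migration weights might act, the sketch below applies a SOM-style update that moves a node toward a profile by a fraction of their difference. The update rule and the helper name migrate are assumptions made for illustration, not code taken from cogena or clValid; only the default weights come from the argument list above.

```r
# Hypothetical helper: move a node profile toward a data profile by `weight`.
# Assumed SOM-style rule; not the package's internal code.
migrate <- function(node, profile, weight) {
  node + weight * (profile - node)
}

profile <- c(1.0, 2.0, 3.0)   # one expression profile
node    <- c(0.0, 0.0, 0.0)   # starting position shared by all three nodes

winner <- migrate(node, profile, weight = 0.01)   # wcell default
parent <- migrate(node, profile, weight = 0.005)  # pcell default
sister <- migrate(node, profile, weight = 0.001)  # scell default

# Under this assumed rule the winning cell moves furthest, the sister cell least.
rbind(winner, parent, sister)
```

With the defaults, the winning cell adapts twice as fast as its parent and ten times as fast as its sister, which is why the ordering wcell > pcell > scell is usually kept when tuning these values.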

Value

A SOTA object.
data
data matrix used for clustering
c.tree
complete tree in a matrix format. Node ID, its Ancestor, and whether it's a terminal node (cell) are listed in the first three columns. Node profiles are shown in the remaining columns.
tree
incomplete tree in a matrix format listing only the terminal nodes (cells). Node ID, its Ancestor, and 1's for a cell indicator are listed in the first three columns. Node profiles are shown in the remaining columns.
clust
integer vector whose length is equal to the number of profiles in the data matrix, indicating the cluster assignment of each profile in the original order.
totals
integer vector specifying the cluster sizes.
dist
character string indicating a distance function used in the clustering process.
diversity
vector specifying final cluster diversities.
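
To show how the clust and totals components relate, here is a minimal sketch that uses a made-up assignment vector in place of a fitted object. The values are hypothetical; with a real result the same expressions would be applied to the clust component of the returned object.

```r
# Hypothetical cluster assignments for eight profiles, standing in for the
# `clust` component of a fitted object.
clust <- c(1L, 3L, 2L, 1L, 1L, 2L, 3L, 1L)

# Cluster sizes, analogous to the `totals` component.
totals <- as.integer(table(clust))

# Row indices (in the original data order) of the profiles in cluster 2.
members_of_2 <- which(clust == 2L)
```

Because clust follows the original row order, indices such as members_of_2 can be used directly to subset the data matrix.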

Details

The Self-Organizing Tree Algorithm (SOTA) is an unsupervised neural network with a binary tree topology. It combines the advantages of both hierarchical clustering and Self-Organizing Maps (SOM). The algorithm repeatedly picks the node with the largest Diversity and splits it into two nodes, called Cells. This process can be stopped at any level, which guarantees a fixed number of hard clusters; this behavior is obtained by setting the unrest.growth parameter to TRUE. Growth of the tree can also be stopped based on other criteria, such as the maximum Diversity allowed within a cluster. Further details regarding the inner workings of the algorithm can be found in the paper listed in the References section.

Please note that 'euclidean' is the default distance metric here, which differs from sota in clValid.
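
The split-selection step described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: diversity is assumed here to be the mean Euclidean distance from each profile to its cell centroid, and the toy data and variable names are hypothetical.

```r
# Toy data: 6 profiles (rows) over 3 conditions, already assigned to 2 cells.
profiles <- rbind(
  c(0.0, 0.0, 0.0),
  c(0.1, 0.0, 0.0),
  c(0.0, 0.1, 0.0),   # cell 1: tightly grouped
  c(5.0, 5.0, 5.0),
  c(7.0, 3.0, 5.0),
  c(3.0, 7.0, 5.0)    # cell 2: spread out
)
cell <- c(1, 1, 1, 2, 2, 2)

# Assumed diversity: mean Euclidean distance from each profile to its
# cell centroid.
diversity <- sapply(split(seq_len(nrow(profiles)), cell), function(idx) {
  members  <- profiles[idx, , drop = FALSE]
  centroid <- colMeans(members)
  mean(sqrt(rowSums(sweep(members, 2, centroid)^2)))
})

# The cell with the largest diversity is the next candidate to split.
to_split <- as.integer(names(which.max(diversity)))
```

In this toy setup cell 2 is selected, since its profiles lie much further from their centroid than those of cell 1.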

References

Herrero, J., Valencia, A., and Dopazo, J. (2001). A hierarchical unsupervised growing neural network for clustering gene expression patterns. Bioinformatics, 17, 126-136.

Examples

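# Please refer to the manual of the sota function in clValid.
library(cogena)
data(Psoriasis)

# Cluster the expression profiles with maxCycles = 4
# (maxCycles + 1 clusters with the default unrest.growth = TRUE).
sotaCl <- sota(as.matrix(DEexprs), 4)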