superClass: Create super-clusters from SOM clustering

Description

Aggregate the resulting clustering of the SOM algorithm into super-clusters.

Usage

# S3 method for somRes
superClass(sommap, method="ward.D", members=NULL, k=NULL, 
h=NULL, ...)
# S3 method for somSC
print(x, ...)
# S3 method for somSC
summary(object, ...)
# S3 method for somSC
projectIGraph(object, init.graph, ...)
# S3 method for somSC
plot(x, type=c("dendrogram", "grid", "hitmap", "lines", 
                                 "barplot", "boxplot", "mds", "color", 
                                 "poly.dist", "pie", "graph", "dendro3d", 
                                 "radar", "projgraph"),
                       plot.var=TRUE, plot.legend=FALSE, add.type=FALSE, 
                       print.title = FALSE, 
                       the.titles = paste("Cluster", 
                                          1:prod(x$som$parameters$the.grid$dim)),
                       ...)

Arguments

sommap

A somRes object

method, members

Arguments passed to the hclust function.

k, h

Arguments passed to the cutree function (respectively, the number of super-clusters or the height at which to cut the dendrogram).

x, object

A somSC object

init.graph

An igraph object which is projected according to the super-clusters. The number of vertices of init.graph must be equal to the number of rows in the original dataset processed by the SOM (case "korresp" is not handled by this function). In the projected graph, the vertices are positioned at the center of gravity of the super-clusters (more details in the section Details below).

type

The type of plot to draw. Default value is "dendrogram", to plot the dendrogram of the clustering. Case "grid" plots the grid, colored according to the super-clustering. Case "projgraph" uses an igraph object passed to the argument variable and plots the projected graph as defined by the function projectIGraph.somSC. All other cases are those available in the function plot.somRes and superimpose the super-clusters over these plots.

plot.var

A boolean indicating whether a graph showing the evolution of the explained variance should be plotted. This argument is only used when type="dendrogram". Its default value is TRUE.

plot.legend

A boolean indicating whether a legend should be added to the plot. This argument is only used when type is either "grid" or "hitmap" or "mds". Its default value is FALSE.

add.type

A boolean, whose default value is FALSE, indicating whether you are giving an additional variable to the argument variable or not. If you do, the function plot.somRes will be called with the argument what set to "add".

print.title

Whether the cluster titles must be printed in the center of the grid or not for type="grid". Defaults to FALSE (titles not displayed).

the.titles

If print.title = TRUE, values of the titles to display for type="grid". Defaults to "Cluster " followed by the cluster number.

...

Used for plot.somSC: further arguments passed either to the function plot (case type="dendrogram") or to plot.myGrid (case type="grid") or to plot.somRes (all other cases).

Value

The superClass function returns an object of class somSC which is a list of the following elements:

cluster

The super-clustering of the prototypes (only if either k or h is given by the user).

tree

An hclust object.

som

The somRes object given as argument (see trainSOM for details).

The projectIGraph.somSC function returns an object of class igraph with the following attributes: the graph attribute layout, which provides the layout of the projected graph according to the center of gravity of the super-clusters positioned on the SOM grid; the vertex attributes name and size, which are, respectively, the vertex number on the grid and the number of vertices included in the corresponding cluster; the edge attribute weight, which gives the number of edges (or the sum of the weights) between the vertices of the two corresponding clusters.

Details

The superClass function can be used in two ways:

to choose the number of super-clusters via an hclust object: then, neither argument k nor argument h is supplied.
to cut the clustering into super-clusters: then, either argument k or argument h must be supplied. See cutree for details on these arguments.

The squared distance between prototypes is passed to the algorithm.
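
The two ways of using the function can be illustrated with base R alone. The sketch below is not the package's implementation: the matrix prototypes is a hypothetical stand-in for the SOM prototypes, and the cut height h is chosen here only so that it matches k=4.

```r
# Hypothetical stand-in for SOM prototypes: 25 units, 4 variables.
set.seed(1)
prototypes <- matrix(rnorm(25 * 4), nrow = 25, ncol = 4)

# Squared Euclidean distances between prototypes (still a "dist" object).
sq.dist <- dist(prototypes)^2

# First way: build the tree only, then inspect it (e.g. with plot(tree))
# to choose a number of super-clusters.
tree <- hclust(sq.dist, method = "ward.D")

# Second way: cut the tree, either by number of super-clusters (k) ...
by.k <- cutree(tree, k = 4)

# ... or by height (h). This height lies between the two merges that
# separate a 4-cluster partition from a 3-cluster one.
h <- mean(rev(tree$height)[3:4])
by.h <- cutree(tree, h = h)

table(by.k)
```

With superClass itself, the first way corresponds to a call without k and h, and the second to a call such as superClass(my.som, k=4).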

summary on a somSC object (as returned by superClass) produces a complete summary of the results: it displays the number of clusters and super-clusters and the clustering itself, and performs ANOVA analyses. For type="numeric", an ANOVA is performed for each input variable and tests the difference of this variable across the super-clusters of the map. For type="relational", a dissimilarity ANOVA is performed (see Anderson, 2001), except that, in the present version, a crude estimate of the p-value is used, which is based on the Fisher distribution and not on a permutation test.
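
To make the numeric case concrete, the following base R sketch runs a one-way ANOVA for each input variable against a grouping factor. It only illustrates the kind of test described above and is not the code used by summary: the factor super.cluster is a hypothetical labelling obtained directly from hclust on the observations rather than from a SOM.

```r
# Hypothetical super-cluster label for each observation of iris.
# With a real SOM, each observation would inherit the super-cluster
# of the unit it is assigned to.
tree <- hclust(dist(iris[, 1:4])^2, method = "ward.D")
super.cluster <- factor(cutree(tree, k = 3))

# One-way ANOVA per input variable: does its mean differ across super-clusters?
p.values <- sapply(iris[, 1:4], function(v) {
  fit <- aov(v ~ super.cluster)
  summary(fit)[[1]][["Pr(>F)"]][1]   # p-value of the F test
})

p.values
```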

On plots, the different super classes are identified in the following ways:

either with different colors, when type is set among: "grid" (*, #), "hitmap" (*, #), "lines" (*, #), "barplot" (*, #), "boxplot", "mds" (*, #), "dendro3d" (*, #), "graph" (*, #)
or with a title, when type is set among: "color" (*), "poly.dist" (*, #), "pie" (#), "radar" (#)

In the list above, the charts available for a korresp SOM are marked with a * whereas those available for a relational SOM are marked with a #.

projectIGraph.somSC produces a projected graph from the igraph object passed to the argument variable, as described in (Olteanu and Villa-Vialaneix, 2015). The attributes of this graph are the same as the ones obtained from the SOM map itself in the function projectIGraph.somRes. plot.somSC used with type="projgraph" calculates this graph and represents it by positioning the super-vertices at the center of gravity of the super-clusters. This feature can be combined with pie.graph=TRUE to superimpose the information from an external factor related to the individuals in the original dataset (or, equivalently, to the vertices of the graph).
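
The quantities carried by the projected graph (layout, size and weight) can be sketched in base R without igraph. Everything below is a hypothetical illustration: the grid, the super-cluster labels, the vertex assignment and the adjacency matrix are made up, and the center of gravity is assumed to be the unweighted mean of the unit coordinates, which may differ from what the package computes.

```r
# All inputs are hypothetical stand-ins, not SOMbrero objects.
n.super <- 3
grid.coords <- expand.grid(x = 1:3, y = 1:3)   # coordinates of the 9 units of a 3x3 grid
unit.super <- c(1, 1, 2, 1, 1, 2, 3, 3, 2)     # super-cluster of each unit

set.seed(3)
n.vertices <- 20
vertex.unit <- sample(1:9, n.vertices, replace = TRUE)  # unit of each graph vertex
vertex.super <- unit.super[vertex.unit]                 # super-cluster of each vertex

# Random undirected graph without self-loops, as a symmetric 0/1 adjacency matrix.
adj <- matrix(rbinom(n.vertices^2, 1, 0.2), n.vertices, n.vertices)
adj[lower.tri(adj, diag = TRUE)] <- 0
adj <- adj + t(adj)

# Layout: center of gravity of each super-cluster on the grid
# (assumption: unweighted mean of the coordinates of its units).
sc.layout <- aggregate(grid.coords, by = list(super = unit.super), FUN = mean)

# Size: number of graph vertices falling in each super-cluster.
size <- tabulate(vertex.super, nbins = n.super)

# Weight: number of edges between each pair of super-clusters.
# Off-diagonal entries count each edge once; diagonal entries count
# within-cluster edges twice because adj is symmetric.
member <- outer(vertex.super, seq_len(n.super), "==") * 1
weight <- t(member) %*% adj %*% member

sc.layout
size
weight
```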

References

Anderson M.J. (2001). A new method for non-parametric multivariate analysis of variance. Austral Ecology, 26, 32-46.

Olteanu M., Villa-Vialaneix N. (2015) Using SOMbrero for clustering and visualizing graphs. Journal de la Societe Francaise de Statistique, 156, 95-119.

Examples


# NOT RUN {
set.seed(11051729)
my.som <- trainSOM(x.data=iris[,1:4])
# choose the number of super-clusters
sc <- superClass(my.som)
plot(sc)
# cut the clustering
sc <- superClass(my.som, k=4)
summary(sc)
plot(sc)
plot(sc, type="hitmap", plot.legend=TRUE)
# }
