Construct a self-organizing map and return an object of class 'map'.
map(data, labels = NULL, xdim = 10, ydim = 5,
    alpha = 0.3, train = 1000, normalize = TRUE,
    seed = NULL)
A data frame where each row contains an unlabeled training instance.
A vector or data frame with one label for each observation in data.
The x-dimension of the map.
The y-dimension of the map.
The learning rate; it should be a positive, non-zero real number.
The number of training iterations.
Boolean switch indicating whether or not to normalize the data.
A seed value for repeatability of the random initialization and selection.
An object of type 'map'. The object has the following member fields:
Data frame containing the (possibly normalized) training data.
Vector of labels, one for each observation in data or NULL if no labels were given.
The x dimension of the neuron map.
The y dimension of the neuron map.
The given learning rate for the neural network.
The training iterations applied to the neural network.
A data frame of neurons for the network, one neuron per row. The dimensionality of this data frame is the same as that of the training data. Two formulas come in handy when working with the neuron data; a short worked example follows the field descriptions below. The first computes the x- and y-coordinates on the map of the neuron stored in row 'rowix' of the 'neurons' data frame,
x <- (rowix-1) %% map$xdim + 1
y <- (rowix-1) %/% map$xdim + 1
The second computes the row index in 'neurons' of the neuron at position (x,y) on the map,
rowix <- x + (y-1)*map$xdim
This is the representation of the map which is the basis for the 'starburst' plot.
List of indexes of the best matching neuron for each observation. Each index is a row index into the 'neurons' data frame.
This is a data frame of (x,y)-locations where each cell points to the (x,y)-location on the map where the corresponding centroid is located. Centroids point to themselves.
A vector of actual centroid (x,y)-locations on the map. Hint: to compute the number of clusters on the map take the length of this vector.
A data frame where the (x,y)-locations of actual centroids have a label associated with them. All other locations are NULL. If the training data is unlabeled then popsom invents a label for each centroid.
A label-to-centroid lookup table (hash). A lookup in this table will return a list of indexes into the 'unique.centroids' table. Note: a label can be associated with multiple centroids.
A vector of lists of observations per centroid indexed by the centroid number from 'unique.centroids'. The observations on the list are row numbers of the 'data' data frame.
A quality measure of how well the map fits the training data.
The average 'within cluster sum of squares'. This is the average distance variance within the clusters of the underlying cluster model.
The 'between cluster sum of squares'. This is the distance variance between the cluster centroids of the underlying cluster model.
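For concreteness, here is a minimal sketch of the two coordinate conversions described for the 'neurons' field, assuming a trained map object 'm' such as the one built in the Examples below:
# map position of the neuron stored in row 'rowix' of m$neurons
rowix <- 37                        # an arbitrary illustrative row index
x <- (rowix - 1) %%  m$xdim + 1
y <- (rowix - 1) %/% m$xdim + 1
# and back again: the row index of the neuron at map position (x,y)
stopifnot(x + (y - 1) * m$xdim == rowix)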
The function 'map' constructs an object of type 'map'. The object contains two models: (1) a self-organizing map model expressed through its trained neurons, and (2) a cluster model expressed by the discovered centroids. The quality of the map model can be ascertained through its 'convergence'; the quality of the cluster model can be ascertained through the 'within cluster sum of squares' and the 'between cluster sum of squares' (see the corresponding fields above).
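As a minimal sketch of how the map quality might be inspected after training (assuming the quality measure is stored in a field named 'convergence', as referenced above, and using the 'm', 'df', and 'labels' objects from the Examples; the 0.9 threshold is purely illustrative):
# inspect the quality of fit of the trained map
m$convergence                      # how well the map fits the training data
# a low value suggests the map has not converged; increasing 'train'
# and rebuilding the map is a common remedy
if (m$convergence < 0.9) {
  m <- map(df, labels, xdim=15, ydim=10, train=100000, seed=42)
}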
VSOM: Efficient, Stochastic Self-Organizing Map Training, Lutz Hamel, Intelligent Systems Conference (IntelliSys) 2018, K. Arai et al. (Eds.): Intelligent Systems and Applications, Advances in Intelligent Systems and Computing 869, pp 805-821, Springer, 2018.
Self-Organizing Map Convergence, Robert Tatoian and Lutz Hamel. Proceedings of the 2016 International Conference on Data Mining (DMIN'16), pp 92-98, July 25-28, 2016, Las Vegas, Nevada, USA, ISBN: 1-60132-431-6, CSREA Press.
Evaluating Self-Organizing Map Quality Measures as Convergence Criteria, Gregory Breard and Lutz Hamel, Proceedings of the 2018 International Conference on Data Science (ICDATA'18), Robert Stahlbock, Gary M. Weiss, Mahmoud Abou-Nasr (Eds.), ISBN: 1-60132-481-2, pp 86-92, CSREA Press, 2018.
SOM Quality Measures: An Efficient Statistical Approach, Lutz Hamel, Proceedings of the 11th International Workshop WSOM 2016, Houston, Texas USA, E. Merenyi et al. (eds.), Advances in Self-Organizing Maps and Learning Vector Quantization, Advances in Intelligent Systems and Computing 428, Springer, pp 49-59, DOI 10.1007/978-3-319-28518-4_4, 2016.
# training data
data(iris)
df <- subset(iris,select=-Species)
labels <- subset(iris,select=Species)
# build a map
m <- map(df,labels,xdim=15,ydim=10,train=10000,seed=42)
# look at the characteristics of the map
summary(m)
# plot the map
starburst(m)
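The 'labels' argument is optional. As a brief sketch (using the same 'df' as above), a map can also be built from unlabeled data, in which case popsom invents a label for each centroid as described in the field list:
# build a map from unlabeled data; centroid labels are generated automatically
m2 <- map(df, xdim=15, ydim=10, train=10000, seed=42)
starburst(m2)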