The Variable Mutability similarity measure was introduced in (Sulc and Rezankova, 2015).
It treats similarity between two categories according to within-cluster variability expressed by the Gini coefficient (mutability).
The measure gives more weight to a match of two categories in a variable with high variability, because such a match is rarer,
than to a match in a low-variability variable.
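As a rough illustration of the variability notion used above, the following sketch computes the plain Gini coefficient (mutability) of a single nominal variable as one minus the sum of squared relative frequencies. The helper name gini_mutability and the two toy vectors are assumptions made for this example; the sketch does not reproduce the package's internal computation or any normalization it may apply.

```r
# Plain Gini coefficient (mutability) of one nominal variable:
# 1 - sum of squared relative category frequencies.
# Hypothetical helper for illustration only, not a function of the package.
gini_mutability <- function(x) {
  p <- table(x) / length(x)  # relative frequency of each category
  1 - sum(p^2)
}

# Toy variables (made-up values), coded as numbers
low_var  <- c(1, 1, 1, 1, 1, 1, 1, 2)  # one dominant category
high_var <- c(1, 2, 3, 4, 1, 2, 3, 4)  # four equally frequent categories

g_low  <- gini_mutability(low_var)   # 0.21875
g_high <- gini_mutability(high_var)  # 0.75
```

A match in high_var is less likely to occur by chance than a match in low_var, which is the reason a match in the more variable variable counts for more.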
Hierarchical clustering methods require a proximity (dissimilarity) matrix instead of a similarity matrix as
an input for the analysis; therefore, the dissimilarity D is computed from the similarity S
according to the equation D = 1/S - 1.
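The conversion from similarity to dissimilarity can be sketched element-wise in R. The matrix S below is a made-up example, not output of the package, and the sketch only shows the arithmetic of the equation.

```r
# Hypothetical 3 x 3 similarity matrix (values are made up for illustration)
S <- matrix(c(1.00, 0.50, 0.25,
              0.50, 1.00, 0.40,
              0.25, 0.40, 1.00), nrow = 3, byrow = TRUE)

# Element-wise conversion: D = 1/S - 1
D <- 1 / S - 1
```

With this conversion a similarity of 1 maps to a dissimilarity of 0. In R, a similarity of exactly 0 would yield Inf, since 1/0 evaluates to Inf.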
vm(data)
data: A data frame or matrix with cases in rows and variables in columns. Cases are characterized by nominal (categorical) variables coded as numbers.
The function returns a matrix of size n x n, where n is the number of objects in the original data. The matrix contains proximities
between all pairs of objects. It can be used in hierarchical cluster analyses (HCA), e.g. in agnes.
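A minimal sketch of passing such a proximity matrix to agnes from the cluster package follows. The 4 x 4 matrix prox is a made-up stand-in for the output of vm, and the choice of average linkage and of two clusters is an assumption for illustration, not a recommendation from the package authors.

```r
library(cluster)  # provides agnes()

# Toy 4 x 4 dissimilarity matrix standing in for the output of vm()
# (values are made up for illustration)
prox <- matrix(c(0, 1, 4, 5,
                 1, 0, 3, 6,
                 4, 3, 0, 2,
                 5, 6, 2, 0), nrow = 4, byrow = TRUE)

# diss = TRUE tells agnes that the input holds dissimilarities, not raw data
hca <- agnes(as.dist(prox), diss = TRUE, method = "average")

# Cut the resulting tree into two clusters
clusters <- cutree(as.hclust(hca), k = 2)
```

Converting with as.dist first makes it explicit that only the pairwise dissimilarities are used.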
Sulc, Z. and Rezankova, H. (2015). Novel similarity measures for categorical data based on mutability and entropy. Conference of the International Federation of Classification Societies. Bologna: Ospitalia, p. 209.
eskin, good1, good2, good3, good4, iof, lin, lin1, morlini, of, sm, ve.
# load the package that provides vm() and the sample data
library(nomclust)
# sample data
data(data20)
# creation of the proximity matrix
prox_vm <- vm(data20)