vm: Variable Mutability measure

Description

The Variable Mutability (VM) similarity measure was introduced by Sulc and Rezankova (2015). It evaluates the similarity between two categories according to within-cluster variability expressed by the Gini coefficient (mutability). The measure gives more weight to a match of two categories in a variable with high variability than to a match in a low-variability variable, because the former is rarer. Hierarchical clustering methods take a proximity (dissimilarity) matrix rather than a similarity matrix as input; therefore, the dissimilarity D is computed from the similarity S according to the equation D = 1/S - 1.
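The two ingredients described above can be illustrated with a minimal sketch in base R. It is not the internal code of vm: the helper gini_mutability and the toy matrix S are hypothetical, and the sketch assumes the common unnormalized definition of mutability, 1 minus the sum of squared relative frequencies. Any normalization vm applies is not covered here.

```r
# Hypothetical helper: Gini mutability of one nominal variable,
# assuming the unnormalized form 1 - sum(p^2).
gini_mutability <- function(x) {
  p <- table(x) / length(x)  # relative frequencies of the categories
  1 - sum(p^2)
}

low_var  <- c(1, 1, 1, 2)  # one dominant category
high_var <- c(1, 2, 3, 4)  # all categories equally frequent
g_low  <- gini_mutability(low_var)
g_high <- gini_mutability(high_var)

# Toy similarity values in (0, 1]; made up for illustration, not produced by vm().
S <- matrix(c(1.00, 0.50, 0.25,
              0.50, 1.00, 0.80,
              0.25, 0.80, 1.00), nrow = 3, byrow = TRUE)

# Dissimilarity as described above: D = 1/S - 1.
D <- 1 / S - 1
```

Under these assumptions, g_high is larger than g_low, and identical objects (S = 1) get a dissimilarity of 0. A similarity of 0 would give an infinite dissimilarity, because 1/0 evaluates to Inf in R.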

Usage

vm(data)

Arguments

data

A data frame or matrix with cases in rows and variables in columns. Cases are characterized by nominal (categorical) variables coded as numbers.

Value

The function returns an n x n matrix, where n is the number of objects in the original data. The matrix contains the proximities (dissimilarities) between all pairs of objects. It can be used in hierarchical cluster analysis (HCA), e.g. in agnes.
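A minimal sketch of how such a matrix might be passed to agnes follows. It assumes agnes refers to the function from the cluster package. The matrix prox is hypothetical and stands in for the output of vm, so the sketch runs without the sample data.

```r
library(cluster)  # provides agnes()

# Hypothetical 4 x 4 dissimilarity matrix standing in for the output of vm().
prox <- matrix(c(0.0, 0.2, 1.5, 1.8,
                 0.2, 0.0, 1.6, 1.4,
                 1.5, 1.6, 0.0, 0.3,
                 1.8, 1.4, 0.3, 0.0), nrow = 4, byrow = TRUE)

# diss = TRUE tells agnes() that the input already holds dissimilarities.
hca <- agnes(as.dist(prox), diss = TRUE, method = "average")

# Cut the resulting tree into two clusters.
clusters <- cutree(as.hclust(hca), k = 2)
```

Average linkage is used here only because it is the default method of agnes; other linkage methods can be selected through the method argument.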

References

Sulc, Z. and Rezankova, H. (2015). Novel similarity measures for categorical data based on mutability and entropy. Conference of the International Federation of Classification Societies. Bologna: Ospitalia, p. 209.

Examples

# NOT RUN {
# Load the package (vm() and data20 are assumed to come from nomclust)
library(nomclust)

# Sample data
data(data20)

# Create the proximity matrix
prox_vm <- vm(data20)

# }
