The binaryDistance function defines various similarity or distance measures between binary vectors, which represent the first step in the algorithm underlying the Mercator visualizations.
binaryDistance(X, metric)
Returns an object of class dist corresponding to the distance metric provided.
X: An object of class matrix.
metric: An object of class character limited to the names of 10 selected distance metrics: jaccard, sokalMichener, hamming, russellRao, pearson, goodmanKruskal, manhattan, canberra, binary, or euclid.
Kevin R. Coombes <krc@silicovore.com>, Caitlin E. Coombes
Similarity or difference between binary vectors can be calculated using a variety of distance measures. In the main reference (below), Choi and colleagues reviewed 76 different measures of similarity or distance between binary vectors. They also produced a hierarchical clustering of these measures, based on the correlation between their distance values on multiple simulated data sets. For metrics that are highly similar, we chose a single representative.
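The idea of grouping metrics by how strongly their distance values correlate can be illustrated with a minimal sketch that uses only base R's dist function on simulated data. This is not the procedure of Choi and colleagues, nor the Mercator implementation; the simulation settings and variable names are assumptions chosen for illustration.

```r
set.seed(1)
# Simulated binary data: 40 objects (rows) described by 200 binary features.
sim <- matrix(rbinom(40 * 200, 1, 0.3), nrow = 40)

# Distances between rows under three metrics available in base R.
methods <- c("binary", "manhattan", "euclidean")
dists <- sapply(methods, function(m) as.vector(dist(sim, method = m)))

# Correlation between the distance values produced by each pair of metrics.
metric.cor <- cor(dists)
print(round(metric.cor, 3))

# The metrics themselves can then be clustered, using 1 - correlation
# as the dissimilarity between metrics.
metric.tree <- hclust(as.dist(1 - metric.cor), method = "average")
```

On 0/1 data the Manhattan distance counts the mismatched positions and the Euclidean distance is its square root, so those two metrics carry essentially the same information and a single representative suffices.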
Cluster 1, represented by the jaccard distance, contains Dice & Sorenson, Ochiai, Kulczynski, Bray & Curtis, Baroni-Urbani & Buser, and Jaccard.
Cluster 2, represented by the sokalMichener distance, contains Sokal & Sneath, Gilbert & Wells, Gower & Legendre, Pearson & Heron, Hamming, and Sokal & Michener. Also within this cluster are 4 distances represented independently within this function: hamming, manhattan, canberra, and euclidean distances.
Cluster 3, represented by the russellRao distance, contains Driver & Kroeber, Forbes, Fossum, and Russell & Rao.
The remaining metrics are more isolated, without strong clustering. We considered a few examples, including the Pearson distance (pearson) and the Goodman & Kruskal distance (goodmanKruskal). The binary distance is also included.
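Several of the measures named above are commonly written in terms of four counts for a pair of binary vectors: a (both 1), b (1 in the first vector only), c (1 in the second vector only), and d (both 0). The sketch below uses the standard textbook formulas. The conversion from similarity to distance as 1 minus similarity is an assumption, the helper name pairCounts is hypothetical, and the exact definitions used inside binaryDistance may differ.

```r
# Hypothetical helper: the four agreement counts for two binary vectors.
pairCounts <- function(x, y) {
  c(a = sum(x == 1 & y == 1),   # present in both
    b = sum(x == 1 & y == 0),   # present in x only
    c = sum(x == 0 & y == 1),   # present in y only
    d = sum(x == 0 & y == 0))   # absent from both
}

x <- c(1, 1, 0, 0, 1, 0, 0, 0)
y <- c(1, 0, 0, 1, 1, 0, 0, 0)
k <- pairCounts(x, y)
n <- sum(k)

# Textbook similarities, converted to distances as 1 - similarity (assumed).
jaccard       <- 1 - k[["a"]] / (k[["a"]] + k[["b"]] + k[["c"]])
sokalMichener <- 1 - (k[["a"]] + k[["d"]]) / n
russellRao    <- 1 - k[["a"]] / n

# The phi (Pearson) coefficient computed from the same counts.
phi <- (k[["a"]] * k[["d"]] - k[["b"]] * k[["c"]]) /
  sqrt((k[["a"]] + k[["b"]]) * (k[["a"]] + k[["c"]]) *
       (k[["b"]] + k[["d"]]) * (k[["c"]] + k[["d"]]))

# Cross-checks against base R: phi is the ordinary correlation of the two
# 0/1 vectors, and the "binary" method of dist() is the Jaccard distance.
all.equal(phi, cor(x, y))
all.equal(jaccard, as.numeric(dist(rbind(x, y), method = "binary")))
```

Jaccard ignores the joint absences (d), Sokal & Michener counts them as agreement, and Russell & Rao counts only joint presences as agreement, which is one way to see why the three representatives can behave differently on sparse data.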
Choi SS, Cha SH, Tappert CC. A Survey of Binary Similarity and Distance Measures. Systemics, Cybernetics, and Informatics. 2010; 8(1):43-48.
This set includes all of the metrics from the dist function.
library(Mercator)
# Simulated binary matrix: 100 rows by 50 columns, about 15% ones
my.matrix <- matrix(rbinom(50*100, 1, 0.15), ncol=50)
my.dist <- binaryDistance(my.matrix, "jaccard")
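Because the return value is an object of class dist, it can be passed to standard base R tools such as hclust. The sketch below uses base R's dist with method = "binary" as a stand-in so that it runs without Mercator installed. The stand-in measures distances between rows of the matrix, which may not match the orientation used by binaryDistance, and the choice of four groups is arbitrary.

```r
set.seed(42)
my.matrix <- matrix(rbinom(50 * 100, 1, 0.15), ncol = 50)

# Stand-in for binaryDistance(): base R's "binary" method applied to rows.
stand.in <- dist(my.matrix, method = "binary")

# Any dist object can be clustered hierarchically and cut into groups.
tree <- hclust(stand.in, method = "average")
groups <- cutree(tree, k = 4)
table(groups)
```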