These functions transform the input vectors to binary or presence/absence format, then compute a distance or dissimilarity.
jaccard(x, y)
sorenson(x, y)
kulczynski_first(x, y)
kulczynski_second(x, y)
rogers_tanimoto(x, y)
russel_rao(x, y)
sokal_michener(x, y)
sokal_sneath(x, y)
yule_dissimilarity(x, y)
x, y: Numeric vectors.
The dissimilarity between x and y, based on presence/absence. The Jaccard,
Sorenson, Sokal-Sneath, Yule, and both Kulczynski dissimilarities are not
defined if both x and y have no nonzero elements. In addition, the second
Kulczynski index and the Yule index of dissimilarity are not defined if one
of the vectors has no nonzero elements. We return NaN for undefined values.
Many of these indices are covered in Koleff et al. (2003), so we adopt their
notation. For two vectors x and y, we define four quantities:
\(a\) is the number of species that are present in both x and y,
\(b\) is the number of species that are present in y but not x,
\(c\) is the number of species that are present in x but not y, and
\(d\) is the number of species absent in both vectors.
The quantity \(d\) is seldom used in ecology, for good reason. For details, please see the discussion on the "double zero problem," in section 2 of chapter 7.2 in Legendre & Legendre.
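The four counts can be tallied directly from two vectors. The following is a minimal sketch, not the package's implementation: the helper name presence_counts() is hypothetical, and it assumes that "present" means a nonzero entry.

```r
# Hypothetical helper (not part of the documented API): tally the four
# presence/absence categories for two numeric vectors of equal length.
presence_counts <- function(x, y) {
  px <- x != 0  # assumption: a nonzero entry means the species is present
  py <- y != 0
  c(
    a = sum(px & py),    # present in both x and y
    b = sum(!px & py),   # present in y but not x
    c = sum(px & !py),   # present in x but not y
    d = sum(!px & !py)   # absent from both
  )
}

x <- c(5, 0, 2, 0, 1, 0)
y <- c(3, 4, 0, 0, 7, 0)
counts <- presence_counts(x, y)
print(counts)  # a = 2, b = 1, c = 1, d = 2
```

Abundances are discarded at the first step, so c(5, 0, 2) and c(1, 0, 1) give the same counts against any y.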
The Jaccard index of dissimilarity is \(1 - a / (a + b + c)\), or
one minus the proportion of shared species, counting over both samples
together. Relation of jaccard() to other definitions:
Equivalent to R's built-in dist() function with method = "binary".
Equivalent to vegdist() with method = "jaccard" and binary = TRUE.
Equivalent to the jaccard() function in scipy.spatial.distance, except that we always convert vectors to presence/absence.
Equivalent to \(1 - S_7\) in Legendre & Legendre.
Equivalent to \(1 - \beta_j\), as well as \(\beta_{cc}\) and \(\beta_g\), in Koleff et al. (2003).
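The equivalence with base R's dist() can be checked by hand. This sketch evaluates the Jaccard formula from the counts and compares it with dist(method = "binary"). The variable names are illustrative, and presence is assumed to mean a nonzero entry.

```r
x <- c(5, 0, 2, 0, 1, 0)
y <- c(3, 4, 0, 0, 7, 0)

# Convert to presence/absence (assumption: nonzero means present).
px <- x != 0
py <- y != 0
n_a <- sum(px & py)    # shared species
n_b <- sum(!px & py)   # only in y
n_c <- sum(px & !py)   # only in x

# Jaccard dissimilarity from the formula 1 - a / (a + b + c).
jaccard_sketch <- 1 - n_a / (n_a + n_b + n_c)

# The same quantity from base R, treating the two vectors as rows.
jaccard_dist <- as.numeric(dist(rbind(x, y), method = "binary"))

print(c(jaccard_sketch, jaccard_dist))  # both 0.5
```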
The Sørenson or Dice index of dissimilarity is
\(1 - 2a / (2a + b + c)\), or one minus the average proportion of shared
species, counting over each sample individually. Relation of sorenson()
to other definitions:
Equivalent to the dice() function in scipy.spatial.distance, except that we always convert vectors to presence/absence.
Equivalent to the sorclass calculator in Mothur, and to 1 - whittaker.
Equivalent to \(D_{13} = 1 - S_8\) in Legendre & Legendre.
Equivalent to \(1 - \beta_{sor}\) in Koleff et al. (2003). Also equivalent to Whittaker's beta diversity (the second definition, \(\beta_w = (S / \bar{a}) - 1\)), as well as \(\beta_{-1}\), \(\beta_t\), \(\beta_{me}\), and \(\beta_{hk}\).
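The link to Whittaker's \(\beta_w\) can be illustrated numerically. In this sketch, \(S\) is taken to be the total number of species observed across both samples, \(a + b + c\), and \(\bar{a}\) the mean number of species per sample. That reading of the symbols is an assumption made for the example, and all variable names are illustrative.

```r
x <- c(5, 0, 2, 0, 1, 0)
y <- c(3, 4, 0, 0, 7, 0)

px <- x != 0  # assumption: nonzero means present
py <- y != 0
n_a <- sum(px & py)
n_b <- sum(!px & py)
n_c <- sum(px & !py)

# Sorenson dissimilarity from the formula 1 - 2a / (2a + b + c).
sorenson_sketch <- 1 - 2 * n_a / (2 * n_a + n_b + n_c)

# Whittaker's beta: total richness over mean per-sample richness, minus one.
total_richness <- n_a + n_b + n_c
mean_richness <- mean(c(sum(px), sum(py)))
whittaker_beta <- total_richness / mean_richness - 1

print(c(sorenson_sketch, whittaker_beta))  # both 1/3
```

The two agree because the mean richness is \((2a + b + c) / 2\), so \(\beta_w = 2(a + b + c) / (2a + b + c) - 1 = (b + c) / (2a + b + c)\).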
I have not been able to track down the original reference for the first and second Kulczynski indices, but we have good formulas from Legendre & Legendre. The first Kulczynski index is \(1 - a / (b + c)\), or one minus the ratio of shared to unshared species.
Relation of kulczynski_first() to other definitions:
Equivalent to \(1 - S_{12}\) in Legendre & Legendre.
Equivalent to the kulczynski calculator in Mothur.
Some people refer to the second Kulczynski index as the
Kulczynski-Cody index. It is defined as one minus the average proportion of
shared species in each vector,
$$
d = 1 - \frac{1}{2} \left ( \frac{a}{a + b} + \frac{a}{a + c} \right ).
$$
Relation of kulczynski_second() to other definitions:
Equivalent to \(1 - S_{13}\) in Legendre & Legendre.
Equivalent to the kulczynskicody calculator in Mothur.
Equivalent to one minus the Kulczynski similarity in Hayek (1994).
Equivalent to vegdist() with method = "kulczynski" and binary = TRUE.
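Both Kulczynski formulas can be sketched in a few lines. The function kulczynski_sketch() below is hypothetical and written only for illustration. It also shows how an undefined value arises as NaN from a 0/0 division when one vector has no nonzero elements.

```r
# Hypothetical helper: evaluate both Kulczynski formulas from the counts.
kulczynski_sketch <- function(x, y) {
  px <- x != 0  # assumption: nonzero means present
  py <- y != 0
  n_a <- sum(px & py)
  n_b <- sum(!px & py)
  n_c <- sum(px & !py)
  c(
    first = 1 - n_a / (n_b + n_c),
    second = 1 - 0.5 * (n_a / (n_a + n_b) + n_a / (n_a + n_c))
  )
}

res <- kulczynski_sketch(c(5, 0, 2, 0, 1, 0), c(3, 4, 0, 0, 7, 0))
print(res)    # first = 0, second = 1/3

# x has no nonzero elements, so a / (a + c) is 0 / 0 in the second index.
empty <- kulczynski_sketch(c(0, 0, 0), c(1, 2, 0))
print(empty)  # first = 1, second = NaN
```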
The Rogers-Tanimoto distance is defined as
\((2b + 2c) / (a + 2b + 2c + d)\). Relation of rogers_tanimoto() to other definitions:
Equivalent to the rogerstanimoto() function in scipy.spatial.distance, except that we always convert vectors to presence/absence.
Equivalent to \(1 - S_2\) in Legendre & Legendre.
The Russel-Rao distance is defined as
\((b + c + d) / (a + b + c + d)\), or the fraction of elements not present
in both vectors, counting double absences. Relation of russel_rao() to other definitions:
Equivalent to the russelrao() function in scipy.spatial.distance, except that we always convert vectors to presence/absence.
Equivalent to \(1 - S_{11}\) in Legendre & Legendre.
The Sokal-Michener distance is defined as
\((2b + 2c) / (a + 2b + 2c + d)\). Relation of sokal_michener() to other definitions:
Equivalent to the sokalmichener() function in scipy.spatial.distance, except that we always convert vectors to presence/absence.
The Sokal-Sneath distance is defined as
\((2b + 2c) / (a + 2b + 2c)\). Relation of sokal_sneath() to other definitions:
Equivalent to the sokalsneath() function in scipy.spatial.distance, except that we always convert vectors to presence/absence.
Equivalent to the anderberg calculator in Mothur.
Equivalent to \(1 - S_{10}\) in Legendre & Legendre.
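As a rough check of the Rogers-Tanimoto, Russel-Rao, Sokal-Michener, and Sokal-Sneath formulas, the sketch below evaluates each one from the same pair of vectors. All variable names are illustrative, and presence is assumed to mean a nonzero entry.

```r
x <- c(5, 0, 2, 0, 1, 0)
y <- c(3, 4, 0, 0, 7, 0)

px <- x != 0  # assumption: nonzero means present
py <- y != 0
n_a <- sum(px & py)     # present in both
n_b <- sum(!px & py)    # only in y
n_c <- sum(px & !py)    # only in x
n_d <- sum(!px & !py)   # absent from both

mismatch <- 2 * n_b + 2 * n_c  # double-weighted mismatches

rogers_tanimoto_sketch <- mismatch / (n_a + mismatch + n_d)
russel_rao_sketch <- (n_b + n_c + n_d) / (n_a + n_b + n_c + n_d)
sokal_michener_sketch <- mismatch / (n_a + mismatch + n_d)
sokal_sneath_sketch <- mismatch / (n_a + mismatch)

print(c(
  rogers_tanimoto = rogers_tanimoto_sketch,  # 4 / 8
  russel_rao = russel_rao_sketch,            # 4 / 6
  sokal_michener = sokal_michener_sketch,    # 4 / 8
  sokal_sneath = sokal_sneath_sketch         # 4 / 6
))
```

Only the Sokal-Sneath distance ignores the double absences \(d\). The other three change if shared zeros are appended to both vectors.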
The Yule dissimilarity is defined as \(2bc / (ad + bc)\). Relation
of yule_dissimilarity() to other definitions:
Equivalent to the yule() function in scipy.spatial.distance, except that we always convert vectors to presence/absence.
Equivalent to \(1 - S\), where \(S\) is the Yule coefficient in Legendre & Legendre.
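The Yule formula and its relation to the Yule coefficient can be sketched as follows. The helper yule_sketch() is hypothetical, and the coefficient is assumed to have the usual form \(S = (ad - bc) / (ad + bc)\).

```r
# Hypothetical helper: Yule dissimilarity 2bc / (ad + bc) from the counts.
yule_sketch <- function(x, y) {
  px <- x != 0  # assumption: nonzero means present
  py <- y != 0
  n_a <- sum(px & py)
  n_b <- sum(!px & py)
  n_c <- sum(px & !py)
  n_d <- sum(!px & !py)
  2 * n_b * n_c / (n_a * n_d + n_b * n_c)
}

yule_value <- yule_sketch(c(5, 0, 2, 0, 1, 0), c(3, 4, 0, 0, 7, 0))

# For these vectors a = 2, b = 1, c = 1, d = 2, so the Yule coefficient
# (ad - bc) / (ad + bc) is 3 / 5.
yule_coefficient <- (2 * 2 - 1 * 1) / (2 * 2 + 1 * 1)

print(c(yule_value, 1 - yule_coefficient))  # both 0.4

# One vector with no nonzero elements gives 0 / 0, hence NaN.
yule_empty <- yule_sketch(c(0, 0, 0), c(1, 2, 0))
print(yule_empty)
```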