jaccard: Beta diversity for presence/absence data

Description

These functions transform the input vectors to binary or presence/absence format, then compute a distance or dissimilarity.

Usage

jaccard(x, y)
sorenson(x, y)
kulczynski_first(x, y)
kulczynski_second(x, y)
rogers_tanimoto(x, y)
russel_rao(x, y)
sokal_michener(x, y)
sokal_sneath(x, y)
yule_dissimilarity(x, y)

Arguments

x, y

Numeric vectors

Value

The dissimilarity between x and y, based on presence/absence. The Jaccard, Sorenson, Sokal-Sneath, Yule, and both Kulczynski dissimilarities are not defined if both x and y have no nonzero elements. In addition, the second Kulczynski index and the Yule index of dissimilarity are not defined if one of the vectors has no nonzero elements. We return NaN for undefined values.

Details

Many of these indices are covered in Koleff et al. (2003), so we adopt their notation. For two vectors x and y, we define four quantities:

$a$ is the number of species that are present in both x and y,
$b$ is the number of species that are present in y but not x,
$c$ is the number of species that are present in x but not y, and
$d$ is the number of species absent in both vectors.
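
As a minimal sketch of these definitions (not the package's internal code), the four counts can be tallied in base R. The helper name `pa_counts` and the example vectors are hypothetical, and we assume that a species counts as present when its value is nonzero.

```r
# Hypothetical helper: tally a, b, c, d for two numeric vectors.
# Assumption: "present" means a nonzero value.
pa_counts <- function(x, y) {
  px <- x != 0
  py <- y != 0
  c(
    a = sum(px & py),    # present in both x and y
    b = sum(!px & py),   # present in y but not x
    c = sum(px & !py),   # present in x but not y
    d = sum(!px & !py)   # absent in both
  )
}

x <- c(5, 0, 2, 0, 1, 0)
y <- c(3, 4, 0, 0, 0, 0)
counts <- pa_counts(x, y)
print(counts)  # a = 1, b = 1, c = 2, d = 2
```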

The quantity $d$ is seldom used in ecology, for good reason. For details, please see the discussion of the "double zero problem" in section 2 of chapter 7.2 in Legendre & Legendre.

The Jaccard index of dissimilarity is $1 - a / (a + b + c)$, or one minus the proportion of shared species, counting over both samples together. Relation of jaccard() to other definitions:

Equivalent to R's built-in dist() function with method = "binary".
Equivalent to vegdist() with method = "jaccard" and binary = TRUE.
Equivalent to the jaccard() function in scipy.spatial.distance, except that we always convert vectors to presence/absence.
Equivalent to $1 - S_7$ in Legendre & Legendre.
Equivalent to $1 - \beta_j$, as well as $\beta_{cc}$ and $\beta_g$, in Koleff et al. (2003).
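
The first equivalence in this list can be checked with base R alone. The sketch below computes the Jaccard dissimilarity directly from the counts and compares it with dist(). The example vectors and variable names are assumptions made for illustration.

```r
x <- c(5, 0, 2, 0, 1, 0)
y <- c(3, 4, 0, 0, 0, 0)

# Counts from the definitions (assumption: presence means nonzero).
n_a <- sum(x != 0 & y != 0)  # shared species
n_b <- sum(x == 0 & y != 0)  # only in y
n_c <- sum(x != 0 & y == 0)  # only in x

jaccard_by_hand <- 1 - n_a / (n_a + n_b + n_c)

# dist() with method = "binary" treats nonzero entries as "on",
# so the abundances are reduced to presence/absence here too.
jaccard_dist <- as.numeric(dist(rbind(x, y), method = "binary"))

print(c(by_hand = jaccard_by_hand, dist = jaccard_dist))  # both 0.75
```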

The Sørenson or Dice index of dissimilarity is $1 - 2a / (2a + b + c)$, or one minus the average proportion of shared species, counting over each sample individually. Relation of sorenson() to other definitions:

Equivalent to the dice() function in scipy.spatial.distance, except that we always convert vectors to presence/absence.
Equivalent to the sorclass calculator in Mothur, and to one minus the whittaker calculator.
Equivalent to $D_{13} = 1 - S_8$ in Legendre & Legendre.
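Equivalent to $1 - \beta_{sor}$ in Koleff et al. (2003). Also equivalent to Whittaker's beta diversity (the second definition, $\beta_w = (S / \bar{\alpha}) - 1$, where $S$ is the total number of species and $\bar{\alpha}$ is the mean number of species per sample), as well as $\beta_{-1}$, $\beta_t$, $\beta_{me}$, and $\beta_{hk}$.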
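
To illustrate the Whittaker equivalence, the sketch below computes the dissimilarity from the counts and again as total richness divided by mean richness, minus one. It uses base R only. The example vectors and variable names are assumptions, and presence is taken to mean a nonzero value.

```r
x <- c(5, 0, 2, 0, 1, 0)
y <- c(3, 4, 0, 0, 0, 0)
px <- x != 0
py <- y != 0

n_a <- sum(px & py)   # shared species
n_b <- sum(!px & py)  # only in y
n_c <- sum(px & !py)  # only in x

sorenson_by_hand <- 1 - 2 * n_a / (2 * n_a + n_b + n_c)

# Whittaker's form: total richness over mean richness, minus one.
total_richness <- sum(px | py)
mean_richness <- mean(c(sum(px), sum(py)))
whittaker_beta <- total_richness / mean_richness - 1

print(c(sorenson = sorenson_by_hand, whittaker = whittaker_beta))  # both 0.6
```

The two agree because $S = a + b + c$ and the mean richness is $(2a + b + c) / 2$, so $S / \bar{\alpha} - 1 = (b + c) / (2a + b + c)$.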

We have not been able to track down the original reference for the first and second Kulczynski indices, but Legendre & Legendre give formulas for both. The first Kulczynski index is $1 - a / (b + c)$, or one minus the ratio of shared to unshared species.

Relation of kulczynski_first() to other definitions:

Equivalent to $1 - S_{12}$ in Legendre & Legendre.
Equivalent to the kulczynski calculator in Mothur.

Some people refer to the second Kulczynski index as the Kulczynski-Cody index. It is defined as one minus the average proportion of shared species in each vector, $$ 1 - \frac{1}{2} \left ( \frac{a}{a + b} + \frac{a}{a + c} \right ). $$ Relation of kulczynski_second() to other definitions:

Equivalent to $1 - S_{13}$ in Legendre & Legendre.
Equivalent to the kulczynskicody calculator in Mothur.
Equivalent to one minus the Kulczynski similarity in Hayek (1994).
Equivalent to vegdist() with method = "kulczynski" and binary = TRUE.
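
The two Kulczynski formulas can be sketched in base R as follows. The function name `kulczynski_sketch` and the example vectors are hypothetical, and this is not the package's implementation. The last call shows how plain arithmetic already yields NaN for the second index when one vector has no nonzero elements, because R evaluates 0 / 0 as NaN.

```r
# Hypothetical helper returning both Kulczynski dissimilarities.
# Assumption: presence means a nonzero value.
kulczynski_sketch <- function(x, y) {
  px <- x != 0
  py <- y != 0
  n_a <- sum(px & py)   # shared species
  n_b <- sum(!px & py)  # only in y
  n_c <- sum(px & !py)  # only in x
  c(
    first = 1 - n_a / (n_b + n_c),
    second = 1 - 0.5 * (n_a / (n_a + n_b) + n_a / (n_a + n_c))
  )
}

x <- c(5, 0, 2, 0, 1, 0)
y <- c(3, 4, 0, 0, 0, 0)
res <- kulczynski_sketch(x, y)
print(res)  # first = 2/3, second = 7/12

# One empty vector: a + c is zero, so the second index is 0 / 0.
res_empty <- kulczynski_sketch(c(0, 0, 0), c(1, 1, 0))
print(res_empty)  # first = 1, second = NaN
```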

The Rogers-Tanimoto distance is defined as $(2b + 2c) / (a + 2b + 2c + d)$. Relation of rogers_tanimoto() to other definitions:

Equivalent to the rogerstanimoto() function in scipy.spatial.distance, except that we always convert vectors to presence/absence.
Equivalent to $1 - S_2$ in Legendre & Legendre.

The Russel-Rao distance is defined as $(b + c + d) / (a + b + c + d)$, or the fraction of elements not present in both vectors, counting double absences. Relation of russel_rao() to other definitions:

Equivalent to the russelrao() function in scipy.spatial.distance, except that we always convert vectors to presence/absence.
Equivalent to $1 - S_{11}$ in Legendre & Legendre.

The Sokal-Michener distance is defined as $(2b + 2c) / (a + 2b + 2c + d)$, the same expression given for the Rogers-Tanimoto distance. Relation of sokal_michener() to other definitions:

Equivalent to the sokalmichener() function in scipy.spatial.distance, except that we always convert vectors to presence/absence.

The Sokal-Sneath distance is defined as $(2b + 2c) / (a + 2b + 2c)$. Relation of sokal_sneath() to other definitions:

Equivalent to the sokalsneath() function in scipy.spatial.distance, except that we always convert vectors to presence/absence.
Equivalent to the anderberg calculator in Mothur.
Equivalent to $1 - S_{10}$ in Legendre & Legendre.

The Yule dissimilarity is defined as $2bc / (ad + bc)$. Relation of yule_dissimilarity() to other definitions:

Equivalent to the yule() function in scipy.spatial.distance, except that we always convert vectors to presence/absence.
Equivalent to $1 - S$, where $S$ is the Yule coefficient in Legendre & Legendre.
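
As a worked example of the last four formulas, the sketch below evaluates the Rogers-Tanimoto, Russel-Rao, Sokal-Sneath, and Yule expressions from the counts in base R. The example vectors and variable names are assumptions made for illustration, and presence is taken to mean a nonzero value.

```r
x <- c(5, 0, 2, 0, 1, 0)
y <- c(3, 4, 0, 0, 0, 0)
px <- x != 0
py <- y != 0

n_a <- sum(px & py)    # present in both:  1
n_b <- sum(!px & py)   # only in y:        1
n_c <- sum(px & !py)   # only in x:        2
n_d <- sum(!px & !py)  # absent in both:   2

mismatch <- 2 * n_b + 2 * n_c  # doubled count of unshared species

rogers_tanimoto_by_hand <- mismatch / (n_a + mismatch + n_d)         # 6/9
russel_rao_by_hand <- (n_b + n_c + n_d) / (n_a + n_b + n_c + n_d)    # 5/6
sokal_sneath_by_hand <- mismatch / (n_a + mismatch)                  # 6/7
yule_by_hand <- 2 * n_b * n_c / (n_a * n_d + n_b * n_c)              # 4/4

print(c(
  rogers_tanimoto = rogers_tanimoto_by_hand,
  russel_rao = russel_rao_by_hand,
  sokal_sneath = sokal_sneath_by_hand,
  yule = yule_by_hand
))
```

Only the Sokal-Sneath value here ignores $d$; the other three change if trailing double zeros are appended to both vectors.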