dbrobust (version 1.0.0)

dist_binary: Compute pairwise binary distances

Description

Internal helper function to compute pairwise distances between binary vectors using standard binary distance/similarity measures. Delegates to ade4::dist.binary when available for performance.

Usage

dist_binary(x, method)

Value

A symmetric numeric matrix of pairwise distances. NA is returned for pairs with no valid comparisons (all NA entries).

Arguments

x

A numeric matrix or data frame of binary values (0/1, TRUE/FALSE, or NA).

method

A character string specifying the binary distance measure to use.

Details

Supported methods (for two binary vectors \(x_i\) and \(x_j\)):

  • "jaccard": $$d = 1 - \frac{a}{a + b + c}$$

  • "dice": $$d = 1 - \frac{2a}{2a + b + c}$$

  • "sokal_michener": $$d = 1 - \frac{a + d}{a + b + c + d}$$

  • "russell_rao": $$d = 1 - \frac{a}{a + b + c + d}$$

  • "sokal_sneath": $$d = 1 - \frac{a}{a + \frac{1}{2}(b + c)}$$

  • "kulczynski": $$d = 1 - \frac{1}{2}\left(\frac{a}{a+b} + \frac{a}{a+c}\right)$$

  • "hamming": $$d = 1 - \frac{a + d}{a + b + c + d}$$

Where:

  • \(a\) = number of positions where both vectors are 1

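  • \(b\) = number of positions where \(x_i = 1\) and \(x_j = 0\)

  • \(c\) = number of positions where \(x_i = 0\) and \(x_j = 1\)

  • \(d\) = number of positions where both vectors are 0 (this is the \(d\) inside the fractions; the \(d\) on the left-hand side of each formula denotes the distance)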

The Sokal-Michener distance is identical to the normalized Hamming distance, so "sokal_michener" and "hamming" share the same formula.
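As a rough illustration of the definitions above, the four counts and several of the listed formulas can be reproduced in base R. The helper names `binary_counts` and `binary_dist_sketch` are hypothetical and not part of dbrobust; this is a sketch of the documented formulas under the assumption of complete 0/1 input, not the package's implementation.

```r
# Hypothetical helper (not part of dbrobust): counts a, b, c, d for two 0/1 vectors.
binary_counts <- function(xi, xj) {
  c(
    a = sum(xi == 1 & xj == 1),  # both 1
    b = sum(xi == 1 & xj == 0),  # xi = 1, xj = 0
    c = sum(xi == 0 & xj == 1),  # xi = 0, xj = 1
    d = sum(xi == 0 & xj == 0)   # both 0
  )
}

# Hypothetical helper: applies some of the formulas exactly as documented above.
binary_dist_sketch <- function(xi, xj, method) {
  n  <- binary_counts(xi, xj)
  a  <- n[["a"]]
  b  <- n[["b"]]
  cc <- n[["c"]]  # named cc and dd to avoid clashing with c() and the distance d
  dd <- n[["d"]]
  switch(method,
    jaccard        = 1 - a / (a + b + cc),
    dice           = 1 - 2 * a / (2 * a + b + cc),
    sokal_michener = ,  # falls through: same formula as hamming
    hamming        = 1 - (a + dd) / (a + b + cc + dd),
    russell_rao    = 1 - a / (a + b + cc + dd),
    kulczynski     = 1 - 0.5 * (a / (a + b) + a / (a + cc)),
    stop("method not covered by this sketch: ", method)
  )
}

xi <- c(1, 0, 1, 0)
xj <- c(1, 1, 0, 0)
binary_counts(xi, xj)                   # a = 1, b = 1, c = 1, d = 1
binary_dist_sketch(xi, xj, "jaccard")   # 1 - 1/3
binary_dist_sketch(xi, xj, "hamming")   # 1 - 2/4
```

Here "sokal_michener" and "hamming" deliberately share one branch, mirroring the equivalence stated above.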

  • Factors or character columns are converted to numeric 0/1.

  • Missing values (NA) are ignored pairwise; if all entries are missing, distance is NA.

  • Methods supported by ade4 (e.g., Jaccard, Dice, Sokal-Michener) are computed via ade4::dist.binary for efficiency.

  • Manual computations are implemented for Hamming and Kulczynski if ade4 is unavailable.
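The pairwise treatment of missing values described above might look like the following base R sketch. It assumes that a position is dropped whenever either vector is NA there, which is one plausible reading of "ignored pairwise"; the function name `jaccard_pairwise_na` is hypothetical and is not part of dbrobust.

```r
# Hypothetical helper (not part of dbrobust): Jaccard distance with pairwise NA removal.
jaccard_pairwise_na <- function(xi, xj) {
  keep <- !is.na(xi) & !is.na(xj)    # positions observed in both vectors
  if (!any(keep)) return(NA_real_)   # no valid comparisons for this pair
  xi <- xi[keep]
  xj <- xj[keep]
  a  <- sum(xi == 1 & xj == 1)
  b  <- sum(xi == 1 & xj == 0)
  cc <- sum(xi == 0 & xj == 1)
  1 - a / (a + b + cc)
}

# Only positions 1 and 4 are observed in both vectors: a = 1, b = 0, c = 1.
jaccard_pairwise_na(c(1, NA, 1, 0), c(1, 1, NA, 1))

# No position is observed in both vectors, so the result is NA.
jaccard_pairwise_na(c(NA, NA), c(1, 0))
```

The sketch does not guard against pairs where a + b + c is zero (both vectors all 0 on the shared positions); in that case the division yields NaN, and how the package treats that case is not stated here.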

Examples

# Small example with binary matrix
mat <- matrix(c(
  1, 0, 1,
  1, 1, 0,
  0, 1, 1
), nrow = 3, byrow = TRUE)

# Example with Jaccard
dbrobust::dist_binary(mat, method = "jaccard")

# Example with Hamming
dbrobust::dist_binary(mat, method = "hamming")
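
To check the example by hand, the documented Jaccard and Hamming formulas can be evaluated for all row pairs at once with matrix products. This is a base R sketch of the formulas in Details, not a reproduction of the package's output: ade4::dist.binary documents its distances as sqrt(1 - s), so values returned by dbrobust may differ depending on how the package post-processes them.

```r
mat <- matrix(c(
  1, 0, 1,
  1, 1, 0,
  0, 1, 1
), nrow = 3, byrow = TRUE)

# Pairwise counts for every pair of rows, following the definitions in Details.
A <- mat %*% t(mat)            # both 1
B <- mat %*% t(1 - mat)        # row i = 1, row j = 0
C <- (1 - mat) %*% t(mat)      # row i = 0, row j = 1
D <- (1 - mat) %*% t(1 - mat)  # both 0

jac <- 1 - A / (A + B + C)            # Jaccard formula as documented
ham <- 1 - (A + D) / (A + B + C + D)  # Hamming formula as documented

jac
ham
```

For this matrix every pair of distinct rows has a = 1, b = 1, c = 1, d = 0, so both formulas give 2/3 off the diagonal and 0 on it.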