
dbrobust (version 1.0.0)

dist_continuous: Compute pairwise distances for continuous numeric data

Description

Internal helper function that computes pairwise distance matrices for purely numeric datasets. Supports Euclidean, Manhattan, maximum (Chebyshev), Canberra, Minkowski, standardized Euclidean, and Mahalanobis distances, as well as cosine and correlation distances through the proxy package.

Usage

dist_continuous(x, method, p = NULL)

Value

A symmetric numeric matrix of pairwise distances between rows of x. The diagonal contains zeros.
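The Details section notes that the standard metrics come from stats::dist, which returns a compact lower-triangle "dist" object rather than a full matrix. The following is a minimal sketch, assuming the conversion is done with as.matrix() (not taken from the dbrobust source), of how such an object becomes the symmetric, zero-diagonal matrix described here:

```r
# Sketch: turn a compact "dist" object into a full symmetric matrix
# (assumed conversion for illustration, not dbrobust's own code)
mat <- matrix(c(1, 2, 3,
                4, 5, 6,
                7, 8, 9), nrow = 3, byrow = TRUE)

d_full <- as.matrix(stats::dist(mat, method = "euclidean"))

# Properties stated for the return value: symmetric with a zero diagonal
stopifnot(isSymmetric(d_full), all(diag(d_full) == 0))
d_full
```

Here rows 1 and 2 differ by 3 in every column, so their distance is sqrt(27); rows 1 and 3 differ by 6, giving sqrt(108).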

Arguments

x

A numeric data frame or matrix with rows as observations and columns as variables.

method

Character string naming the distance metric to compute; see Details for the supported options.

p

Numeric power parameter for the Minkowski distance (the exponent q in Details); required if method = "minkowski".

Details

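Supported methods and formulas, for observations \(\mathbf{z}_i\) and \(\mathbf{z}_j\) (rows i and j of x). In the sums, k runs over the p variables (columns) of x; this column count p is unrelated to the argument p used by "minkowski":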

  • "euclidean": $$\delta_E(i,j) = \sqrt{\sum_{k=1}^{p} (z_{ik} - z_{jk})^2}$$

  • "minkowski": $$\delta_q(i,j) = \left( \sum_{k=1}^{p} |z_{ik} - z_{jk}|^q \right)^{1/q}$$ where the exponent q is supplied through the argument p

  • "manhattan": $$\delta_1(i,j) = \sum_{k=1}^{p} |z_{ik} - z_{jk}|$$

  • "maximum": $$\delta_\infty(i,j) = \max_k |z_{ik} - z_{jk}|$$

  • "canberra": $$\delta_C(i,j) = \sum_{k=1}^{p} \frac{|z_{ik} - z_{jk}|}{|z_{ik}| + |z_{jk}|}$$ convention: \(0/0 := 0\)

  • "euclidean_standardized": $$\delta_K(i,j) = \sqrt{\sum_{k=1}^{p} \frac{(z_{ik} - z_{jk})^2}{s_k^2}}$$ \(s_k^2\) is the variance of variable k

  • "mahalanobis": $$\delta_M(i,j) = \sqrt{ (\mathbf{z}_i - \mathbf{z}_j)' \mathbf{S}^{-1} (\mathbf{z}_i - \mathbf{z}_j) }$$ \(\mathbf{S}\) is the covariance matrix
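The pairwise formulas that depend only on the two observations can be checked by hand. This is a base-R sketch on an arbitrary pair of made-up vectors (z_i and z_j are illustrative values, not package data), with each hand-written formula compared against stats::dist:

```r
# Sketch: the Details formulas written out for two observations and
# cross-checked against stats::dist (base R only, not the dbrobust source)
z_i <- c(1, 2, 3)
z_j <- c(4, 6, 3)
abs_diff <- abs(z_i - z_j)            # |z_ik - z_jk| for k = 1, ..., p

euclidean <- sqrt(sum(abs_diff^2))    # 5
manhattan <- sum(abs_diff)            # 7
maximum   <- max(abs_diff)            # 4
q         <- 3                        # Minkowski exponent (passed as p)
minkowski <- sum(abs_diff^q)^(1 / q)

# Canberra with the 0/0 := 0 convention: a term whose denominator is zero
# (both values are zero) contributes nothing to the sum
den      <- abs(z_i) + abs(z_j)
canberra <- sum(ifelse(den == 0, 0, abs_diff / den))  # 0.6 + 0.5 + 0 = 1.1

pair <- rbind(z_i, z_j)
same <- function(a, b) isTRUE(all.equal(a, as.numeric(b)))
stopifnot(
  same(euclidean, stats::dist(pair, method = "euclidean")),
  same(manhattan, stats::dist(pair, method = "manhattan")),
  same(maximum,   stats::dist(pair, method = "maximum")),
  same(canberra,  stats::dist(pair, method = "canberra")),
  same(minkowski, stats::dist(pair, method = "minkowski", p = q))
)
```

This pair has no 0/0 Canberra term. When such terms do occur, the stats::dist documentation says they are omitted and the remaining sum is scaled up in proportion to the number of columns used, so a result computed directly with stats::dist can differ from the plain 0/0 := 0 sum.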

Notes on the individual metrics:

  • For "euclidean_standardized", columns are standardized to mean 0 and variance 1 before computing Euclidean distances.

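  • Cosine and correlation distances are computed with the proxy package; the resulting distance matrices are not guaranteed to be Euclidean, i.e., the observations need not be exactly representable as points in a Euclidean space.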

  • Minkowski distance requires specifying the parameter p (e.g., p = 3 for the L3 norm).

  • Mahalanobis distance uses the inverse of the covariance matrix. If the covariance matrix is singular, the Moore-Penrose generalized inverse from MASS::ginv is used instead.

  • Standard metrics (Euclidean, Manhattan, Maximum, Canberra) are computed using stats::dist.
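The two data-dependent metrics can be sketched in base R as follows. The helpers euclid_std() and mahalanobis_pairwise() are hypothetical illustrations of the notes above, not functions exported by dbrobust: the first relies on scale(), which centres each column and divides it by its sample standard deviation, and the second applies solve() to the sample covariance matrix, falling back to MASS::ginv() (MASS ships with standard R installations) when solve() fails.

```r
# Hypothetical helpers for illustration only (not the dbrobust implementation)

# Standardized Euclidean: scale() gives each column mean 0 and variance 1,
# then ordinary Euclidean distance is applied
euclid_std <- function(x) {
  as.matrix(stats::dist(scale(as.matrix(x))))
}

# Mahalanobis: sqrt((z_i - z_j)' S^{-1} (z_i - z_j)) for every pair of rows
mahalanobis_pairwise <- function(x) {
  x <- as.matrix(x)
  S <- stats::cov(x)
  # Fall back to the Moore-Penrose inverse when S cannot be inverted
  S_inv <- tryCatch(solve(S), error = function(e) MASS::ginv(S))
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    # stats::mahalanobis() returns squared distances from each row to `center`;
    # inverted = TRUE says that S_inv is already the inverse matrix
    d2 <- stats::mahalanobis(x, center = x[i, ], cov = S_inv, inverted = TRUE)
    d[i, ] <- sqrt(pmax(d2, 0))  # guard against tiny negative rounding errors
  }
  d
}

mat <- matrix(c(1, 2, 3,
                4, 5, 6,
                7, 8, 9), nrow = 3, byrow = TRUE)
euclid_std(mat)            # off-diagonal entries sqrt(3) and 2 * sqrt(3)
mahalanobis_pairwise(mat)  # cov(mat) is singular here, so MASS::ginv is used
```

For this 3 x 3 matrix every column has variance 9 and cov(mat) has rank 1, so the generalized inverse is used; working through the algebra gives Mahalanobis distances of 1 between neighbouring rows and 2 between the first and last rows, up to floating-point rounding.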

Examples

# Small numeric matrix
mat <- matrix(c(1, 2, 3,
                4, 5, 6,
                7, 8, 9), nrow = 3, byrow = TRUE)

# Euclidean distance
dbrobust::dist_continuous(mat, method = "euclidean")

# Standardized Euclidean
dbrobust::dist_continuous(mat, method = "euclidean_standardized")

# Minkowski distance with p = 3
dbrobust::dist_continuous(mat, method = "minkowski", p = 3)

# Mahalanobis distance on a random 5 x 3 matrix
set.seed(123)
mat <- matrix(rnorm(5 * 3), nrow = 5, ncol = 3)
colnames(mat) <- c("X1", "X2", "X3")
dbrobust::dist_continuous(mat, method = "mahalanobis")

# Cosine distance (requires 'proxy' package)
dbrobust::dist_continuous(mat, method = "cosine")
