
dbrobust (version 1.0.0)

robust_distances: Compute Robust Squared Distances for Mixed Data

Description

Computes a weighted, robust squared distance matrix for data sets containing continuous, binary, and categorical variables. Continuous variables are handled with a robust Mahalanobis distance, while binary and categorical variables are converted to distances through similarity coefficients. The resulting squared distances are not guaranteed to be Euclidean; they are suitable input for the Euclidean correction performed by make_euclidean.
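To illustrate the continuous part, the sketch below computes pairwise squared Mahalanobis distances d2(i, j) = (x_i - x_j)' S^-1 (x_i - x_j) for a given covariance matrix S. The helper name pairwise_mahalanobis_sq is hypothetical, and the classical cov() stands in for a robust estimate; this is not dbrobust's internal code.

```r
# Hypothetical helper (not part of dbrobust): pairwise squared Mahalanobis
# distances d2[i, j] = (x_i - x_j)' S^{-1} (x_i - x_j) for a covariance matrix S.
pairwise_mahalanobis_sq <- function(X, S) {
  X <- as.matrix(X)
  R <- chol(S)            # upper triangular R with t(R) %*% R == S
  Z <- X %*% solve(R)     # whitened rows: squared Euclidean distance in Z
                          # equals squared Mahalanobis distance in X
  D2 <- as.matrix(dist(Z))^2
  dimnames(D2) <- NULL
  D2
}

# Usage with simulated data; cov() is a non-robust stand-in for a robust estimate
set.seed(1)
X <- matrix(rnorm(200), ncol = 4)
S <- cov(X)
D2 <- pairwise_mahalanobis_sq(X, S)
```

Whitening through the Cholesky factor avoids a double loop over pairs: once the rows are transformed, ordinary dist() does the rest.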

Usage

robust_distances(
  data = NULL,
  cont_vars = NULL,
  bin_vars = NULL,
  cat_vars = NULL,
  w = NULL,
  p = NULL,
  method = c("ggower", "relms"),
  robust_cov = NULL,
  alpha = 0.1,
  return_dist = FALSE
)

Value

A numeric matrix of squared robust distances (n x n) or a dist object if return_dist = TRUE.
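A returned squared distance matrix should be square, symmetric, non-negative, and zero on the diagonal. The small checker below is a hypothetical helper (not part of dbrobust) for verifying those basic properties before further analysis.

```r
# Hypothetical checker (not part of dbrobust): basic sanity checks for an
# n x n matrix of squared distances.
is_valid_sq_dist <- function(D2, tol = 1e-10) {
  is.matrix(D2) && nrow(D2) == ncol(D2) &&
    isSymmetric(unname(D2), tol = tol) &&   # D2[i, j] == D2[j, i]
    all(abs(diag(D2)) <= tol) &&            # zero self-distances
    all(D2 >= -tol)                         # no negative squared distances
}

D2 <- matrix(c(0, 4, 9,
               4, 0, 1,
               9, 1, 0), nrow = 3, byrow = TRUE)
is_valid_sq_dist(D2)
```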

Arguments

data

Data frame or numeric matrix containing the observations.

cont_vars

Character vector of column names for continuous variables.

bin_vars

Character vector of column names for binary variables.
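For binary variables, one common similarity coefficient is Jaccard's, s = a / (a + b + c), where a counts variables equal to 1 in both observations and b + c counts mismatches (0-0 matches are ignored); a squared distance can then be taken as 1 - s. This page does not say which coefficient dbrobust uses, so the helper jaccard_sq_dist below is a hypothetical sketch.

```r
# Hypothetical helper (not dbrobust's code): Jaccard similarity between rows of
# a 0/1 matrix, converted to squared distances 1 - s.
jaccard_sq_dist <- function(B) {
  B <- as.matrix(B) * 1                 # coerce logical or 0/1 input to numeric
  a <- tcrossprod(B)                    # a[i, j]: variables equal to 1 in both rows
  n1 <- rowSums(B)                      # number of 1s per row
  abc <- outer(n1, n1, "+") - a         # a + b + c (0-0 matches excluded)
  S <- ifelse(abc == 0, 1, a / abc)     # convention: two all-zero rows are identical
  1 - S
}

B <- rbind(c(1, 1, 0),
           c(1, 0, 0),
           c(0, 0, 0))
D_bin <- jaccard_sq_dist(B)
```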

cat_vars

Character vector of column names for categorical variables.
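For categorical variables, a simple choice is the matching coefficient: the share of variables on which two observations fall in the same category, again turned into a squared distance as 1 - s. This is an assumed coefficient for illustration; matching_sq_dist is a hypothetical helper, not dbrobust's implementation.

```r
# Hypothetical helper (not dbrobust's code): simple matching similarity for
# categorical columns, converted to squared distances 1 - s.
matching_sq_dist <- function(df) {
  df <- as.data.frame(df)
  n <- nrow(df)
  S <- matrix(0, n, n)
  for (k in seq_along(df)) {
    x <- as.character(df[[k]])          # compare labels, not factor codes
    S <- S + outer(x, x, "==")          # add 1 where rows i and j share the category
  }
  1 - S / ncol(df)                      # 1 - proportion of agreements
}

cats <- data.frame(colour = c("red", "red", "blue"),
                   size   = c("S", "M", "M"))
D_cat <- matching_sq_dist(cats)
```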

w

Numeric vector of observation weights. If NULL, uniform weights are used.
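This page does not say how the weights enter the computation. One common pattern, shown here purely as an assumption, is to rescale weights to sum to one (so uniform weights equal 1/n) and use them in weighted means and covariances, for example with stats::cov.wt. The helper normalize_weights is hypothetical.

```r
# Assumption for illustration: weights are rescaled to sum to 1.
normalize_weights <- function(w = NULL, n) {
  if (is.null(w)) w <- rep(1, n)        # NULL -> uniform weights
  stopifnot(length(w) == n, all(w >= 0), sum(w) > 0)
  w / sum(w)
}

set.seed(2)
X <- matrix(rnorm(60), ncol = 3)
w_unif <- normalize_weights(NULL, nrow(X))

# With uniform weights, the weighted covariance matches the classical one
wc <- cov.wt(X, wt = w_unif)
all.equal(wc$cov, cov(X))
```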

p

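Integer vector of length 3 giving the number of continuous, binary, and categorical variables, in that order: c(n_cont, n_bin, n_cat). If provided, it overrides the variable types given by cont_vars, bin_vars, and cat_vars.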
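The counts in p can be derived from the same variable lists used in the Examples section. The sketch below assumes, without confirmation from this page, that when p is supplied the columns are read by position, so the data would need to be ordered continuous, then binary, then categorical.

```r
# Build p = c(#continuous, #binary, #categorical) from the variable lists.
cont_vars <- c("V1", "V2", "V3", "V4")
bin_vars  <- c("V8", "V9")
cat_vars  <- c("V5", "V6", "V7")
p <- c(length(cont_vars), length(bin_vars), length(cat_vars))

# Assumption: with p, columns are read positionally, so reorder them to
# continuous, binary, categorical (e.g. data[, col_order]).
col_order <- c(cont_vars, bin_vars, cat_vars)
p
col_order
```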

method

Character string selecting the distance computation method: either "ggower" or "relms".

robust_cov

Optional. Precomputed robust covariance matrix for continuous variables. If NULL, it will be estimated internally using the specified trimming proportion alpha.
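A precomputed matrix can come from any robust estimator. As an illustration only, MASS::cov.rob() with method = "mcd" returns a minimum covariance determinant estimate; whether it agrees with dbrobust's internal estimator is not stated on this page.

```r
# Illustration only: one way to obtain a robust covariance matrix.
library(MASS)

set.seed(3)
X <- matrix(rnorm(200), ncol = 4)
X[1:5, ] <- X[1:5, ] + 10              # plant a few outlying rows
S_rob <- cov.rob(X, method = "mcd")$cov

# The matrix could then be passed as, e.g.,
# robust_distances(data = ..., cont_vars = ..., robust_cov = S_rob)
```

Compared with cov(X), the robust estimate should be far less inflated by the planted outliers.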

alpha

Numeric trimming proportion used to estimate the robust covariance matrix of the continuous variables when robust_cov is NULL.
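To show what a trimming proportion does, the sketch below implements a simple one-step trimmed covariance: it drops the alpha fraction of rows with the largest classical Mahalanobis distance and recomputes the mean and covariance on the rest. The helper trimmed_cov is hypothetical and is not the estimator dbrobust uses.

```r
# Hypothetical one-step trimmed covariance (not dbrobust's estimator).
trimmed_cov <- function(X, alpha = 0.1) {
  X <- as.matrix(X)
  d2 <- mahalanobis(X, colMeans(X), cov(X))   # classical squared distances
  keep <- d2 <= quantile(d2, 1 - alpha)        # discard the top alpha fraction
  list(center = colMeans(X[keep, , drop = FALSE]),
       cov    = cov(X[keep, , drop = FALSE]),
       kept   = sum(keep))
}

set.seed(4)
X <- matrix(rnorm(200), ncol = 4)
X[1:3, ] <- X[1:3, ] + 8               # a few outlying rows
tc <- trimmed_cov(X, alpha = 0.1)
tc$kept
```

With 50 rows and alpha = 0.1, 45 rows remain for the covariance estimate.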

return_dist

Logical. If TRUE, returns an object of class dist; otherwise, returns a squared distance matrix.
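Converting between a full matrix and a dist object uses base R only. This page does not state whether the returned dist object holds squared or unsquared values, so check before taking square roots.

```r
# Base-R conversion between a full squared-distance matrix and a "dist" object.
D2 <- matrix(c(0, 4, 9,
               4, 0, 1,
               9, 1, 0), nrow = 3, byrow = TRUE)
d <- as.dist(D2)            # keeps only the lower triangle
D2_back <- as.matrix(d)     # full symmetric matrix again
D_plain <- sqrt(D2)         # unsquared distances, if a method needs them
```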

Examples

# Example: Robust Squared Distances for Mixed Data

# Load example data and subset
data("Data_HC_contamination", package = "dbrobust")
Data_small <- Data_HC_contamination[1:50, ]

# Define variable types
cont_vars <- c("V1", "V2", "V3", "V4")  # continuous
cat_vars  <- c("V5", "V6", "V7")        # categorical
bin_vars  <- c("V8", "V9")              # binary

# Use column w_loop as weights
w <- Data_small$w_loop

# -------------------------------
# Method 1: GGower distances
# -------------------------------
dist_sq_ggower <- robust_distances(
  data = Data_small,
  cont_vars = cont_vars,
  bin_vars  = bin_vars,
  cat_vars  = cat_vars,
  w = w,
  alpha = 0.10,
  method = "ggower"
)

# Apply Euclidean correction if needed
res_ggower <- make_euclidean(dist_sq_ggower, w)

# Show first 5x5 block of original and corrected distances
cat("GGower original squared distances (5x5 block):\n")
print(round(dist_sq_ggower[1:5, 1:5], 4))
cat("\nGGower corrected squared distances (5x5 block):\n")
print(round(res_ggower$D_euc[1:5, 1:5], 4))

# -------------------------------
# Method 2: RelMS distances
# -------------------------------
dist_sq_relms <- robust_distances(
  data = Data_small,
  cont_vars = cont_vars,
  bin_vars  = bin_vars,
  cat_vars  = cat_vars,
  w = w,
  alpha = 0.10,
  method = "relms"
)

# Apply Euclidean correction if needed
res_relms <- make_euclidean(dist_sq_relms, w)

# Show first 5x5 block of original and corrected distances
cat("RelMS original squared distances (5x5 block):\n")
print(round(dist_sq_relms[1:5, 1:5], 4))
cat("\nRelMS corrected squared distances (5x5 block):\n")
print(round(res_relms$D_euc[1:5, 1:5], 4))
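A squared distance matrix D is Euclidean exactly when the doubly centered matrix G = -1/2 J D J, with J = I - 11'/n, is positive semidefinite. The sketch below checks this and, if needed, applies the Lingoes additive correction, adding 2c to every off-diagonal squared distance with c = -(smallest eigenvalue of G). This is a standard unweighted technique shown for illustration; make_euclidean also takes weights and may use a different method.

```r
# Unweighted sketch; make_euclidean() may differ (it also takes weights).
gower_centered <- function(D2) {
  n <- nrow(D2)
  J <- diag(n) - matrix(1 / n, n, n)    # centering matrix I - 11'/n
  -0.5 * J %*% D2 %*% J
}

lingoes_correct <- function(D2, tol = 1e-10) {
  lambda_min <- min(eigen(gower_centered(D2), symmetric = TRUE,
                          only.values = TRUE)$values)
  if (lambda_min >= -tol) return(D2)    # already Euclidean
  c0 <- -lambda_min
  D2 + 2 * c0 * (1 - diag(nrow(D2)))    # add 2c off the diagonal only
}

# Unsquared distances 1, 1, 3 violate the triangle inequality
D2 <- matrix(c(0, 1, 9,
               1, 0, 1,
               9, 1, 0), nrow = 3, byrow = TRUE)
D2_euc <- lingoes_correct(D2)
min(eigen(gower_centered(D2_euc), symmetric = TRUE)$values)
```

The correction works because adding 2c(11' - I) to D adds cJ to G, which shifts every eigenvalue outside the constant direction up by c.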
