
dbrobust (version 1.0.0)

calculate_distances: Compute Distance or Similarity Matrices

Description

Computes a distance or similarity matrix between the rows of a data frame or matrix. Supported metrics cover binary, categorical, continuous, and mixed-type data.

Usage

calculate_distances(
  x,
  method = "gower",
  output_format = "dist",
  squared = FALSE,
  p = NULL,
  similarity_transform = "linear",
  ...
)

Value

Depending on output_format, returns:

  • A dist object (if output_format = "dist")

  • A numeric matrix (if output_format = "matrix" or output_format = "similarity")
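
To make the difference between the two return types concrete, here is a minimal base-R sketch. It uses stats::dist as a stand-in, on the assumption that calculate_distances() returns the same kind of objects; the toy matrix x is made up for illustration.

```r
# Base-R illustration only; stats::dist stands in for calculate_distances().
x <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)

d <- dist(x)        # "dist" object: stores only the lower triangle
m <- as.matrix(d)   # full symmetric numeric matrix with a zero diagonal

class(d)    # "dist"
length(d)   # 3, i.e. n * (n - 1) / 2 pairwise distances for n = 3 rows
dim(m)      # 3 3
```

A dist object is the compact form expected by functions such as hclust(), while the matrix form is easier to index and inspect.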

Arguments

x

A matrix or data.frame. Each row represents an observation.

method

A string specifying the distance/similarity method. Supported:

  • Binary: "jaccard", "dice", "sokal_michener", "russell_rao", "sokal_sneath", "kulczynski", "hamming".

  • Categorical: "matching_coefficient".

  • Continuous: "euclidean", "euclidean_standardized", "manhattan", "minkowski", "canberra", "maximum", "cosine", "correlation", "mahalanobis".

  • Mixed: "gower".
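
The choice of method matters most for binary data, where coefficients differ in how they treat joint absences. The sketch below hand-computes the textbook Jaccard and Sokal-Michener (simple matching) coefficients for two binary vectors. The helper binary_counts() is hypothetical and not part of dbrobust, and the package may use a different distance convention (some implementations return sqrt(1 - s) rather than 1 - s).

```r
# Hand-rolled sketch; binary_counts() is a hypothetical helper, not a dbrobust function.
binary_counts <- function(u, v) {
  list(
    n11 = sum(u == 1 & v == 1),  # present in both
    n10 = sum(u == 1 & v == 0),  # present in u only
    n01 = sum(u == 0 & v == 1),  # present in v only
    n00 = sum(u == 0 & v == 0)   # absent in both
  )
}

u <- c(1, 1, 0, 0, 1)
v <- c(1, 0, 0, 1, 1)
k <- binary_counts(u, v)

# Jaccard ignores joint absences (n00)
s_jaccard <- k$n11 / (k$n11 + k$n10 + k$n01)
# Sokal-Michener (simple matching) counts joint absences as agreement
s_matching <- (k$n11 + k$n00) / length(u)

1 - s_jaccard    # 0.5
1 - s_matching   # 0.4
```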

output_format

Output format: "dist" (distance object), "matrix" (numeric matrix), or "similarity" (only for binary/categorical/mixed methods).

squared

Logical; if TRUE, returns squared distances (not applied to similarities).

p

Numeric; the power parameter for the Minkowski distance (required if method = "minkowski").
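
As a rough illustration of what p controls, the following base-R sketch evaluates the standard Minkowski formula for several values of p. The helper minkowski() is hypothetical and written for this example; how calculate_distances() computes the metric internally is not shown in this page.

```r
# Standard Minkowski formula; minkowski() is a hypothetical helper for illustration.
minkowski <- function(u, v, p) sum(abs(u - v)^p)^(1 / p)

x <- matrix(c(0, 0,
              3, 4), ncol = 2, byrow = TRUE)

minkowski(x[1, ], x[2, ], p = 1)   # 7, same as Manhattan
minkowski(x[1, ], x[2, ], p = 2)   # 5, same as Euclidean
minkowski(x[1, ], x[2, ], p = 3)   # (27 + 64)^(1/3), about 4.498

# stats::dist exposes the same parameter
dist(x, method = "minkowski", p = 3)
```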

similarity_transform

Character string; if output_format = "similarity", this specifies the formula to convert distances to similarity scores. Supported:

  • "linear" (default): \(s_{ij} = 1 - \delta_{ij}\)

  • "sqrt": \(s_{ij} = 1 - \delta_{ij}^2\)

...

Additional arguments passed to underlying functions.

Details

When output_format = "similarity", the computed distances are converted to similarity scores using the transformation selected by similarity_transform. The options are:

"linear"

Direct inversion of distance: \(s_{ij} = 1 - \delta_{ij}\).

"sqrt"

Squared-distance inversion: \(s_{ij} = 1 - \delta_{ij}^2\), equivalently \(\delta_{ij} = \sqrt{1 - s_{ij}}\), which may better preserve Euclidean properties.
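
The two formulas can be applied by hand in base R as follows. This is only a sketch of the arithmetic: the distance values are made up and assumed to lie in [0, 1], and the code does not call the package itself.

```r
# Illustration in base R; delta holds made-up distances assumed to lie in [0, 1].
delta <- matrix(c(0.0, 0.2, 0.5,
                  0.2, 0.0, 0.9,
                  0.5, 0.9, 0.0), nrow = 3, byrow = TRUE)

s_linear <- 1 - delta     # "linear": s = 1 - delta
s_sqrt   <- 1 - delta^2   # "sqrt":   s = 1 - delta^2, so delta = sqrt(1 - s)

s_linear[1, 3]   # 0.5
s_sqrt[1, 3]     # 0.75
diag(s_linear)   # 1 1 1: each observation is fully similar to itself
```

For distances strictly between 0 and 1, the "sqrt" option yields a higher similarity than "linear", because squaring shrinks the distance before it is subtracted from 1.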

See Also

stats::dist for basic distance measures, ade4::dist.binary for binary distances, and proxy::dist for advanced metrics such as cosine or correlation.

Examples

# Load example dataset
data("Data_HC_contamination", package = "dbrobust")
df <- Data_HC_contamination

# --- Quick Example ---
numeric_data <- df[1:10, 1:4]  # subset for speed
d_euclid <- calculate_distances(
  numeric_data,
  method = "euclidean",
  output_format = "matrix"
)
# \donttest{
# Load example dataset
data("Data_HC_contamination", package = "dbrobust")
df <- Data_HC_contamination[1:20,]

# Example 1: Euclidean distance (numeric variables only)
numeric_data <- df[, 1:4]
d_euclid <- calculate_distances(
  numeric_data,
  method = "euclidean",
  output_format = "matrix"
)

# Example 2: Manhattan distance
d_manhattan <- calculate_distances(
  numeric_data,
  method = "manhattan",
  output_format = "matrix"
)

# Example 3: Categorical distance using Matching Coefficient
categorical_data <- df[, 5:7]
d_match <- calculate_distances(
  categorical_data,
  method = "matching_coefficient",
  output_format = "matrix"
)

# Example 4: Mixed data distance using Gower (automatic type detection, asymmetric binary)
d_gower_asym <- calculate_distances(
  df,
  method = "gower",
  output_format = "dist",
  binary_asym = TRUE
)

# Example 5: Minkowski distance with p = 3
d_minkowski <- calculate_distances(
  numeric_data,
  method = "minkowski",
  p = 3,
  output_format = "matrix"
)

# Example 6: Jaccard distance for binary variables
binary_data <- df[, 8:9]
d_jaccard <- calculate_distances(
  binary_data,
  method = "jaccard",
  output_format = "matrix"
)

# Example 7: Mahalanobis distance
d_mahal <- calculate_distances(
  numeric_data,
  method = "mahalanobis",
  output_format = "matrix"
)

# Example 8: Manual selection of variables for Gower distance
continuous_vars <- 1:4
binary_vars <- 8:9
categorical_vars <- 5:7
d_gower_manual <- calculate_distances(
  df,
  method = "gower",
  output_format = "dist",
  continuous_cols = continuous_vars,
  binary_cols = binary_vars,
  categorical_cols = categorical_vars
)
# }
