Learn R Programming

couplr (version 1.0.10)

compute_distances: Compute and Cache Distance Matrix for Reuse

Description

Precomputes a distance matrix between left and right datasets, allowing it to be reused across multiple matching operations with different constraints. This is particularly useful when exploring different matching parameters (max_distance, calipers, methods) without recomputing distances.

Usage

compute_distances(
  left,
  right,
  vars,
  distance = "euclidean",
  weights = NULL,
  scale = FALSE,
  auto_scale = FALSE,
  left_id = "id",
  right_id = "id",
  block_id = NULL
)

Value

An S3 object of class "distance_object" containing:

  • cost_matrix: Numeric matrix of distances

  • left_ids: Character vector of left IDs

  • right_ids: Character vector of right IDs

  • block_id: Block ID column name (if specified)

  • metadata: List with computation details (vars, distance, scale, etc.)

  • original_left: Original left dataset (for later joining)

  • original_right: Original right dataset (for later joining)

Arguments

left

Left dataset (data frame)

right

Right dataset (data frame)

vars

Character vector of variable names to use for distance computation

distance

Distance metric (default: "euclidean")

weights

Optional numeric vector of variable weights

scale

Scaling method: FALSE, "standardize", "range", or "robust"

auto_scale

Apply automatic preprocessing (default: FALSE)

left_id

Name of ID column in left (default: "id")

right_id

Name of ID column in right (default: "id")

block_id

Optional block ID column name for blocked matching

Details

This function computes distances once and stores them in a reusable object. The resulting distance_object can be passed to match_couples() or greedy_couples() instead of providing datasets and variables.

Benefits:

  • Performance: Avoid recomputing distances when trying different constraints

  • Exploration: Quickly test max_distance, calipers, or methods

  • Consistency: Ensures same distances used across comparisons

  • Memory efficient: Can use sparse matrices when many pairs are forbidden

The distance_object stores the original datasets, allowing downstream functions like join_matched() to work seamlessly.

Examples

Run this code
# Compute distances once
left <- data.frame(id = 1:5, age = c(25, 30, 35, 40, 45), income = c(45, 52, 48, 61, 55) * 1000)
right <- data.frame(id = 6:10, age = c(24, 29, 36, 41, 44), income = c(46, 51, 47, 60, 54) * 1000)

dist_obj <- compute_distances(
  left, right,
  vars = c("age", "income"),
  scale = "standardize"
)

# Reuse for different matching strategies
result1 <- match_couples(dist_obj, max_distance = 0.5)
result2 <- match_couples(dist_obj, max_distance = 1.0)
result3 <- greedy_couples(dist_obj, strategy = "sorted")

# All use the same precomputed distances

Run the code above in your browser using DataLab