compute_distances

Precomputes a distance matrix between left and right datasets, allowing
it to be reused across multiple matching operations with different
constraints. This is particularly useful when exploring different matching
parameters (max_distance, calipers, methods) without recomputing distances.
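
The idea of computing distances once and reusing them can be sketched in base R. This is a minimal illustration, not couplr's implementation: the toy data, the helper names `pairwise_euclidean` and `apply_max_distance`, and the convention of marking over-threshold pairs as `Inf` are all assumptions made for the example.

```r
# Hypothetical toy data; column names and values are illustrative only.
left  <- data.frame(id = c("a", "b", "c"), x = c(0, 1, 4), y = c(0, 1, 3))
right <- data.frame(id = c("p", "q"), x = c(0, 4), y = c(1, 3))
vars  <- c("x", "y")

# Pairwise Euclidean distances (rows = left units, columns = right units).
pairwise_euclidean <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  # ||u - v||^2 = ||u||^2 + ||v||^2 - 2 * u.v, clamped at 0 against rounding
  d2 <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * a %*% t(b)
  sqrt(pmax(d2, 0))
}

# The expensive step, done once.
dist_mat <- pairwise_euclidean(left[vars], right[vars])
dimnames(dist_mat) <- list(as.character(left$id), as.character(right$id))

# Cheap step, repeated per constraint: pairs beyond the threshold are
# marked Inf (assumed convention for "not allowed").
apply_max_distance <- function(d, max_distance) {
  d[d > max_distance] <- Inf
  d
}

strict <- apply_max_distance(dist_mat, max_distance = 1)
loose  <- apply_max_distance(dist_mat, max_distance = 4)
```

Only `apply_max_distance` is rerun when the constraint changes; `dist_mat` itself is left untouched, which is the saving the cached object is meant to provide.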

Solves optimal pairing and matching problems using linear assignment
algorithms. Provides implementations of the Hungarian method (Kuhn 1955)
<doi:10.1002/nav.3800020109>, Jonker-Volgenant shortest path algorithm
(Jonker and Volgenant 1987) <doi:10.1007/BF02278710>, Auction algorithm
(Bertsekas 1988) <doi:10.1007/BF02186476>, cost-scaling
(Goldberg and Kennedy 1995) <doi:10.1007/BF01585996>, scaling algorithms
(Gabow and Tarjan 1989) <doi:10.1137/0218069>, push-relabel (Goldberg and
Tarjan 1988) <doi:10.1145/48014.61051>, and Sinkhorn entropy-regularized
transport (Cuturi 2013) <doi:10.48550/arxiv.1306.0895>. Designed for
matching plots, sites, samples, or any pairwise optimization problem.
Supports rectangular matrices, forbidden assignments, data frame inputs,
batch solving, k-best solutions, and pixel-level image morphing for
visualization. Includes automatic preprocessing with variable health
checks, multiple scaling methods (standardized, range, robust), greedy
matching algorithms, and comprehensive balance diagnostics for assessing
match quality using standardized differences and distribution comparisons.

Gilles Colling

couplr

Optimal Pairing and Matching via Linear Assignment

compute_distances function

<dl><dt>left</dt>
<dd>Left dataset (data frame)</dd>
<dt>right</dt>
<dd>Right dataset (data frame)</dd>
<dt>vars</dt>
<dd>Character vector of variable names to use for distance computation</dd>
<dt>distance</dt>
<dd>Distance metric (default: "euclidean")</dd>
<dt>weights</dt>
<dd>Optional numeric vector of variable weights</dd>
<dt>scale</dt>
<dd>Scaling method: FALSE, "standardize", "range", or "robust"</dd>
<dt>auto_scale</dt>
<dd>Apply automatic preprocessing (default: FALSE)</dd>
<dt>left_id</dt>
<dd>Name of ID column in left (default: "id")</dd>
<dt>right_id</dt>
<dd>Name of ID column in right (default: "id")</dd>
<dt>block_id</dt>
<dd>Optional block ID column name for blocked matching</dd></dl>
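
To make the `scale` options concrete, the sketch below shows one common interpretation of each method in base R: z-scores for "standardize", min-max rescaling for "range", and median/MAD for "robust". These definitions are assumptions; couplr's exact formulas (for example, whether "robust" uses the MAD or the IQR, or whether statistics are pooled across left and right) are not stated here. The helper `scale_columns` and the toy data are hypothetical.

```r
# Hypothetical helper: column-wise scaling under three common conventions.
scale_columns <- function(x, method = c("standardize", "range", "robust")) {
  method <- match.arg(method)
  x <- as.matrix(x)
  center <- switch(method,
    standardize = colMeans(x),          # mean
    range       = apply(x, 2, min),     # minimum
    robust      = apply(x, 2, median)   # median
  )
  spread <- switch(method,
    standardize = apply(x, 2, sd),                        # standard deviation
    range       = apply(x, 2, max) - apply(x, 2, min),    # max - min
    robust      = apply(x, 2, mad)                        # scaled MAD
  )
  # Subtract the center, then divide by the spread, column by column.
  sweep(sweep(x, 2, center, "-"), 2, spread, "/")
}

# Illustrative data with one outlying income value.
pooled <- data.frame(age = c(20, 30, 40, 50, 60),
                     income = c(10, 20, 30, 40, 100))

z   <- scale_columns(pooled, "standardize")
r   <- scale_columns(pooled, "range")
rob <- scale_columns(pooled, "robust")
```

Under these definitions the outlying income value inflates the standard deviation and the range, while the median and MAD are unaffected by it, which is the usual motivation for offering a robust option alongside the other two.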

Arguments

Compute and Cache Distance Matrix for Reuse — compute_distances


