This function calculates the distance between two vectors using the specified distance metric. The available metrics are "p-norm", "Chebyshev", "Canberra", "Overlap", "HEOM" (Heterogeneous Euclidean-Overlap Metric), and "HVDM" (Heterogeneous Value Difference Metric). It is used by the fast_find_neighbors function of this package.
calculate_distance(x, y, nominal_indices, p_code)
A numeric value representing the distance between the two vectors.
x: Vector of numeric and/or categorical values.
y: Vector of numeric and/or categorical values, with the same length and layout as x.
nominal_indices: Vector indicating which positions in x and y contain categorical (nominal) variables. This distinction is needed because:
For categorical variables, distances are often based on whether values match or not.
Hybrid distance metrics like HEOM and HVDM require knowing which variables are nominal to apply appropriate distance calculations.
p_code: Numeric code p representing the distance metric to use:
p >= 1: p-norm
p = 0: Chebyshev
p = -1: Canberra
p = -2: Overlap (nominal attributes only)
p = -3: HEOM (Heterogeneous Euclidean-Overlap Metric)
p = -4: HVDM (Heterogeneous Value Difference Metric)
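As an illustration of how the non-negative and -1 codes map to purely numeric metrics, here is a minimal Python sketch. It is not the package's implementation: the function name numeric_distance is hypothetical, the inputs are assumed to be equal-length numeric sequences, and skipping Canberra terms with a zero denominator is an assumed convention.

```python
def numeric_distance(x, y, p_code):
    """Distance between two equal-length numeric sequences.

    p_code >= 1 selects the p-norm, 0 selects Chebyshev, -1 selects Canberra.
    Hypothetical helper for illustration only.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    diffs = [abs(a - b) for a, b in zip(x, y)]

    if p_code >= 1:
        # p-norm: (sum |x_i - y_i|^p)^(1/p)
        return sum(d ** p_code for d in diffs) ** (1.0 / p_code)

    if p_code == 0:
        # Chebyshev: largest coordinate-wise absolute difference
        return max(diffs, default=0.0)

    if p_code == -1:
        # Canberra: sum |x_i - y_i| / (|x_i| + |y_i|);
        # terms with a zero denominator are skipped (assumed convention)
        total = 0.0
        for a, b, d in zip(x, y, diffs):
            denom = abs(a) + abs(b)
            if denom > 0:
                total += d / denom
        return total

    raise ValueError("p_code does not select a purely numeric metric")
```

With x = [0, 0] and y = [3, 4], a code of 2 gives the Euclidean distance, 1 gives the Manhattan distance, and 0 gives the largest single-coordinate difference.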
Different distance metrics handle nominal variables differently:
For purely numeric metrics (p >= 1, p = 0, p = -1), nominal features are ignored.
For the Overlap metric (p = -2), only nominal features are considered.
For HEOM (p = -3), numeric features use normalized Euclidean distance while nominal features use overlap distance (1 if different, 0 if same).
For HVDM (p = -4), normalized differences for numeric features are combined with the value difference metric for nominal features.
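The Overlap and HEOM rules can be sketched in Python as follows. This is a hedged illustration, not the package's code: the function names and the ranges argument are hypothetical, positions are 0-based here (the package may index differently), numeric differences are assumed to be normalized by each feature's range (max minus min), and the Overlap distance is assumed to be a plain count of mismatches. HVDM is omitted because it depends on class-conditional statistics from the training data.

```python
import math


def overlap_distance(x, y, nominal_indices):
    """Count the nominal positions at which x and y disagree."""
    return sum(1 for i in nominal_indices if x[i] != y[i])


def heom_distance(x, y, nominal_indices, ranges):
    """HEOM sketch over mixed numeric and nominal features.

    `ranges` maps each numeric position to that feature's range
    (max - min); it is an assumed input, not part of the documented API.
    """
    nominal = set(nominal_indices)
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if i in nominal:
            # Overlap rule: 0 if the categories match, 1 otherwise
            d = 0.0 if a == b else 1.0
        else:
            # Range-normalized absolute difference; a constant feature
            # (range 0) contributes nothing
            r = ranges[i]
            d = abs(a - b) / r if r > 0 else 0.0
        total += d * d
    # Per-feature distances are combined in Euclidean fashion
    return math.sqrt(total)
```

For example, with x = [1.0, "red", 5.0], y = [3.0, "blue", 5.0], nominal position 1, and ranges {0: 4.0, 2: 10.0}, the per-feature distances are 0.5, 1, and 0, which are then squared, summed, and square-rooted.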