MSclassifR (version 0.4.0)

calculate_distance: Function calculating the distance between two vectors.

Description

This function calculates the distance between two vectors using the specified distance metric. The available metrics are "p-norm", "Chebyshev", "Canberra", "Overlap", "HEOM" (Heterogeneous Euclidean-Overlap Metric), and "HVDM" (Heterogeneous Value Difference Metric). It is used by the fast_find_neighbors function of our package.

Usage

calculate_distance(x, y, nominal_indices, p_code)

Value

A numeric value representing the distance between the two vectors.

Arguments

x

Vector of numeric and/or categorical values.

y

Vector of numeric and/or categorical values.

nominal_indices

Vector indicating which positions in x and y contain categorical variables. This distinction is needed because:

  • For categorical variables, distances are often based on whether values match or not.

  • Hybrid distance metrics like HEOM and HVDM require knowing which variables are nominal to apply appropriate distance calculations.

p_code

Numeric code representing the distance metric to use:

  • p >= 1: p-norm

  • p = 0: Chebyshev

  • p = -1: Canberra

  • p = -2: Overlap (nominal attributes only)

  • p = -3: HEOM (Heterogeneous Euclidean-Overlap Metric)

  • p = -4: HVDM (Heterogeneous Value Difference Metric)
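The three purely numeric codes can be illustrated with a minimal base R sketch. The helper name sketch_numeric_distance is hypothetical and is not part of MSclassifR; it only mirrors the textbook definitions of the p-norm, Chebyshev, and Canberra distances, and the handling of zero denominators in the Canberra case is an assumption rather than the package's documented behaviour.

```r
# Hypothetical helper, not part of MSclassifR: covers the numeric metrics only.
sketch_numeric_distance <- function(x, y, p_code) {
  d <- abs(x - y)
  if (p_code >= 1) {
    # p-norm (Minkowski): p = 1 is Manhattan, p = 2 is Euclidean
    sum(d^p_code)^(1 / p_code)
  } else if (p_code == 0) {
    # Chebyshev: largest coordinate-wise absolute difference
    max(d)
  } else if (p_code == -1) {
    # Canberra: terms with a zero denominator contribute 0 (an assumption)
    denom <- abs(x) + abs(y)
    sum(ifelse(denom == 0, 0, d / denom))
  } else {
    stop("this sketch only covers p_code >= 1, 0, and -1")
  }
}

x <- c(1, 2, 3)
y <- c(4, 6, 3)
sketch_numeric_distance(x, y, 2)   # Euclidean: sqrt(9 + 16) = 5
sketch_numeric_distance(x, y, 1)   # Manhattan: 3 + 4 + 0 = 7
sketch_numeric_distance(x, y, 0)   # Chebyshev: 4
sketch_numeric_distance(x, y, -1)  # Canberra: 3/5 + 4/8 + 0 = 1.1
```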

Details

Different distance metrics handle nominal variables differently:

  • For pure numeric metrics (p >= 1, p = 0, p = -1), nominal features are ignored

  • For the Overlap metric (p = -2), only nominal features are considered

  • For HEOM (p = -3), numeric features use normalized Euclidean distance while nominal features use overlap distance (1 if different, 0 if same)

  • For HVDM (p = -4), a specialized metric combines normalized differences for numeric features with the value difference metric for nominal features
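The treatment of nominal positions in the Overlap and HEOM cases can be sketched in base R as follows. Everything here is illustrative: sketch_overlap and sketch_heom are hypothetical names, not MSclassifR functions. The sketch assumes that categories are stored as numeric codes, that nominal_indices is 1-based, that Overlap is the count of mismatching nominal positions, and that the per-feature ranges used for normalization are supplied by the caller through a hypothetical ranges argument.

```r
# Hypothetical helpers, not part of MSclassifR.
# Assumptions: categories are numeric codes, nominal_indices is 1-based,
# and the caller supplies the range of each numeric feature.
sketch_overlap <- function(x, y, nominal_indices) {
  # Count the nominal positions whose values differ
  sum(x[nominal_indices] != y[nominal_indices])
}

sketch_heom <- function(x, y, nominal_indices, ranges) {
  is_nominal <- seq_along(x) %in% nominal_indices
  d <- numeric(length(x))
  # Nominal features: 1 if different, 0 if same
  d[is_nominal] <- as.numeric(x[is_nominal] != y[is_nominal])
  # Numeric features: absolute difference scaled by the feature range
  d[!is_nominal] <- abs(x[!is_nominal] - y[!is_nominal]) / ranges[!is_nominal]
  # Combine the per-feature distances in Euclidean fashion
  sqrt(sum(d^2))
}

x <- c(2, 10, 1, 3)         # positions 3 and 4 hold category codes
y <- c(4, 10, 1, 2)
ranges <- c(4, 20, NA, NA)  # ranges are only read at numeric positions

sketch_overlap(x, y, nominal_indices = c(3, 4))       # 1 mismatch
sketch_heom(x, y, nominal_indices = c(3, 4), ranges)  # sqrt(0.25 + 0 + 0 + 1)
```

HVDM is left out of this sketch because its nominal component depends on class-conditional value frequencies estimated from a labelled data set, which cannot be derived from two vectors alone.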