LCCkNN (version 0.1.0)

kKNN: Adaptive k-Nearest Neighbor Classifier

Description

Implements the adaptive k-nearest neighbor (kK-NN) algorithm, which adjusts the neighborhood size for each sample based on a local curvature estimate. This method aims to improve classification performance, particularly in datasets with limited training samples.
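The adaptive idea can be illustrated with a minimal sketch. This is not the package's internal code: the quantized curvature scores and the linear mapping from curvature to neighborhood size are assumptions for illustration only; the actual estimate is based on the shape operator described in the reference below.

```r
# Illustration only (assumed mapping, not LCCkNN internals):
# samples in high-curvature regions get a smaller, more local neighborhood.
adaptive_k <- function(curvature_levels, k_max) {
  # curvature_levels: quantized local curvature scores (1 = flattest)
  # Higher curvature -> smaller k, never below 1
  pmax(1, k_max - curvature_levels + 1)
}

adaptive_k(c(1, 5, 10), k_max = 10)
```

A flat region (level 1) keeps the full neighborhood of k_max points, while the most curved region (level 10) falls back to a single nearest neighbor.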

Usage

kKNN(train, test, train_target, k, func = "log", quantize_method = "paper")

Value

A numeric or factor vector of predicted class labels for the test data.

Arguments

train

A numeric matrix or data frame of the training data.

test

A numeric matrix or data frame of the test data.

train_target

A numeric or factor vector of class labels for the training data.

k

The number of neighbors for the initial k-NN graph.

func

The transformation function for curvatures ('log', 'cubic_root', or 'sigmoid').

quantize_method

The quantization method to use for the curvature scores: 'paper' (10 levels, the default) or 'log2n' (approximately log2(n) levels, where n is the number of training samples).
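The two methods differ only in how many quantization levels they produce; a quick back-of-the-envelope comparison (the level counts follow from the definitions above, not from package internals):

```r
# Number of quantization levels under each method
n <- 150                         # e.g. the full Iris dataset
levels_paper <- 10               # 'paper': fixed at 10 levels
levels_log2n <- round(log2(n))   # 'log2n': grows slowly with n
```

For n = 150, 'log2n' yields about 7 levels, so the two methods are close; the gap widens for much larger training sets.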

References

Levada, A.L.M., Nielsen, F., Haddad, M.F.C. (2024). Adaptive k-Nearest Neighbor Classifier Based on the Local Estimation of the Shape Operator. arXiv:2409.05084.

Examples

# Load necessary libraries
library(caret)

# Load and prepare data (e.g., the Iris dataset)
data_iris <- iris
data <- as.matrix(data_iris[, 1:4])
target <- as.integer(data_iris$Species)

# Standardize the data
data <- scale(data)

# Split data into training and testing sets
set.seed(42)
train_index <- caret::createDataPartition(factor(target), p = 0.5, list = FALSE)  # stratify by class
train_data <- data[train_index, ]
test_data <- data[-train_index, ]
train_labels <- target[train_index]

# Determine the initial k as log2(n), rounded and forced odd to reduce ties
initial_k <- round(log2(nrow(train_data)))
if (initial_k %% 2 == 0) {
   initial_k <- initial_k + 1
}

# Run the kK-NN classifier using the default quantization method ('paper')
predictions_paper <- LCCkNN::kKNN(
   train = train_data,
   test = test_data,
   train_target = train_labels,
   k = initial_k
)

# Run the kK-NN classifier using the 'log2n' quantization method
predictions_log2n <- LCCkNN::kKNN(
   train = train_data,
   test = test_data,
   train_target = train_labels,
   k = initial_k,
   quantize_method = 'log2n'
)

# Evaluate the results (e.g., calculate balanced accuracy)
test_labels <- target[-train_index]
bal_acc_paper <- LCCkNN::balanced_accuracy_score(test_labels, predictions_paper)
bal_acc_log2n <- LCCkNN::balanced_accuracy_score(test_labels, predictions_log2n)
cat("Balanced Accuracy (paper Method):", bal_acc_paper, "\n")
cat("Balanced Accuracy (log2n Method):", bal_acc_log2n, "\n")
