hnsw_build: Build an hnswlib nearest neighbor index

Description

Build an hnswlib nearest neighbor index

Usage

hnsw_build(
  X,
  distance = "euclidean",
  M = 16,
  ef = 200,
  verbose = FALSE,
  progress = "bar",
  n_threads = 0,
  grain_size = 1,
  byrow = TRUE
)

Value

an instance of a HnswL2, HnswCosine or HnswIp class.

Arguments

X

A numeric matrix of data to search for neighbors. If byrow = TRUE (the default) then each row of X is an item to be searched. Otherwise, each item should be stored in the columns of X.

distance

Type of distance to calculate. One of:

"l2" Squared L2, i.e. squared Euclidean.
"euclidean" Euclidean.
"cosine" Cosine.
"ip" Inner product: 1 - sum(ai * bi), i.e. the cosine distance where the vectors are not normalized. This can lead to negative distances and other non-metric behavior.

M

Controls the number of bi-directional links created for each element during index construction. Higher values lead to better results at the expense of memory consumption. Typical values are 2 - 100, but for most datasets a range of 12 - 48 is suitable. Can't be smaller than 2.

ef

Size of the dynamic list used during construction. A larger value means a better quality index, but increases build time. Should be an integer value between 1 and the size of the dataset.

verbose

If TRUE, log messages to the console.

progress

defunct and has no effect.

n_threads

Maximum number of threads to use. The exact number is determined by grain_size.

grain_size

Minimum amount of work to do (rows in X to add) per thread. If the number of rows in X isn't sufficient, then fewer than n_threads will be used. This is useful in cases where the overhead of context switching with too many threads outweighs the gains due to parallelism.

byrow

if TRUE (the default), this indicates that the items in X to be indexed are stored in each row. Otherwise, the items are stored in the columns of X. Storing items in each column reduces the overhead of copying data to a form that can be indexed by the hnsw library.

Examples

Run this code

irism <- as.matrix(iris[, -5])
ann <- hnsw_build(irism)
iris_nn <- hnsw_search(irism, ann, k = 5)

Run the code above in your browser using DataLab