hnsw_search: Search an hnswlib nearest neighbor index

Description

Search an hnswlib nearest neighbor index for the k approximate nearest neighbors of each item in X.

Usage

hnsw_search(
  X,
  ann,
  k,
  ef = 10,
  verbose = FALSE,
  progress = "bar",
  n_threads = 0,
  grain_size = 1,
  byrow = TRUE
)

Value

a list containing:

idx: a matrix containing the nearest neighbor indices.
dist: a matrix containing the nearest neighbor distances.

The dimensions of the matrices follow the storage layout (row-based or column-based) of X, as indicated by the byrow parameter. If byrow = TRUE (the default), each row of idx and dist contains the neighbor information for the item in the corresponding row of X, so the dimensions are n x k, where n is the number of items in X. If byrow = FALSE, each column of idx and dist contains the neighbor information for the item in the corresponding column of X, so the dimensions are k x n.

Every item in the dataset is considered to be a neighbor of itself, so when X is the same data that was used to build the index, the first neighbor of item i should always be i itself. If that isn't the case, then M (set when the index is built), ef, or both may need increasing.
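
The self-neighbor check described above can be sketched as follows. This is a minimal illustration rather than part of the package documentation: it assumes the RcppHNSW package is attached, that X is the same data the index was built from, and that M and ef are passed to hnsw_build at build time. The values shown are illustrative choices, not recommendations.

```r
library(RcppHNSW)

irism <- as.matrix(iris[, -5])

# M and ef here are build-time settings passed to hnsw_build (illustrative values)
ann <- hnsw_build(irism, M = 16, ef = 200)
res <- hnsw_search(irism, ann, k = 5, ef = 50)

# TRUE where the first neighbor of item i is i itself
self_first <- res$idx[, 1] == seq_len(nrow(irism))

# Fraction of items that pass the check
mean(self_first)

# Items that fail the check; exact duplicate rows in the data can
# legitimately swap places with each other, so inspect these before
# concluding that M or ef is too small
which(!self_first)
```

If many items fail the check and are not duplicates, rebuilding with a larger M or searching with a larger ef is the remedy the text above suggests.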

Arguments

X: A numeric matrix of data to search for neighbors. If byrow = TRUE (the default) then each row of X is an item to be searched. Otherwise, each item should be stored in the columns of X.
ann: An instance of a HnswL2, HnswCosine or HnswIp class.
k: Number of neighbors to return. This can't be larger than the number of items that were added to the index ann. To check the size of the index, call ann$size().
ef: Size of the dynamic list used during search. Higher values lead to improved recall at the expense of longer search time. Can take values between k and the size of the dataset. Typical values are 100 - 2000.
verbose: If TRUE, log messages to the console.
progress: Defunct; has no effect.
n_threads: Maximum number of threads to use. The exact number is determined by grain_size.
grain_size: Minimum amount of work to do (items in X to search) per thread. If the number of items in X isn't sufficient, then fewer than n_threads will be used. This is useful in cases where the overhead of context switching with too many threads outweighs the gains due to parallelism.
byrow: If TRUE (the default), the items to be searched are stored in the rows of X. Otherwise, the items are stored in the columns of X. Storing items by column reduces the overhead of copying data into a form that the hnsw library can search. Note that if byrow = FALSE, any matrices returned from this function will also store the items by column.
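
As a rough illustration of how these arguments fit together, the sketch below caps k at the index size, raises ef above its default, limits threading, and then repeats the search with column-wise storage. It assumes that hnsw_build accepts a matching byrow argument; the specific values of ef, n_threads and grain_size are arbitrary choices for demonstration.

```r
library(RcppHNSW)

irism <- as.matrix(iris[, -5])
ann <- hnsw_build(irism)

# k can't exceed the number of items in the index
k <- min(5, ann$size())

# ef must be at least k; larger values trade search time for recall.
# At most 2 threads are used, each handling at least 10 items of X.
res <- hnsw_search(irism, ann, k = k, ef = 100, n_threads = 2, grain_size = 10)
dim(res$idx)  # n x k, as described in the Value section

# Column-wise storage: each column of the input is one item
# (assumes hnsw_build also takes byrow, matching hnsw_search)
irism_t <- t(irism)
ann_col <- hnsw_build(irism_t, byrow = FALSE)
res_col <- hnsw_search(irism_t, ann_col, k = k, byrow = FALSE)
dim(res_col$idx)  # k x n
```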

Examples

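library(RcppHNSW)
# Numeric columns of iris as a matrix, one item per row
irism <- as.matrix(iris[, -5])
ann <- hnsw_build(irism)
# Find the 5 nearest neighbors of every item in the index
iris_nn <- hnsw_search(irism, ann, k = 5)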
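
To see how ef affects recall, one possible approach is to compare the search results against exact neighbors computed by brute force. The sketch below does this with base R's dist(); the helper recall_at is hypothetical and not part of the package, and it assumes the index uses the default Euclidean distance. Tied distances can make the measured recall slightly lower than the true value.

```r
library(RcppHNSW)

irism <- as.matrix(iris[, -5])
ann <- hnsw_build(irism)
k <- 5

# Exact neighbors by brute force (base R only), used as the reference
d <- as.matrix(dist(irism))
exact_idx <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))

# Hypothetical helper: mean fraction of exact neighbors that the
# approximate search recovers for a given ef
recall_at <- function(ef) {
  approx_idx <- hnsw_search(irism, ann, k = k, ef = ef)$idx
  hits <- vapply(
    seq_len(nrow(irism)),
    function(i) length(intersect(approx_idx[i, ], exact_idx[i, ])),
    integer(1)
  )
  mean(hits) / k
}

# Recall for a few values of ef (each at least k)
recalls <- sapply(c(10, 50, 200), recall_at)
recalls
```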
