Search an hnswlib nearest neighbor index
hnsw_search(
  X,
  ann,
  k,
  ef = 10,
  verbose = FALSE,
  progress = "bar",
  n_threads = 0,
  grain_size = 1,
  byrow = TRUE
)
A list containing:
idx
A matrix containing the nearest neighbor indices.
dist
A matrix containing the nearest neighbor distances.
The dimensions of the matrices follow the storage (row- or column-based) of X, as indicated by the byrow parameter. If byrow = TRUE (the default), each row of idx and dist contains the neighbor information for the item passed in the equivalent row of X, i.e. the dimensions are n x k, where n is the number of items in X. If byrow = FALSE, each column of idx and dist contains the neighbor information for the item passed in the equivalent column of X, i.e. the dimensions are k x n.
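The two layouts are transposes of each other, so a result obtained with byrow = FALSE can be converted to the default layout with t(). The following is a minimal sketch. as_row_layout is a hypothetical helper that is not part of the package, and the toy result stands in for real hnsw_search output:

```r
# Hypothetical helper (not part of RcppHNSW): convert a search result
# returned with byrow = FALSE (k x n matrices) into the byrow = TRUE
# layout (n x k matrices) by transposing both elements.
as_row_layout <- function(res) {
  list(idx = t(res$idx), dist = t(res$dist))
}

# Toy result in column layout: k = 2 neighbors for n = 3 items,
# one item per column (matrix() fills column by column).
res_col <- list(
  idx = matrix(c(1L, 2L,
                 2L, 1L,
                 3L, 2L), nrow = 2),
  dist = matrix(c(0, 0.5,
                  0, 0.5,
                  0, 0.7), nrow = 2)
)

res_row <- as_row_layout(res_col)
dim(res_row$idx)  # c(3, 2): one row per item
res_row$idx[2, ]  # neighbors of item 2: c(2, 1)
```

If the assumption holds, as_row_layout(hnsw_search(t(X), ann, k, byrow = FALSE)) should give the same matrices as hnsw_search(X, ann, k).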
When X is the same data that was used to build the index, every item is considered to be a neighbor of itself, so the first neighbor of item i should always be i itself. If that isn't the case, then M (set when building the index with hnsw_build) or ef, or both, may need to be increased.
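This property provides a cheap sanity check on an index. A minimal sketch, assuming the default byrow = TRUE layout and that the queries are exactly the indexed items in their original order. self_neighbor_rate is a hypothetical helper, and the toy idx matrix is illustrative only:

```r
# Hypothetical helper (not part of RcppHNSW): the fraction of items whose
# first reported neighbor is the item itself. Assumes idx uses the
# byrow = TRUE layout (one row per item) and that row i is the query for
# the i-th item added to the index.
self_neighbor_rate <- function(idx) {
  mean(idx[, 1] == seq_len(nrow(idx)))
}

# Toy idx matrix for 4 items and k = 2 (filled column by column).
# Item 3's first neighbor is item 4, so 3 of the 4 items pass.
idx <- matrix(c(1L, 2L, 4L, 4L,
                2L, 1L, 3L, 3L), ncol = 2)
self_neighbor_rate(idx)  # 0.75
```

A rate noticeably below 1 suggests raising M or ef. Exact duplicate items are a benign exception: identical items are at distance zero from each other, so either one may legitimately be reported first.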
X
A numeric matrix of data to search for neighbors. If byrow = TRUE (the default), each row of X is an item to be searched. Otherwise, each item should be stored in a column of X.
ann
An instance of the HnswL2, HnswCosine or HnswIp class.
k
Number of neighbors to return. This can't be larger than the number of items that were added to the index ann. To check the size of the index, call ann$size().
ef
Size of the dynamic list used during search. Higher values improve recall at the expense of longer search time. It can take values between k and the size of the dataset. Typical values are 100 to 2000.
verbose
If TRUE, log messages to the console.
progress
Defunct and has no effect.
n_threads
Maximum number of threads to use. The exact number is determined by grain_size.
grain_size
Minimum amount of work (items in X to search) to assign to each thread. If X doesn't contain enough items, fewer than n_threads threads will be used. This is useful when the overhead of context switching between too many threads outweighs the gains from parallelism.
byrow
If TRUE (the default), the items to be searched are stored in the rows of X. Otherwise, the items are stored in the columns of X. Storing items in columns reduces the overhead of copying the data into a form that can be searched by the hnsw library. Note that if byrow = FALSE, any matrices returned from this function will also store the items by column.
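The interaction between n_threads and grain_size can be illustrated with a simplified model. This sketch reflects stated assumptions and is not the package's scheduling code. It assumes each thread needs at least grain_size items and treats n_threads = 0 as running on the calling thread only. threads_used is a hypothetical name:

```r
# Hypothetical, simplified model (not RcppHNSW's actual code) of how
# grain_size caps the number of threads used for a search.
threads_used <- function(n_items, n_threads, grain_size) {
  # Assumption: n_threads = 0 means no worker threads are started.
  if (n_threads <= 0) {
    return(1L)
  }
  # Each thread is assumed to need at least grain_size items, but at
  # least one thread always runs.
  max_by_work <- max(1L, as.integer(n_items %/% grain_size))
  as.integer(min(n_threads, max_by_work))
}

threads_used(n_items = 150, n_threads = 4, grain_size = 1)    # 4
threads_used(n_items = 150, n_threads = 4, grain_size = 50)   # 3
threads_used(n_items = 150, n_threads = 4, grain_size = 1000) # 1
```

Under this model, a small query matrix combined with a large grain_size runs on a single thread regardless of n_threads.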
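library(RcppHNSW)
irism <- as.matrix(iris[, -5])
# build an index over the four numeric iris columns
ann <- hnsw_build(irism)
# iris_nn$idx and iris_nn$dist are 150 x 5 matrices (one row per item)
iris_nn <- hnsw_search(irism, ann, k = 5)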
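Because ef trades search time for recall, it can help to measure recall against exact neighbors on small data. A minimal sketch in base R, assuming Euclidean distance and the default byrow = TRUE layout. brute_knn and knn_recall are hypothetical helpers, not RcppHNSW functions:

```r
# Hypothetical helpers (not part of RcppHNSW). Both assume one item per
# row and Euclidean distance.

# Exact k nearest neighbors by brute force. Builds an n x n distance
# matrix, so it is only practical for small data.
brute_knn <- function(X, k) {
  d <- as.matrix(dist(X))  # Euclidean distances between rows of X
  n <- nrow(d)
  idx <- matrix(0L, nrow = n, ncol = k)
  dst <- matrix(0, nrow = n, ncol = k)
  for (i in seq_len(n)) {
    nbrs <- order(d[i, ])[seq_len(k)]  # closest first; ties keep index order
    idx[i, ] <- nbrs
    dst[i, ] <- d[i, nbrs]
  }
  list(idx = idx, dist = dst)
}

# Recall: the fraction of exact neighbors that the approximate search
# also returned, ignoring the order within each row.
knn_recall <- function(approx_idx, exact_idx) {
  hits <- vapply(seq_len(nrow(exact_idx)), function(i) {
    length(intersect(approx_idx[i, ], exact_idx[i, ]))
  }, integer(1))
  sum(hits) / length(exact_idx)
}

# Toy data: four points on a line at 0, 1, 3 and 7.
X <- matrix(c(0, 1, 3, 7), ncol = 1)
exact <- brute_knn(X, k = 2)

# Simulate an approximate result that misses one neighbor of item 4.
approx <- exact$idx
approx[4, 2] <- 2L
knn_recall(approx, exact$idx)  # 0.875 (7 of 8 neighbors found)
```

With the example above, knn_recall(hnsw_search(irism, ann, k = 5, ef = ef_value)$idx, brute_knn(irism, 5)$idx) could then be compared across several values of ef_value, a placeholder for the ef being tested.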