Build an hnswlib nearest neighbor index
hnsw_build(
X,
distance = "euclidean",
M = 16,
ef = 200,
verbose = FALSE,
progress = "bar",
n_threads = 0,
grain_size = 1,
byrow = TRUE
)
An instance of a HnswL2, HnswCosine or HnswIp class.
X
A numeric matrix of data to search for neighbors. If byrow = TRUE (the default), each row of X is an item to be searched. Otherwise, each item should be stored in a column of X.
distance
Type of distance to calculate. One of:
"l2"
Squared L2, i.e. squared Euclidean.
"euclidean"
Euclidean.
"cosine"
Cosine.
"ip"
Inner product: 1 - sum(ai * bi), i.e. the cosine distance where the vectors are not normalized. This can lead to negative distances and other non-metric behavior.
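As a rough illustration of the four options, the following base R sketch computes each quantity for a pair of small vectors. The vectors a and b are made up for this example, and the "cosine" line assumes the usual definition of one minus the cosine similarity. It is not the library's own code.

```r
# Two illustrative vectors (hypothetical, not from the package).
a <- c(1, 2, 3)
b <- c(2, 0, 1)

# "l2": squared Euclidean distance.
d_l2 <- sum((a - b)^2)

# "euclidean": square root of the "l2" value.
d_euclidean <- sqrt(d_l2)

# "cosine": assumed here to be one minus the cosine similarity.
d_cosine <- 1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# "ip": one minus the raw inner product, with no normalization.
d_ip <- 1 - sum(a * b)

print(c(l2 = d_l2, euclidean = d_euclidean, cosine = d_cosine, ip = d_ip))
```

For these vectors the "ip" value is negative (1 - 5 = -4), which is the non-metric behavior mentioned above.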
M
Controls the number of bi-directional links created for each element during index construction. Higher values lead to better results at the expense of memory consumption. Typical values are 2 to 100, but for most datasets a range of 12 to 48 is suitable. Can't be smaller than 2.
ef
Size of the dynamic list used during construction. A larger value gives a better-quality index but increases build time. Should be an integer between 1 and the size of the dataset.
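One way to see the effect of M and ef is to build two indexes with different settings and compare the neighbors they return. The sketch below assumes the RcppHNSW package is installed and that hnsw_search returns a list with an idx matrix of neighbor indices. The specific values (8 and 50, 32 and 150) are arbitrary choices for illustration, not recommendations.

```r
library(RcppHNSW)

irism <- as.matrix(iris[, -5])

# A lighter index: fewer links per item and a shorter candidate list.
ann_small <- hnsw_build(irism, distance = "euclidean", M = 8, ef = 50)

# A heavier index: more links and a candidate list as long as the dataset.
ann_large <- hnsw_build(irism, distance = "euclidean", M = 32, ef = 150)

nn_small <- hnsw_search(irism, ann_small, k = 5)
nn_large <- hnsw_search(irism, ann_large, k = 5)

# Fraction of neighbor slots on which the two indexes agree
# (assumes the result has an `idx` element).
agreement <- mean(nn_small$idx == nn_large$idx)
print(agreement)
```

On a dataset as small as iris the two indexes will likely agree almost everywhere. The trade-off between memory, build time and quality matters more for large, high-dimensional data.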
verbose
If TRUE, log messages to the console.
progress
Defunct and has no effect.
n_threads
Maximum number of threads to use. The exact number is determined by grain_size.
grain_size
Minimum amount of work to do (rows in X to add) per thread. If the number of rows in X isn't sufficient, then fewer than n_threads threads will be used. This is useful in cases where the overhead of context switching with too many threads outweighs the gains due to parallelism.
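A minimal sketch of a multi-threaded build, assuming the RcppHNSW package is installed. The values n_threads = 2 and grain_size = 50 are arbitrary. How many threads actually run depends on the package's internal scheduling, which is not shown here.

```r
library(RcppHNSW)

irism <- as.matrix(iris[, -5])

# Allow up to 2 threads, but require at least 50 rows of work per thread.
ann_mt <- hnsw_build(irism, n_threads = 2, grain_size = 50)

# The resulting index is searched in the usual way.
nn_mt <- hnsw_search(irism, ann_mt, k = 5)
```

Raising grain_size is a way to keep small inputs from being split across more threads than the work justifies.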
byrow
If TRUE (the default), this indicates that the items in X to be indexed are stored in each row. Otherwise, the items are stored in the columns of X. Storing items in each column reduces the overhead of copying data to a form that can be indexed by the hnsw library.
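The column-wise layout can be sketched as follows, assuming an RcppHNSW version that supports the byrow argument. The name irism_t is made up for this example. The query still uses the default row-wise layout, so only the build step changes.

```r
library(RcppHNSW)

irism <- as.matrix(iris[, -5])  # 150 items stored in rows, 4 columns
irism_t <- t(irism)             # 4 rows, 150 items stored in columns

# Build from column-stored items.
ann_cols <- hnsw_build(irism_t, byrow = FALSE)

# Query with the row-wise matrix (byrow left at its default).
nn_cols <- hnsw_search(irism, ann_cols, k = 5)
```

In practice the benefit comes from data that is already stored column-wise. Transposing a row-wise matrix just to pass byrow = FALSE, as done here for illustration, adds a copy of its own.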
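library(RcppHNSW)
# Use the four numeric iris columns as a 150 x 4 matrix
irism <- as.matrix(iris[, -5])
ann <- hnsw_build(irism)
# Find the 5 nearest neighbors of each item
iris_nn <- hnsw_search(irism, ann, k = 5)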