ft_lsh_utils
From sparklyr v1.5.0
by Yitao Li
Utility functions for LSH models
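Utility functions for locality-sensitive hashing (LSH) models: approximate nearest-neighbor search (ml_approx_nearest_neighbors()) and approximate similarity join (ml_approx_similarity_join()).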
Usage
ml_approx_nearest_neighbors(
  model,
  dataset,
  key,
  num_nearest_neighbors,
  dist_col = "distCol"
)

ml_approx_similarity_join(
  model,
  dataset_a,
  dataset_b,
  threshold,
  dist_col = "distCol"
)
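To make the semantics of these two calls concrete, here is a minimal base-R sketch of the exact computations they approximate: brute-force nearest-neighbor search and a brute-force similarity join, both using Euclidean distance (the distance used by ft_bucketed_random_projection_lsh()). It works on a plain numeric matrix rather than a Spark DataFrame, and the helper names and the id, id_a and id_b columns are hypothetical, not part of sparklyr. The LSH versions only compare rows that fall into the same hash buckets, so they avoid the all-pairs cost shown here but can miss some true neighbors or pairs.

```r
# Hypothetical helper (not part of sparklyr): Euclidean distance from every
# row of a numeric matrix to a single key vector.
euclidean_to_key <- function(features, key) {
  sqrt(rowSums(sweep(features, 2, key)^2))
}

# Exact nearest neighbors: compute every distance, sort ascending, and keep
# at most `num_nearest_neighbors` rows, mirroring the meaning of that argument.
exact_nearest_neighbors <- function(features, key, num_nearest_neighbors,
                                    dist_col = "distCol") {
  d <- euclidean_to_key(features, key)
  ord <- order(d)[seq_len(min(num_nearest_neighbors, length(d)))]
  out <- data.frame(id = ord)
  out[[dist_col]] <- d[ord]
  out
}

# Exact similarity join: all (row of A, row of B) pairs whose distance is
# strictly below `threshold`.
exact_similarity_join <- function(features_a, features_b, threshold,
                                  dist_col = "distCol") {
  pairs <- expand.grid(id_a = seq_len(nrow(features_a)),
                       id_b = seq_len(nrow(features_b)))
  diff <- features_a[pairs$id_a, , drop = FALSE] -
    features_b[pairs$id_b, , drop = FALSE]
  pairs[[dist_col]] <- sqrt(rowSums(diff^2))
  pairs[pairs[[dist_col]] < threshold, , drop = FALSE]
}

features <- rbind(c(0, 0), c(1, 0), c(0, 2), c(3, 3))
exact_nearest_neighbors(features, key = c(0, 0), num_nearest_neighbors = 2)
exact_similarity_join(features, rbind(c(0, 1)), threshold = 1.5)
```

With this sample matrix, the nearest-neighbor call returns rows 1 and 2 (distances 0 and 1), and the join keeps rows 1, 2 and 3, whose distances to (0, 1) are 1, about 1.41, and 1.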
Arguments
- model
A fitted LSH model, returned by either ft_minhash_lsh() or ft_bucketed_random_projection_lsh().
- dataset
The dataset to search for nearest neighbors of the key.
- key
Feature vector representing the item to search for.
- num_nearest_neighbors
The maximum number of nearest neighbors to return.
- dist_col
Name of the output column storing the distance between each result row and the key (for ml_approx_similarity_join(), the distance between the two rows of each pair).
- dataset_a
One of the datasets to join.
- dataset_b
The other dataset to join.
- threshold
The distance threshold: only row pairs whose distance is less than this value are returned.
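Both threshold and dist_col are expressed in the model's own distance measure: Euclidean distance for ft_bucketed_random_projection_lsh(), and Jaccard distance for ft_minhash_lsh(), where each binary input vector is treated as the set of its non-zero indices. A Jaccard distance always lies between 0 and 1, so a useful threshold for a MinHash model lies in that range. The base-R sketch below computes the exact Jaccard distance and a generic MinHash estimate of it, to show how a MinHash-style model can approximate that distance from short signatures. The helper names, the hash family h(x) = (a * x + b) mod p, and the prime p are illustrative assumptions, not Spark's or sparklyr's implementation.

```r
# Exact Jaccard distance between two sets: 1 - |A intersect B| / |A union B|.
jaccard_distance <- function(a, b) {
  1 - length(intersect(a, b)) / length(union(a, b))
}

# Generic textbook MinHash (an illustrative sketch, not Spark's exact hash
# family). Each hash function is h(x) = (a * x + b) mod p; a set's signature
# keeps the minimum hash value over its elements. Elements are assumed to be
# small non-negative integers (e.g. non-zero indices of a binary vector), so
# the products stay exactly representable as doubles.
minhash_signature <- function(set, coef_a, coef_b, p = 2147483647) {
  vapply(seq_along(coef_a), function(i) {
    min((as.numeric(coef_a[i]) * set + coef_b[i]) %% p)
  }, numeric(1))
}

# Two signatures agree at a position with probability approximately equal to
# the Jaccard similarity, so the fraction of disagreeing positions estimates
# the Jaccard distance.
estimated_jaccard_distance <- function(sig_a, sig_b) {
  mean(sig_a != sig_b)
}

set.seed(42)
num_hashes <- 200
coef_a <- sample.int(2147483646, num_hashes)  # hypothetical random coefficients
coef_b <- sample.int(2147483646, num_hashes)

a <- c(1, 2, 3)
b <- c(2, 3, 4)
jaccard_distance(a, b)  # 0.5, i.e. 1 - 2/4
estimated_jaccard_distance(minhash_signature(a, coef_a, coef_b),
                           minhash_signature(b, coef_a, coef_b))
```

More hash functions give a less noisy estimate; identical sets always produce identical signatures and therefore an estimated distance of exactly 0.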