ft_lsh_utils
From sparklyr v0.9.4
by Javier Luraschi
Utility functions for LSH models
Usage
ml_approx_nearest_neighbors(model, dataset, key, num_nearest_neighbors,
  dist_col = "distCol")

ml_approx_similarity_join(model, dataset_a, dataset_b, threshold,
  dist_col = "distCol")
Arguments
- model
A fitted LSH model, returned by either ft_minhash_lsh() or ft_bucketed_random_projection_lsh().
- dataset
The dataset to search for nearest neighbors of the key.
- key
Feature vector representing the item to search for.
- num_nearest_neighbors
The maximum number of nearest neighbors to return.
- dist_col
Output column for storing the distance between each result row and the key (for the similarity join, the distance between each pair of rows).
- dataset_a
One of the datasets to join.
- dataset_b
Another dataset to join.
- threshold
The distance threshold for row pairs; only pairs whose distance is smaller than the threshold are returned.
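The following is a minimal sketch of how the two functions might be called, not an official example. It assumes a local Spark installation is available to sparklyr; the toy data, the table name "points", the column names (id, x, y, features, hashes), and the values chosen for bucket_length, num_hash_tables, the key, and the threshold are all illustrative assumptions.

```r
library(sparklyr)

# Assumes Spark is installed locally (for example via spark_install()).
sc <- spark_connect(master = "local")

# Hypothetical toy data: four 2-D points, with an id column to identify rows.
points <- copy_to(
  sc,
  data.frame(id = 1:4, x = c(1, 1, -1, -1), y = c(1, -1, -1, 1)),
  name = "points",
  overwrite = TRUE
)

# LSH models operate on a single vector column, so assemble x and y into one.
points <- ft_vector_assembler(
  points,
  input_cols = c("x", "y"),
  output_col = "features"
)

# Passing the connection (rather than the table) gives an estimator;
# ml_fit() then returns the fitted LSH model.
lsh <- ft_bucketed_random_projection_lsh(
  sc,
  input_col = "features",
  output_col = "hashes",
  bucket_length = 2,
  num_hash_tables = 3
)
model <- ml_fit(lsh, points)

# Up to two rows of `points` that are approximately closest to the key (1, 0).
neighbors <- ml_approx_nearest_neighbors(
  model,
  dataset = points,
  key = c(1, 0),
  num_nearest_neighbors = 2
)

# Pairs of rows from the two datasets whose distance is below the threshold.
pairs <- ml_approx_similarity_join(
  model,
  dataset_a = points,
  dataset_b = points,
  threshold = 2.5
)
```

Both calls return Spark DataFrames, and the nearest-neighbor result carries the distance in the column named by dist_col. Because the search is approximate, it may return fewer rows than requested. Call spark_disconnect(sc) when finished.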