ft_lsh_utils

Utility functions for LSH models

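Functions for querying a fitted locality-sensitive hashing (LSH) model: an approximate nearest neighbor search and an approximate similarity join.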

Usage
ml_approx_nearest_neighbors(model, dataset, key, num_nearest_neighbors,
  dist_col = "distCol")

ml_approx_similarity_join(model, dataset_a, dataset_b, threshold, dist_col = "distCol")

Arguments
model

A fitted LSH model, returned by either ft_minhash_lsh() or ft_bucketed_random_projection_lsh().

dataset

The dataset to search for nearest neighbors of the key.

key

Feature vector representing the item to search for.

num_nearest_neighbors

The maximum number of nearest neighbors to return.

dist_col

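Name of the output column that stores the distance: between each result row and the key for ml_approx_nearest_neighbors(), or between the two rows of each returned pair for ml_approx_similarity_join().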

dataset_a

The first dataset to join.

dataset_b

The second dataset to join.

threshold

The distance threshold. Only row pairs whose distance is smaller than this value are returned.
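
Examples

The following is a minimal sketch of how the two functions might be called, not an official package example. It assumes a local Spark installation reachable via spark_connect(master = "local"). The table names (lsh_a, lsh_b), the toy data, and the values chosen for bucket_length, num_hash_tables, seed, and threshold are illustrative assumptions.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available.
sc <- spark_connect(master = "local")

# Two small toy datasets of 2-D points (illustrative data).
df_a <- data.frame(id = 0:3, x = c(1, 1, -1, -1), y = c(1, -1, -1, 1))
df_b <- data.frame(id = 4:7, x = c(1, -1, 0, 0), y = c(0, 0, 1, -1))

# Copy to Spark and assemble the numeric columns into a vector column.
tbl_a <- copy_to(sc, df_a, "lsh_a", overwrite = TRUE) %>%
  ft_vector_assembler(input_cols = c("x", "y"), output_col = "features")
tbl_b <- copy_to(sc, df_b, "lsh_b", overwrite = TRUE) %>%
  ft_vector_assembler(input_cols = c("x", "y"), output_col = "features")

# Passing the connection returns an estimator; ml_fit() yields the fitted model.
lsh <- ft_bucketed_random_projection_lsh(
  sc,
  input_col = "features", output_col = "hashes",
  bucket_length = 2, num_hash_tables = 3, seed = 42
)
model <- ml_fit(lsh, tbl_a)

# Up to 2 rows of tbl_a that are approximately closest to the key (1, 0).
neighbors <- ml_approx_nearest_neighbors(
  model, tbl_a, key = c(1, 0), num_nearest_neighbors = 2
)

# Pairs of rows from tbl_a and tbl_b whose distance is below 1.5.
pairs <- ml_approx_similarity_join(model, tbl_a, tbl_b, threshold = 1.5)
```

Both results are Spark DataFrames with the distance stored in the column named by dist_col ("distCol" by default). Because the search is approximate, the nearest neighbor query may return fewer rows than requested. Call spark_disconnect(sc) when finished.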

Aliases
  • ft_lsh_utils
  • ml_approx_nearest_neighbors
  • ml_approx_similarity_join
Documentation reproduced from package sparklyr, version 1.0.1, License: Apache License 2.0 | file LICENSE
