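R interface to Apache Spark, a fast and general engine for big data
processing, see <http://spark.apache.org>. This package supports connecting to
local and remote Apache Spark clusters, provides a 'dplyr'-compatible back end,
and provides an interface to Spark's built-in machine learning algorithms.

ft_lsh_utils: Utility functions for LSH models

Description

Utility functions for fitted locality-sensitive hashing (LSH) models.
ml_approx_nearest_neighbors() approximately finds the rows of a dataset that
are closest to a key, and ml_approx_similarity_join() approximately joins two
datasets on the pairs of rows whose distance is below a threshold.

Usage

ml_approx_nearest_neighbors(model, dataset, key, num_nearest_neighbors,
  dist_col = "distCol")

ml_approx_similarity_join(model, dataset_a, dataset_b, threshold,
  dist_col = "distCol")

Arguments

model

A fitted LSH model, returned by either ft_minhash_lsh()
or ft_bucketed_random_projection_lsh().

dataset

The dataset to search for nearest neighbors of the key.

key

Feature vector representing the item to search for.

num_nearest_neighbors

The maximum number of nearest neighbors to return.

dist_col

Output column for storing the distance between each result row and the key
(ml_approx_nearest_neighbors()), or between the two rows of each joined pair
(ml_approx_similarity_join()).

dataset_a

One of the two datasets to join.

dataset_b

The other dataset to join.

threshold

The distance threshold for row pairs: only pairs whose distance is smaller
than this value are returned.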
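The two operations documented here can be illustrated outside Spark with a minimal base-R sketch of bucketed random projection LSH under Euclidean distance. The helper names make_projections(), lsh_hash(), approx_nearest_neighbors() and approx_similarity_join() are hypothetical and are not part of sparklyr. The sketch assumes the bucket hash floor(x . v / bucket_length) with one random unit vector v per hash table, and a simplified candidate rule (a row or pair qualifies if it shares a bucket in at least one table). It does not reproduce Spark's implementation and does not cover MinHash LSH.

```r
# Hypothetical base-R sketch of bucketed random projection LSH (Euclidean).
# Illustration only: not the sparklyr or Spark implementation.

# One random unit vector per hash table, stored as the columns of a matrix.
make_projections <- function(dim, num_hash_tables) {
  p <- matrix(rnorm(dim * num_hash_tables), nrow = dim)
  sweep(p, 2, sqrt(colSums(p^2)), "/")
}

# Bucket of every row of `m` in every table: floor(<x, v> / bucket_length).
lsh_hash <- function(m, projections, bucket_length) {
  floor((m %*% projections) / bucket_length)
}

# Approximate nearest neighbours of `key` among the rows of `dataset`.
approx_nearest_neighbors <- function(dataset, key, num_nearest_neighbors,
                                     projections, bucket_length) {
  row_hashes <- lsh_hash(dataset, projections, bucket_length)
  key_hash <- lsh_hash(matrix(key, nrow = 1), projections, bucket_length)
  # Candidates share the key's bucket in at least one hash table.
  same_bucket <- sweep(row_hashes, 2, key_hash[1, ], "==")
  candidates <- which(rowSums(same_bucket) > 0)
  # Exact Euclidean distances are computed for the candidates only.
  d <- sqrt(rowSums(sweep(dataset[candidates, , drop = FALSE], 2, key)^2))
  top <- order(d)[seq_len(min(num_nearest_neighbors, length(candidates)))]
  data.frame(row = candidates[top], distCol = d[top])
}

# Approximate join: pairs of rows (one from each input) closer than `threshold`.
approx_similarity_join <- function(dataset_a, dataset_b, threshold,
                                   projections, bucket_length) {
  hash_a <- lsh_hash(dataset_a, projections, bucket_length)
  hash_b <- lsh_hash(dataset_b, projections, bucket_length)
  # Candidate pairs share a bucket in at least one hash table; drop duplicates.
  pairs <- unique(do.call(rbind, lapply(seq_len(ncol(projections)), function(j) {
    which(outer(hash_a[, j], hash_b[, j], "=="), arr.ind = TRUE)
  })))
  d <- sqrt(rowSums((dataset_a[pairs[, 1], , drop = FALSE] -
                     dataset_b[pairs[, 2], , drop = FALSE])^2))
  keep <- d < threshold
  data.frame(row_a = pairs[keep, 1], row_b = pairs[keep, 2], distCol = d[keep])
}

set.seed(1)
points <- matrix(rnorm(40), ncol = 2)  # 20 random points in 2-D
proj <- make_projections(dim = 2, num_hash_tables = 3)

nn <- approx_nearest_neighbors(points, key = points[5, ],
                               num_nearest_neighbors = 3,
                               projections = proj, bucket_length = 1)
print(nn)

# Rows 6-10 of `points` appear in both inputs, so exact duplicates can match.
joined <- approx_similarity_join(points[1:10, ], points[6:20, ],
                                 threshold = 0.5,
                                 projections = proj, bucket_length = 1)
print(joined)
```

Because the key is row 5 of points, that row shares every one of the key's buckets and comes back first with distance 0. The remaining neighbors depend on the random projections: a true nearest neighbor that lands in a different bucket in every table is missed, which is the accuracy LSH gives up in exchange for not comparing the key against every row.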