ft_lsh

ft_bucketed_random_projection_lsh

ft_minhash_lsh

A <code>spark_connection</code>, <code>ml_pipeline</code>, or a <code>tbl_spark</code>.

input_col

output_col

The length of each hash bucket. A larger bucket lowers the
false negative rate. The number of buckets will be (max L2 norm of input vectors) /
bucketLength.

bucket_length
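To illustrate how the bucket length affects hashing, here is a minimal base R sketch. It assumes the usual bucketed random projection scheme, in which a vector is projected onto a random unit vector and the result is divided by the bucket length and floored. The helper `brp_hash` and the sample points are hypothetical and are not part of sparklyr.

```r
# Hypothetical helper (not a sparklyr function): hash one numeric vector
# with a single random projection, assuming h(x) = floor((x . v) / bucket_length).
brp_hash <- function(x, v, bucket_length) {
  floor(sum(x * v) / bucket_length)
}

set.seed(42)
v <- rnorm(2)
v <- v / sqrt(sum(v^2))  # normalize to unit length

# Two nearby points and one distant point (made-up data).
pts <- list(c(0.0, 0.0), c(0.4, 0.1), c(3.0, 3.0))

# Hash every point with a short and a long bucket.
small_buckets <- sapply(pts, brp_hash, v = v, bucket_length = 0.1)
large_buckets <- sapply(pts, brp_hash, v = v, bucket_length = 10)

# Longer buckets can only merge points, never split them further here,
# so fewer or equally many distinct buckets remain.
length(unique(small_buckets))
length(unique(large_buckets))
```

With a longer bucket, more points fall into the same bucket, so true neighbors are missed less often, but more unrelated points also become candidates.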

Number of hash tables used in LSH OR-amplification. LSH
OR-amplification can be used to reduce the false negative rate. Higher values
for this parameter lead to a reduced false negative rate, at the expense of added
computational complexity.

num_hash_tables
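OR-amplification can be sketched in base R as follows: two vectors are treated as candidate neighbors if they land in the same bucket in at least one hash table. The helper `or_candidate` and the projection matrix `V` (one unit vector per row, one row per hash table) are assumptions for illustration and are not part of sparklyr.

```r
# Hypothetical helper (not a sparklyr function): TRUE if x and y share a
# bucket in at least one hash table. Each row of V is one projection vector.
or_candidate <- function(x, y, V, bucket_length) {
  hx <- floor(as.vector(V %*% x) / bucket_length)
  hy <- floor(as.vector(V %*% y) / bucket_length)
  any(hx == hy)  # OR over all hash tables
}

x <- c(0.2, 0.2)
y <- c(0.3, 5.0)

# One table that projects onto the second coordinate: x and y differ.
one_table <- matrix(c(0, 1), nrow = 1)
or_candidate(x, y, one_table, bucket_length = 1)

# Adding a table that projects onto the first coordinate: now they match
# in that table, so the pair becomes a candidate.
two_tables <- rbind(c(0, 1), c(1, 0))
or_candidate(x, y, two_tables, bucket_length = 1)
```

Each extra table gives a pair another chance to collide, which is why more tables lower the false negative rate while adding hashing and comparison work.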

A random seed. Set this value if you need your results to be
reproducible across repeated calls.

seed

(Optional) A <code>tbl_spark</code>. If provided, eagerly fit the (estimator)
feature "transformer" against <code>dataset</code>. See details.

dataset

A character string used to uniquely identify the feature transformer.

Optional arguments; currently unused.

Locality Sensitive Hashing functions for Euclidean distance
(Bucketed Random Projection) and Jaccard distance (MinHash).
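A possible usage sketch follows. It assumes a local Spark installation reachable through `spark_connect(master = "local")`. The data frame `points`, the table name, and the column names `features` and `hashes` are made up for illustration, and the parameter values are arbitrary.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (e.g. via spark_install()).
sc <- spark_connect(master = "local")

# Made-up sample data: two clusters of 2-D points.
points <- data.frame(
  id = 1:4,
  x = c(0, 0.1, 5, 5.2),
  y = c(0, 0.2, 5, 4.9)
)
points_tbl <- copy_to(sc, points, "points", overwrite = TRUE)

# LSH expects a single vector column, so assemble x and y into one.
features_tbl <- ft_vector_assembler(
  points_tbl,
  input_cols = c("x", "y"),
  output_col = "features"
)

# Passing a tbl_spark fits the estimator and transforms the table right away.
hashed <- ft_bucketed_random_projection_lsh(
  features_tbl,
  input_col = "features",
  output_col = "hashes",
  bucket_length = 2,
  num_hash_tables = 3,
  seed = 42
)

hashed_cols <- colnames(hashed)  # includes the new "hashes" column

spark_disconnect(sc)
```

Setting `seed` here keeps the random projections, and therefore the hash values, the same across repeated runs.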


ft_lsh: Feature Transformation -- LSH (Estimator)
