Locality Sensitive Hashing functions for Euclidean distance (Bucketed Random Projection) and Jaccard distance (MinHash).
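Both distances can be written directly in base R. This is a minimal sketch for orientation only. `euclidean_distance`, `jaccard_distance` and `as_set` are hypothetical helpers, not sparklyr functions. `as_set` mirrors how Spark's MinHash reads a vector: as the set of indices that hold non-zero values.

```r
# Hypothetical helpers (not part of sparklyr), for illustration only.

# Euclidean (L2) distance between two numeric vectors
euclidean_distance <- function(x, y) sqrt(sum((x - y)^2))

# Jaccard distance between two sets: 1 - |intersection| / |union|
jaccard_distance <- function(a, b) {
  1 - length(intersect(a, b)) / length(union(a, b))
}

# Treat a vector as the set of indices holding non-zero values
as_set <- function(v) which(v != 0)

euclidean_distance(c(0, 0), c(3, 4))                            # 5
jaccard_distance(as_set(c(1, 1, 1, 0)), as_set(c(0, 1, 1, 1)))  # 0.5
```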
ft_bucketed_random_projection_lsh(x, input_col, output_col, bucket_length,
  num_hash_tables = 1L, seed = NULL, dataset = NULL,
  uid = random_string("bucketed_random_projection_lsh_"), ...)

ft_minhash_lsh(x, input_col, output_col, num_hash_tables = 1L, seed = NULL,
  dataset = NULL, uid = random_string("minhash_lsh_"), ...)
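ft_minhash_lsh approximates Jaccard distance with MinHash. The base R sketch below illustrates the idea with one random permutation per hash function, which is the textbook formulation. Spark applies random hash functions rather than explicit permutations, so this shows the principle and is not Spark's code. `minhash_signature` is a hypothetical helper. The fraction of positions where two signatures agree estimates the Jaccard similarity, which is 1 minus the Jaccard distance.

```r
# Hypothetical helper (not part of sparklyr): MinHash signature of a set of
# integer element ids in 1..universe, one random permutation per hash function.
minhash_signature <- function(set, perms) {
  # For each permutation, the hash value is the smallest permuted id in the set
  vapply(perms, function(p) min(p[set]), integer(1))
}

set.seed(42)  # like the seed argument, makes the random draws reproducible
universe <- 100
num_hashes <- 500
perms <- replicate(num_hashes, sample(universe), simplify = FALSE)

a <- 1:60
b <- 31:90
# True Jaccard similarity: |31..60| / |1..90| = 30 / 90 = 1/3
sig_a <- minhash_signature(a, perms)
sig_b <- minhash_signature(b, perms)
estimate <- mean(sig_a == sig_b)  # fraction of agreeing hash values
estimate                          # close to 1/3
```

Because each permutation is a bijection, disjoint sets never share a MinHash value and identical sets always do. Pairs in between collide at a rate that tracks their overlap.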
x: A spark_connection, ml_pipeline, or tbl_spark.
input_col: The name of the input column.
output_col: The name of the output column.
bucket_length: The length of each hash bucket (ft_bucketed_random_projection_lsh only). A larger bucket lowers the false negative rate. The number of buckets will be (max L2 norm of input vectors) / bucket_length.
num_hash_tables: Number of hash tables used in LSH OR-amplification, which can be used to reduce the false negative rate. Higher values for this parameter lead to a lower false negative rate, at the expense of added computational cost.
seed: A random seed. Set this value if you need your results to be reproducible across repeated calls.
dataset: (Optional) A tbl_spark. If provided, eagerly fit the (estimator)
  feature "transformer" against dataset. See details.
uid: A character string used to uniquely identify the feature transformer.
...: Optional arguments; currently unused.
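The bucket_length and num_hash_tables trade-offs can be seen in a small base R simulation. Following the Spark ML guide's description of bucketed random projection, each hash table maps a point x to floor((x . v) / bucket_length) for a random unit vector v. OR-amplification treats a pair as a candidate when it shares a bucket in at least one table. If a single table catches a near pair with probability p, then L independent tables catch it with probability 1 - (1 - p)^L. `brp_buckets` and `is_candidate` are hypothetical helpers. The point counts, spreads and bucket lengths are arbitrary choices made for illustration.

```r
# Hypothetical helper (not part of sparklyr): bucket ids floor((x . v) / bucket_length)
# for every row x of X and every column v of the direction matrix V.
brp_buckets <- function(X, V, bucket_length) {
  floor(as.vector(X %*% V) / bucket_length)
}

set.seed(1)
n_dim <- 3
num_hash_tables <- 5
# One random unit vector per hash table, stored as the columns of V
V <- matrix(rnorm(n_dim * num_hash_tables), nrow = n_dim)
V <- sweep(V, 2, sqrt(colSums(V^2)), "/")

# bucket_length: wider buckets give fewer distinct bucket ids for the same points
X <- matrix(rnorm(200 * n_dim), ncol = n_dim)
n_small <- length(unique(brp_buckets(X, V[, 1], 0.5)))
n_large <- length(unique(brp_buckets(X, V[, 1], 5)))

# OR-amplification: a pair is a candidate if it shares a bucket in ANY table
is_candidate <- function(x, y, tables, bucket_length = 1) {
  V_t <- V[, tables, drop = FALSE]
  any(brp_buckets(rbind(x), V_t, bucket_length) ==
        brp_buckets(rbind(y), V_t, bucket_length))
}

# 1000 pairs of nearby points (roughly 0.5 apart)
hits <- replicate(1000, {
  x <- rnorm(n_dim) * 3
  y <- x + rnorm(n_dim) * 0.3
  c(one = is_candidate(x, y, 1), all = is_candidate(x, y, 1:num_hash_tables))
})
c(n_small = n_small, n_large = n_large)
rowMeans(hits)  # share of near pairs caught with 1 table vs all 5 tables
```

Every width-5 bucket is a union of width-0.5 buckets, so the wider setting cannot yield more distinct bucket ids. Adding tables can only add candidates, never remove them. The price is one more projection per table for every point.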
The object returned depends on the class of x.
spark_connection: When x is a spark_connection, the function returns a ml_transformer,
  a ml_estimator, or one of their subclasses. The object contains a pointer to
  a Spark Transformer or Estimator object and can be used to compose
  Pipeline objects.
ml_pipeline: When x is a ml_pipeline, the function returns a ml_pipeline with
  the transformer or estimator appended to the pipeline.
tbl_spark: When x is a tbl_spark, a transformer is constructed and then
  immediately applied to the input tbl_spark, returning a tbl_spark.
When dataset is provided for an estimator transformer, the function
  internally calls ml_fit() against dataset. Hence, the methods for
  spark_connection and ml_pipeline will then return a ml_transformer
  and a ml_pipeline with a ml_transformer appended, respectively. When
  x is a tbl_spark, the estimator will be fit against dataset before
  transforming x.
When dataset is not specified, the constructor returns a ml_estimator, and,
  in the case where x is a tbl_spark, the estimator is fit against x
  to obtain a transformer, which is then immediately used to transform x.
See http://spark.apache.org/docs/latest/ml-features.html for more information on the set of transformations available for DataFrame columns in Spark.
ft_lsh_utils
Other feature transformers: ft_binarizer,
  ft_bucketizer,
  ft_chisq_selector,
  ft_count_vectorizer, ft_dct,
  ft_elementwise_product,
  ft_feature_hasher,
  ft_hashing_tf, ft_idf,
  ft_imputer,
  ft_index_to_string,
  ft_interaction,
  ft_max_abs_scaler,
  ft_min_max_scaler, ft_ngram,
  ft_normalizer,
  ft_one_hot_encoder, ft_pca,
  ft_polynomial_expansion,
  ft_quantile_discretizer,
  ft_r_formula,
  ft_regex_tokenizer,
  ft_sql_transformer,
  ft_standard_scaler,
  ft_stop_words_remover,
  ft_string_indexer,
  ft_tokenizer,
  ft_vector_assembler,
  ft_vector_indexer,
  ft_vector_slicer, ft_word2vec