ft_word2vec

ml_find_synonyms

A <code>spark_connection</code>, <code>ml_pipeline</code>, or a <code>tbl_spark</code>.

input_col

output_col

The dimension of the vector that each word is mapped to. Default: 100

vector_size

The minimum number of times a token must appear to be included in
the word2vec model's vocabulary. Default: 5

min_count

(Spark 2.0.0+) Sets the maximum length (in words) of each sentence
in the input data. Any sentence longer than this threshold will be divided into
chunks of up to <code>max_sentence_length</code> size. Default: 1000

max_sentence_length

Number of partitions into which the input sentences are split for training. Default: 1

num_partitions

Step size to be used for each iteration of optimization (must be > 0).

step_size

The maximum number of iterations to use.

max_iter

A random seed. Set this value if you need your results to be
reproducible across repeated calls.

seed

(Optional) A <code>tbl_spark</code>. If provided, eagerly fit the (estimator)
feature "transformer" against <code>dataset</code>. See details.

dataset

A character string used to uniquely identify the feature transformer.

Optional arguments; currently unused.

A fitted <code>Word2Vec</code> model, returned by <code>ft_word2vec()</code>.

model

A word, as a length-one character vector.

word

Number of words closest in similarity to the given word to find.

Word2Vec maps each word to a fixed-size numeric vector that can be used in downstream natural language processing or machine learning steps.
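The following is a minimal sketch of how the estimator and <code>ml_find_synonyms()</code> might be used together. It assumes a local Spark installation is available; the toy sentences and the column names <code>text</code>, <code>words</code>, and <code>features</code> are made up for illustration, and <code>ft_tokenizer()</code> and <code>ml_fit()</code> are other sparklyr functions used here to prepare the input and fit the model.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (e.g. via spark_install()).
sc <- spark_connect(master = "local")

# Hypothetical toy corpus, one sentence per row.
sentences <- data.frame(
  text = c(
    "spark is a fast engine for big data",
    "sparklyr is an r interface to spark",
    "word2vec maps each word to a vector"
  ),
  stringsAsFactors = FALSE
)
sentences_tbl <- copy_to(sc, sentences, overwrite = TRUE)

# Word2Vec expects a column of tokenized words, so split the text first.
tokens_tbl <- ft_tokenizer(sentences_tbl, input_col = "text", output_col = "words")

# Passing a spark_connection creates an unfitted estimator;
# ml_fit() then trains it on the tokenized data.
w2v_model <- ft_word2vec(
  sc,
  input_col = "words",
  output_col = "features",
  vector_size = 10,  # small vectors for a toy example
  min_count = 1,     # keep every token, since the corpus is tiny
  seed = 42          # fixed seed for reproducibility
) %>%
  ml_fit(tokens_tbl)

# Add the vector column to the data.
ml_transform(w2v_model, tokens_tbl)

# Look up the 2 words closest in similarity to "spark".
ml_find_synonyms(w2v_model, "spark", 2)

spark_disconnect(sc)
```

With a corpus this small the learned vectors are not meaningful; the sketch is only meant to show the order of the calls. <code>min_count</code> is lowered to 1 because, with the default of 5, no token in this toy data would enter the vocabulary.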


ft_word2vec: Feature Transformation -- Word2Vec (Estimator)

Description

Usage

Arguments

Value

Details

See Also