h2o.word2vec

training_frame
Id of the training data frame.

model_id
Destination id for this model; auto-generated if not specified.

min_word_freq
This will discard words that appear less than <int> times. Defaults to 5.

word_model
Use the Skip-Gram model. Must be one of: "SkipGram". Defaults to SkipGram.

norm_model
Use Hierarchical Softmax. Must be one of: "HSM". Defaults to HSM.

vec_size
Set size of word vectors. Defaults to 100.

window_size
Set max skip length between words. Defaults to 5.

sent_sample_rate
Set threshold for occurrence of words. Those that appear with higher frequency in the training data
will be randomly down-sampled; useful range is (0, 1e-5). Defaults to 0.001.

init_learning_rate
Set the starting learning rate. Defaults to 0.025.

epochs
Number of training iterations to run. Defaults to 5.

pre_trained
Id of a data frame that contains a pre-trained (external) word2vec model.

max_runtime_secs
Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.

export_checkpoints_dir
Automatically export generated models to this directory.
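As an illustration of the pre_trained argument, the following minimal sketch wraps a tiny hand-made embedding table in a word2vec model instead of training one. It assumes a running local H2O cluster, and it assumes that the external frame holds the word in its first column followed by one numeric column per vector dimension, with vec_size equal to the number of numeric columns. The words, values, and variable names are invented for the example.

```r
library(h2o)
h2o.init()

# Hypothetical pre-trained embeddings: one word column plus three vector columns.
embeddings <- data.frame(
  Word = c("cat", "dog"),
  V1 = c(0.0, 1.0),
  V2 = c(1.0, 0.0),
  V3 = c(0.2, 0.8),
  stringsAsFactors = FALSE
)
pretrained_frame <- as.h2o(embeddings)

# Make sure the word column is a String column (it may be parsed as categorical).
pretrained_frame$Word <- as.character(pretrained_frame$Word)

# Build a model from the external vectors; no training_frame is supplied.
# Assumption: vec_size has to match the number of numeric columns (3 here).
w2v_pretrained <- h2o.word2vec(pre_trained = pretrained_frame, vec_size = 3)

# Look up the vectors of individual words, one output row per input word.
lookup <- as.h2o(data.frame(word = c("dog", "cat"), stringsAsFactors = FALSE))
lookup_words <- as.character(lookup$word)
vecs <- h2o.transform(w2v_pretrained, lookup_words, aggregate_method = "NONE")
print(vecs)
```

A model built this way can then be used wherever a trained word2vec model is expected, for example to embed tokenized text with h2o.transform.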

Trains a word2vec model on a String column of an H2O data frame.
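A minimal end-to-end sketch of that workflow is shown below. It assumes a running local H2O cluster and uses a three-sentence toy corpus invented for the example, so the resulting vectors are not meaningful. min_word_freq is lowered to 1 so that no word of the tiny corpus is discarded, and sent_sample_rate is set to 0 to switch off down-sampling; both choices are for the toy data only.

```r
library(h2o)
h2o.init()

# Toy corpus (hypothetical data), one document per row.
docs <- data.frame(
  text = c("the quick brown fox",
           "the lazy brown dog",
           "the quick grey dog"),
  stringsAsFactors = FALSE
)
text_hf <- as.h2o(docs)

# word2vec needs a String column; convert in case it was parsed as categorical.
text_col <- as.character(text_hf$text)

# Split each document into words. The result is a single String column of
# tokens in which documents are separated by NA rows.
words <- h2o.tokenize(text_col, " ")

# Train with small settings suited to the toy corpus.
w2v_model <- h2o.word2vec(
  training_frame = words,
  min_word_freq = 1,
  vec_size = 10,
  window_size = 3,
  sent_sample_rate = 0,
  init_learning_rate = 0.025,
  epochs = 10
)

# Nearest words to "quick" in the learned vector space.
print(h2o.findSynonyms(w2v_model, "quick", count = 3))

# Average the word vectors of each document into one vector per document.
doc_vecs <- h2o.transform(w2v_model, words, aggregate_method = "AVERAGE")
print(doc_vecs)
```

The averaged document vectors can serve as numeric features for a downstream supervised model; on real data the defaults for min_word_freq and vec_size are usually a more sensible starting point than the values used here.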

R interface for 'H2O', the scalable open source machine learning
platform that offers parallelized implementations of many supervised and
unsupervised machine learning algorithms such as Generalized Linear
Models, Gradient Boosting Machines (including XGBoost), Random Forests,
Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Cox
Proportional Hazards, K-Means, PCA, Word2Vec, as well as a fully automatic
machine learning algorithm (AutoML).
