RemixAutoML (version 0.11.0)

AutoWord2VecModeler: Automated word2vec data generation via H2O

Description

This function automatically builds a word2vec model with H2O and merges the resulting word vectors onto your supplied dataset.
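To make the idea of merging word2vec output onto a dataset concrete, the base R sketch below shows one common way word vectors become row-level features: each row's text is tokenized, the vectors of its words are averaged, and the resulting columns are bound to the original data. The toy embedding matrix, the column names V1 to V3, and the helper `embed_text` are hypothetical. This is not the package's internal code; the real function trains the embeddings with H2O.

```r
# Toy data with one text column (hypothetical example data).
df <- data.frame(id = 1:2,
                 Text_Col1 = c("good fast car", "slow car"),
                 stringsAsFactors = FALSE)

# Hypothetical 3-dimensional embeddings; a real model would learn these.
emb <- rbind(good = c(1, 0, 0),
             fast = c(0, 1, 0),
             car  = c(0, 0, 1),
             slow = c(0, -1, 0))
colnames(emb) <- paste0("V", 1:3)

# Average the vectors of all known words in one string.
embed_text <- function(txt, emb) {
  words <- strsplit(tolower(txt), "[^a-z]+")[[1]]
  words <- words[words %in% rownames(emb)]
  if (length(words) == 0) {
    return(setNames(rep(NA_real_, ncol(emb)), colnames(emb)))
  }
  colMeans(emb[words, , drop = FALSE])
}

# One row of features per input row, then bind onto the source data.
feats <- t(vapply(df$Text_Col1, embed_text, numeric(ncol(emb)), emb = emb))
rownames(feats) <- NULL
out <- cbind(df["id"], feats)  # text column dropped, as with KeepStringCol = FALSE
print(out)
```

In this sketch the number of columns in `emb` plays the role that `vects` plays in the function: it fixes how many numeric columns are added per text column.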

Usage

AutoWord2VecModeler(data, BuildType = "Combined",
  stringCol = c("Text_Col1", "Text_Col2"), KeepStringCol = FALSE,
  model_path = NULL, vects = 100, SaveStopWords = FALSE,
  MinWords = 1, WindowSize = 12, Epochs = 25, StopWords = NULL,
  SaveModel = "standard", Threads = max(1, parallel::detectCores() -
  2), MaxMemory = "28G", SaveOutput = FALSE)
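The default for Threads in the signature above leaves two cores free. One caveat: parallel::detectCores() is documented to return NA when the core count cannot be determined, and max(1, NA) evaluates to NA in R. The sketch below shows a more defensive way to compute the same default. The helper name `safe_threads` is hypothetical and not part of RemixAutoML.

```r
# Hypothetical helper: same intent as the default, but NA-safe.
safe_threads <- function(reserve = 2L) {
  cores <- parallel::detectCores()
  if (is.na(cores)) return(1L)          # unknown core count: fall back to 1
  max(1L, as.integer(cores) - reserve)  # never go below one thread
}

threads <- safe_threads()
print(threads)

# The value could then be supplied explicitly, for example:
# AutoWord2VecModeler(data, Threads = threads, MaxMemory = "28G")
```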

Arguments

data

Source data table to merge the word vectors onto

BuildType

Choose from "individual" or "combined". "individual" builds a separate model for each text column; "combined" builds a single model across all text columns.

stringCol

A character vector with the names of the text columns to convert via word2vec

KeepStringCol

Set to TRUE to keep the original string columns that are converted via word2vec

model_path

A string path to the location where you want the model and metadata stored

vects

The size of the word vectors, that is, the number of embedding columns to retain from the word2vec model

SaveStopWords

Set to TRUE to save the stop words used

MinWords

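Passed to the H2O word2vec model; it appears to correspond to the minimum word frequency setting (min_word_freq in h2o.word2vec)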

WindowSize

Passed to the H2O word2vec model; it appears to correspond to the context window size (window_size in h2o.word2vec)

Epochs

Number of training epochs for the H2O word2vec model

StopWords

Stop words for the H2O word2vec model build, presumably words to exclude from the text; NULL by default

SaveModel

Set to "standard" to save normally; set to "mojo" to save as mojo. NOTE: while you can save a mojo, I haven't figured out how to score it in the AutoH2OScoring function.

Threads

Number of available threads you want to dedicate to model building

MaxMemory

Amount of memory you want to dedicate to model building, given as a string such as "28G"

SaveOutput

Set to TRUE to save your models to file
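MinWords, WindowSize, Epochs, and StopWords are described above only as settings for the H2O word2vec model. As a rough illustration of what such settings typically control in word2vec-style training, the base R sketch below removes stop words, drops words that occur fewer than a minimum number of times, and lists the word pairs that fall inside a context window. The mapping to the package's arguments is an assumption, the variable names are hypothetical, and H2O's actual implementation may differ in detail.

```r
texts <- c("the old car is fast", "the car is red", "a fast red car")

stop_words <- c("the", "is", "a")  # plays the role of StopWords (assumption)
min_words  <- 2                    # plays the role of MinWords (assumption)
window     <- 1                    # plays the role of WindowSize (assumption)

# Tokenize and remove stop words.
tokens <- lapply(strsplit(tolower(texts), "\\s+"),
                 function(w) w[!w %in% stop_words])

# Keep only words that appear at least `min_words` times overall.
counts <- table(unlist(tokens))
vocab  <- names(counts)[counts >= min_words]
tokens <- lapply(tokens, function(w) w[w %in% vocab])

# Build (center, context) pairs within `window` positions of each word.
context_pairs <- function(w, window) {
  pairs <- list()
  for (i in seq_along(w)) {
    idx <- setdiff(max(1, i - window):min(length(w), i + window), i)
    for (j in idx) {
      pairs[[length(pairs) + 1]] <- c(center = w[i], context = w[j])
    }
  }
  do.call(rbind, pairs)
}

pairs <- do.call(rbind, lapply(tokens, context_pairs, window = window))
print(vocab)
print(pairs)
```

Here "old" occurs only once, so it is dropped before any pairs are formed. A larger window would add pairs between words that are further apart in the same text.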

See Also

Other Feature Engineering: AutoDataPartition, AutoTransformationCreate, AutoTransformationScore, CreateCalendarVariables, CreateHolidayVariables, DT_GDL_Feature_Engineering, DummifyDT, GDL_Feature_Engineering, ModelDataPrep, Partial_DT_GDL_Feature_Engineering, Scoring_GDL_Feature_Engineering, TimeSeriesFill

Examples

# NOT RUN {
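# `data` is assumed to be an existing data table that contains the
# text columns Text_Col1 and Text_Col2
data <- AutoWord2VecModeler(data,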
                            BuildType = "individual",
                            stringCol = c("Text_Col1",
                                          "Text_Col2"),
                            KeepStringCol = FALSE,
                            model_path = NULL,
                            vects = 100,
                            SaveStopWords = FALSE,
                            MinWords = 1,
                            WindowSize = 1,
                            Epochs = 25,
                            StopWords = NULL,
                            SaveModel = "standard",
                            Threads = max(1,parallel::detectCores()-2),
                            MaxMemory = "28G",
                            SaveOutput = TRUE)
# }
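
The example sets BuildType = "individual". To illustrate the distinction described under BuildType, the base R sketch below groups the text of a toy table both ways: pooled across all text columns for a single model, or kept separate for one model per column. It only shows how the text would be grouped, not the model fitting, and the object names are hypothetical rather than taken from the package.

```r
df <- data.frame(Text_Col1 = c("fast car", "slow bus"),
                 Text_Col2 = c("red paint", "blue paint"),
                 stringsAsFactors = FALSE)
string_cols <- c("Text_Col1", "Text_Col2")

# "combined": one corpus pooled over every text column -> one model.
combined_corpus <- unlist(lapply(string_cols, function(col) df[[col]]))

# "individual": one corpus per text column -> one model per column.
individual_corpora <- lapply(setNames(string_cols, string_cols),
                             function(col) df[[col]])

length(combined_corpus)        # 4 documents in a single corpus
names(individual_corpora)      # "Text_Col1" "Text_Col2"
lengths(individual_corpora)    # 2 documents in each corpus
```

A combined model shares one vocabulary across columns, while individual models let the same word receive different vectors depending on the column it appears in.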
