
Automatically builds an H2O word2vec model and merges the resulting word vectors onto your supplied dataset.
AutoWord2VecModeler(
  data,
  BuildType = "Combined",
  stringCol = c("Text_Col1", "Text_Col2"),
  KeepStringCol = FALSE,
  model_path = NULL,
  vects = 100,
  MinWords = 1,
  WindowSize = 12,
  Epochs = 25,
  SaveModel = "standard",
  Threads = max(1L, parallel::detectCores() - 2L),
  MaxMemory = "28G",
  ModelID = "Model_1"
)
data: Source data table to merge the word vectors onto
BuildType: Choose from "individual" or "combined". Individual will build a model for every text column. Combined will build a single model for all columns.
stringCol: A string name, or a character vector of names, for the text column(s) to convert via word2vec
KeepStringCol: Set to TRUE if you want to keep the original string column(s) that you convert via word2vec
model_path: A string path to the location where you want the model and metadata stored
vects: The number of vector dimensions to retain from the word2vec model
MinWords: Minimum word frequency setting for the H2O word2vec model
WindowSize: Context window size setting for the H2O word2vec model
Epochs: Number of training epochs for the H2O word2vec model
SaveModel: Set to "standard" to save normally; set to "mojo" to save as a MOJO. NOTE: while you can save a MOJO, I haven't figured out how to score it in the AutoH2OScoring function.
Threads: Number of available threads you want to dedicate to model building
MaxMemory: Amount of memory you want to dedicate to model building, e.g. "28G"
ModelID: Name used when saving the model to file
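The default for Threads, max(1L, parallel::detectCores() - 2L), leaves two cores free. R documents that parallel::detectCores() can return NA when the core count cannot be determined, and in that case the default expression also evaluates to NA. The following is a minimal sketch of a guard, assuming you would rather fall back to a single thread; safe_threads is a hypothetical helper and is not part of RemixAutoML.

```r
# Hypothetical helper (not part of RemixAutoML): pick a thread count that
# leaves `reserve` cores free, falling back to 1 if the core count is unknown.
safe_threads <- function(reserve = 2L, cores = parallel::detectCores()) {
  # detectCores() may return NA on platforms where detection fails
  if (is.na(cores)) {
    return(1L)
  }
  # Never go below one thread, even on machines with few cores
  max(1L, as.integer(cores) - as.integer(reserve))
}
```

The result could then be passed as Threads = safe_threads() in place of the default expression.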
Other Feature Engineering: AutoDataPartition(), AutoDiffLagN(), AutoHierarchicalFourier(), AutoInteraction(), AutoLagRollStatsScoring(), AutoLagRollStats(), AutoTransformationCreate(), AutoTransformationScore(), AutoWord2VecScoring(), ContinuousTimeDataGenerator(), CreateCalendarVariables(), CreateHolidayVariables(), DT_GDL_Feature_Engineering(), DifferenceDataReverse(), DifferenceData(), DummifyDT(), H2OAutoencoderScoring(), H2OAutoencoder(), ModelDataPrep(), Partial_DT_GDL_Feature_Engineering(), TimeSeriesFill()
# NOT RUN {
# Create fake data
data <- RemixAutoML::FakeDataGenerator(
  Correlation = 0.70,
  N = 1000L,
  ID = 2L,
  FactorCount = 2L,
  AddDate = TRUE,
  AddComment = TRUE,
  ZIP = 2L,
  TimeSeries = FALSE,
  ChainLadderData = FALSE,
  Classification = FALSE,
  MultiClass = FALSE)

# Create Model and Vectors
data <- RemixAutoML::AutoWord2VecModeler(
  data,
  BuildType = "individual",
  stringCol = c("Comment"),
  KeepStringCol = FALSE,
  ModelID = "Model_1",
  model_path = getwd(),
  vects = 10,
  MinWords = 1,
  WindowSize = 1,
  Epochs = 25,
  SaveModel = "standard",
  Threads = max(1L, parallel::detectCores() - 2L),
  MaxMemory = "28G")

# Remove data
rm(data)

# Create fake data for mock scoring
data <- RemixAutoML::FakeDataGenerator(
  Correlation = 0.70,
  N = 1000L,
  ID = 2L,
  FactorCount = 2L,
  AddDate = TRUE,
  AddComment = TRUE,
  ZIP = 2L,
  TimeSeries = FALSE,
  ChainLadderData = FALSE,
  Classification = FALSE,
  MultiClass = FALSE)

# Create vectors for scoring
data <- RemixAutoML::AutoWord2VecScoring(
  data,
  BuildType = "individual",
  ModelObject = NULL,
  ModelID = "Model_1",
  model_path = getwd(),
  stringCol = "Comment",
  KeepStringCol = FALSE,
  H2OStartUp = TRUE,
  H2OShutdown = TRUE,
  Threads = max(1L, parallel::detectCores() - 2L),
  MaxMemory = "28G")
# }