AutoH2oGBMHurdleModel for hurdle modeling
AutoH2oGBMHurdleModel(
data,
ValidationData = NULL,
TestData = NULL,
Buckets = 0L,
TargetColumnName = NULL,
FeatureColNames = NULL,
TransformNumericColumns = NULL,
Distribution = "gaussian",
SplitRatios = c(0.7, 0.2, 0.1),
ModelID = "ModelTest",
Paths = NULL,
MetaDataPaths = NULL,
SaveModelObjects = TRUE,
IfSaveModel = "mojo",
MaxMem = { gc()
paste0(as.character(floor(as.numeric(system("awk '/MemFree/ {print $2}' /proc/meminfo",
intern = TRUE))/1e+06)), "G") },
NThreads = max(1L, parallel::detectCores() - 2L),
Trees = 1000L,
GridTune = TRUE,
MaxModelsInGrid = 1L,
NumOfParDepPlots = 10L,
PassInGrid = NULL
)
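The default `MaxMem` expression in the usage above shells out to `awk` and reads `/proc/meminfo`, which exists only on Linux. On other systems it may be simpler to pass a value such as `"8G"` directly. As a minimal sketch, the helper below (a hypothetical `safe_max_mem()`, not part of RemixAutoML) reads the same `MemFree` field in plain R and falls back to a fixed value when the file is missing. Unlike the default expression, it never returns `"0G"`.

```r
# Hypothetical helper (not part of RemixAutoML): build a MaxMem string such as "12G".
safe_max_mem <- function(fallback = "4G", meminfo = "/proc/meminfo") {
  # /proc/meminfo exists only on Linux; elsewhere return the fallback.
  if (!file.exists(meminfo)) return(fallback)

  lines <- readLines(meminfo, warn = FALSE)
  free_line <- grep("^MemFree:", lines, value = TRUE)
  if (length(free_line) != 1L) return(fallback)

  # The value is reported in kB; dividing by 1e6 gives approximate gigabytes,
  # matching the arithmetic in the default MaxMem expression.
  free_kb <- as.numeric(gsub("[^0-9]", "", free_line))
  free_gb <- floor(free_kb / 1e6)

  # Assumption of this sketch: request at least 1 GB rather than "0G".
  paste0(max(1, free_gb), "G")
}

safe_max_mem()
```

The result can then be supplied as `MaxMem = safe_max_mem()`.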
data: Source training data. Do not include a column holding the class labels for the buckets; those labels are created internally.
ValidationData: Source validation data. Do not include a column holding the class labels for the buckets; those labels are created internally.
TestData: Source test data. Do not include a column holding the class labels for the buckets; those labels are created internally.
Buckets: A numeric vector of the bucket values used for subsetting the data. NOTE: the final bucket value creates two subsets: one for data less than the value and one for data greater than it.
TargetColumnName: The column name or number of the target variable.
FeatureColNames: The column names or numbers of the features (not including the PrimaryDateColumn).
TransformNumericColumns: Transform numeric columns inside the AutoCatBoostRegression() function.
Distribution: Set to the distribution of choice based on the H2O regression documentation.
SplitRatios: A vector of partition ratios, for example c(0.70, 0.20, 0.10).
ModelID: A character name for your models.
Paths: The path to the folder where you want your model information saved.
MetaDataPaths: A character string giving the path where you want your model evaluation output saved. If left NULL, all output is saved to Paths.
SaveModelObjects: Set to TRUE to save the model objects to file in the folders listed in Paths.
IfSaveModel: Save as "mojo" or "standard".
MaxMem: Set the maximum memory your system can provide.
NThreads: Set the number of threads you want to dedicate to model building.
Trees: Default 1000.
GridTune: Set to TRUE if you want to grid tune the models.
MaxModelsInGrid: Set to a numeric value for the number of models to try in the grid tune.
NumOfParDepPlots: Set to the number of partial dependence calibration plots to pull back.
PassInGrid: Pass in a grid for changing up the parameter settings for catboost.
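The Buckets description above can be illustrated with a small base R sketch. It is not the package's internal code. It assumes each bucket value acts as an inclusive upper bound, and the actual boundary handling inside AutoH2oGBMHurdleModel() may differ. The `target` and `buckets` values are made up for illustration.

```r
# Illustrative only: how a Buckets vector could partition a target.
# Assumption: each bucket value is an inclusive upper bound.
target  <- c(0, 0, 3, 12, 0, 7, 25, 1)
buckets <- c(0, 10)

# Padding with -Inf and Inf means the final bucket value yields two groups:
# values at or below it, and values above it.
breaks <- c(-Inf, buckets, Inf)
bucket_id <- cut(target, breaks = breaks, labels = FALSE, right = TRUE)

# Number of rows per bucket, and the target values that fall in each one
table(bucket_id)
split(target, bucket_id)
```

With the default `Buckets = 0L` this reduces to the classic hurdle split: zero versus positive target values.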
Returns AutoXGBoostRegression() model objects: VariableImportance.csv, Model, ValidationData.csv, EvalutionPlot.png, EvalutionBoxPlot.png, EvaluationMetrics.csv, ParDepPlots.R (a named list of features with partial dependence calibration plots), ParDepBoxPlots.R, GridCollect, and the grid used.
Other Supervised Learning - Compound: AutoCatBoostHurdleModel(), AutoCatBoostSizeFreqDist(), AutoH2oDRFHurdleModel(), AutoH2oGBMSizeFreqDist(), AutoXGBoostHurdleModel()
# NOT RUN {
Output <- RemixAutoML::AutoH2oGBMHurdleModel(
data,
ValidationData = NULL,
TestData = NULL,
Buckets = 1L,
TargetColumnName = "Target_Variable",
FeatureColNames = 4L:ncol(data),
TransformNumericColumns = NULL,
Distribution = "gaussian",
SplitRatios = c(0.7, 0.2, 0.1),
MaxMem = {gc();paste0(as.character(floor(as.numeric(system("awk '/MemFree/ {print $2}' /proc/meminfo", intern=TRUE)) / 1000000)),"G")},
NThreads = max(1L, parallel::detectCores()-2L),
ModelID = "ModelID",
Paths = normalizePath("./"),
MetaDataPaths = NULL,
SaveModelObjects = TRUE,
IfSaveModel = "mojo",
Trees = 1000L,
GridTune = FALSE,
MaxModelsInGrid = 1L,
NumOfParDepPlots = 10L,
PassInGrid = NULL)
# }
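The example above refers to a `data` object that is never defined. As a hedged sketch, the following builds a synthetic zero-inflated dataset whose layout matches the call: a `Target_Variable` column, with the features starting at column 4 so that `FeatureColNames = 4L:ncol(data)` selects them. All column names other than `Target_Variable` and all coefficients are invented for illustration. The function may expect a data.table rather than a data.frame, in which case `data.table::as.data.table(data)` would be needed first.

```r
set.seed(42)
n <- 1000L

# Feature values (names and distributions are made up for illustration)
x1 <- runif(n)
x2 <- rnorm(n)
x3 <- sample(c("a", "b", "c"), n, replace = TRUE)

# Hurdle-style target: zero unless a Bernoulli "hurdle" is crossed,
# in which case a positive log-normal amount is observed.
is_positive <- rbinom(n, size = 1L, prob = plogis(-1 + 2 * x1))
amount <- exp(1 + 0.5 * x2 + rnorm(n, sd = 0.3))

# Columns 1-3 are non-features; features start at column 4
data <- data.frame(
  ID = seq_len(n),
  Date = as.Date("2020-01-01") + seq_len(n) - 1L,
  Target_Variable = is_positive * amount,
  Feature1 = x1,
  Feature2 = x2,
  Feature3 = x3
)

# Share of zero targets, and the columns picked up by 4L:ncol(data)
mean(data$Target_Variable == 0)
names(data)[4L:ncol(data)]
```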