AutoH2oGBMHurdleModel: AutoH2oGBMHurdleModel

Description

AutoH2oGBMHurdleModel for hurdle modeling

Usage

AutoH2oGBMHurdleModel(
  data,
  ValidationData = NULL,
  TestData = NULL,
  Buckets = 0L,
  TargetColumnName = NULL,
  FeatureColNames = NULL,
  TransformNumericColumns = NULL,
  Distribution = "gaussian",
  SplitRatios = c(0.7, 0.2, 0.1),
  ModelID = "ModelTest",
  Paths = NULL,
  MetaDataPaths = NULL,
  SaveModelObjects = TRUE,
  IfSaveModel = "mojo",
  MaxMem = {
    gc()
    paste0(as.character(floor(as.numeric(system("awk '/MemFree/ {print $2}' /proc/meminfo",
      intern = TRUE))/1e+06)), "G")
  },
  NThreads = max(1L, parallel::detectCores() - 2L),
  Trees = 1000L,
  GridTune = TRUE,
  MaxModelsInGrid = 1L,
  NumOfParDepPlots = 10L,
  PassInGrid = NULL
)

Arguments

data

Source training data. Do not include a column that has the class labels for the buckets as they are created internally.

ValidationData

Source validation data. Do not include a column that has the class labels for the buckets as they are created internally.

TestData

Source test data. Do not include a column that has the class labels for the buckets as they are created internally.

Buckets

A numeric vector of the bucket values used for subsetting the data. NOTE: the final bucket value creates two subsets: one for data less than the value and a second for data greater than the value.
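The bucketing described above can be sketched in base R. This is only an illustration: assign_bucket() is a hypothetical helper, not part of RemixAutoML, and the rule for values that fall exactly on a boundary (assigned to the lower bucket here) is an assumption, since the description does not specify it.

```r
# Hypothetical helper, not part of RemixAutoML: label each target value
# with the index of the bucket it falls into.
assign_bucket <- function(target, buckets) {
  buckets <- sort(buckets)
  # left.open = TRUE makes every interval (lower, upper], so a value equal
  # to a boundary lands in the lower bucket. This boundary rule is an
  # assumption made for the sketch.
  findInterval(target, buckets, left.open = TRUE) + 1L
}

# Toy target with many zeros, as is typical for hurdle problems
target <- c(0, 0, 3, 12, 0, 45, 7)

# Two bucket values give three subsets: <= 0, (0, 10], and > 10
labels <- assign_bucket(target, buckets = c(0, 10))
print(table(labels))
```

With N bucket values the sketch yields N + 1 subsets, which matches the note that the final bucket value produces one subset below it and one above it.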

TargetColumnName

Supply the column name or number for the target variable.

FeatureColNames

Supply the column names or numbers of the features (not including the PrimaryDateColumn).

TransformNumericColumns

Numeric columns to transform inside the underlying regression function.

Distribution

Set to the distribution of choice based on H2O regression documents.

SplitRatios

Supply a vector of partition ratios. For example, c(0.70, 0.20, 0.10).
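As a rough illustration of what a ratio vector such as c(0.7, 0.2, 0.1) means, the sketch below randomly partitions row indices into three groups. The partition names and the rounding rule are assumptions for the example; the function's internal partitioning may differ.

```r
set.seed(42)  # reproducible shuffle for the example

n <- 1000L                       # number of rows in a hypothetical data set
split_ratios <- c(0.7, 0.2, 0.1)

# The ratios should sum to 1 (all.equal tolerates floating point error)
stopifnot(isTRUE(all.equal(sum(split_ratios), 1)))

# Shuffle the row indices, then cut the shuffled positions at the
# cumulative ratio boundaries
idx <- sample.int(n)
bounds <- round(cumsum(split_ratios) * n)
bounds[length(bounds)] <- n      # make sure the last partition ends at n

part <- cut(seq_len(n), breaks = c(0, bounds),
            labels = c("train", "validate", "test"))  # assumed names
parts <- split(idx, part)

print(lengths(parts))
```

Each element of parts holds the row indices for one partition, so something like data[parts$train, ] would select the training rows of a data.frame.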

ModelID

Define a character name for your models

Paths

The path to your folder where you want your model information saved

MetaDataPaths

A character string with the path to the folder where you want your model evaluation output saved. If left NULL, all output will be saved to Paths.

SaveModelObjects

Set to TRUE to save the model objects to file in the folders listed in Paths

IfSaveModel

Save as "mojo" or "standard"

MaxMem

Set the maximum memory your system can provide

NThreads

Set the number of threads you want to dedicate to the model building

Trees

Number of trees to build. Default is 1000.

GridTune

Set to TRUE if you want to grid tune the models

MaxModelsInGrid

Set to a numeric value for the number of models to try in grid tune

NumOfParDepPlots

Set to the number of partial dependence calibration plots to return.

PassInGrid

Pass in a grid to change the parameter settings for the model.

Value

Returns AutoXGBoostRegression() model objects: VariableImportance.csv, Model, ValidationData.csv, EvalutionPlot.png, EvalutionBoxPlot.png, EvaluationMetrics.csv, ParDepPlots.R (a named list of features with partial dependence calibration plots), ParDepBoxPlots.R, GridCollect, and the grid used.

Examples

# NOT RUN {
Output <- RemixAutoML::AutoH2oGBMHurdleModel(
  data,
  ValidationData = NULL,
  TestData = NULL,
  Buckets = 1L,
  TargetColumnName = "Target_Variable",
  FeatureColNames = 4L:ncol(data),
  TransformNumericColumns = NULL,
  Distribution = "gaussian",
  SplitRatios = c(0.7, 0.2, 0.1),
  MaxMem = {gc();paste0(as.character(floor(as.numeric(system("awk '/MemFree/ {print $2}' /proc/meminfo", intern=TRUE)) / 1000000)),"G")},
  NThreads = max(1L, parallel::detectCores()-2L),
  ModelID = "ModelID",
  Paths = normalizePath("./"),
  MetaDataPaths = NULL,
  SaveModelObjects = TRUE,
  IfSaveModel = "mojo",
  Trees = 1000L,
  GridTune = FALSE,
  MaxModelsInGrid = 1L,
  NumOfParDepPlots = 10L,
  PassInGrid = NULL)
# }
