AutoCatBoostHurdleModel: AutoCatBoostHurdleModel

Description

AutoCatBoostHurdleModel for generalized hurdle modeling. Check out the Readme.Rd on GitHub for more background.
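
As a rough illustration of the hurdle idea, the sketch below combines class probabilities with per-bucket regression predictions into one expected value per row. The matrices and their values are hypothetical, and the probability-weighted sum is an assumption about how hurdle models are commonly combined, not a description of the package internals.

```r
# Hypothetical classifier output: one row per observation, one column per bucket.
# Each row sums to 1.
probs <- matrix(
  c(0.8, 0.2,
    0.3, 0.7),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("zero", "positive"))
)

# Hypothetical per-bucket regression predictions.
# The "zero" bucket contributes a constant 0.
bucket_preds <- matrix(
  c(0, 12.5,
    0, 40.0),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("zero", "positive"))
)

# Assumed combination rule: probability-weighted sum across buckets.
expected <- rowSums(probs * bucket_preds)
print(expected)
```

Under this assumption, an observation that is very likely to fall in the zero bucket gets a final prediction close to zero even when its positive-bucket regression prediction is large.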

Usage

AutoCatBoostHurdleModel(
  data = NULL,
  TimeWeights = NULL,
  TrainOnFull = FALSE,
  ValidationData = NULL,
  TestData = NULL,
  Buckets = 0L,
  TargetColumnName = NULL,
  FeatureColNames = NULL,
  PrimaryDateColumn = NULL,
  IDcols = NULL,
  TransformNumericColumns = NULL,
  Methods = c("BoxCox", "Asinh", "Asin", "Log", "LogPlus1", "Logit", "YeoJohnson"),
  ClassWeights = NULL,
  SplitRatios = c(0.7, 0.2, 0.1),
  task_type = "GPU",
  ModelID = "ModelTest",
  Paths = NULL,
  MetaDataPaths = NULL,
  SaveModelObjects = FALSE,
  ReturnModelObjects = TRUE,
  NumOfParDepPlots = 10L,
  PassInGrid = NULL,
  GridTune = FALSE,
  BaselineComparison = "default",
  MaxModelsInGrid = 1L,
  MaxRunsWithoutNewWinner = 20L,
  MaxRunMinutes = 60L * 60L,
  MetricPeriods = 25L,
  Langevin = FALSE,
  DiffusionTemperature = 10000,
  Trees = list(classifier = seq(1000, 2000, 100), regression = seq(1000, 2000, 100)),
  Depth = list(classifier = seq(6, 10, 1), regression = seq(6, 10, 1)),
  RandomStrength = list(classifier = seq(1, 10, 1), regression = seq(1, 10, 1)),
  BorderCount = list(classifier = seq(32, 256, 16), regression = seq(32, 256, 16)),
  LearningRate = list(classifier = seq(0.01, 0.25, 0.01), regression = seq(0.01, 0.25,
    0.01)),
  L2_Leaf_Reg = list(classifier = seq(3, 10, 1), regression = seq(1, 10, 1)),
  RSM = list(classifier = c(0.8, 0.85, 0.9, 0.95, 1), regression = c(0.8, 0.85, 0.9,
    0.95, 1)),
  BootStrapType = list(classifier = c("Bayesian", "Bernoulli", "Poisson", "MVS", "No"),
    regression = c("Bayesian", "Bernoulli", "Poisson", "MVS", "No")),
  GrowPolicy = list(classifier = c("SymmetricTree", "Depthwise", "Lossguide"),
    regression = c("SymmetricTree", "Depthwise", "Lossguide"))
)

Arguments

data

Source training data. Do not include a column that has the class labels for the buckets as they are created internally.

TimeWeights

Supply a value that will be multiplied by the time trend value

TrainOnFull

Set to TRUE to use all data

ValidationData

Source validation data. Do not include a column that has the class labels for the buckets as they are created internally.

TestData

Source test data. Do not include a column that has the class labels for the buckets as they are created internally.

Buckets

A numeric vector of the bucket values used for subsetting the data. NOTE: the final bucket value creates two subsets: one for data less than the value and a second one for data greater than the value.
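
To make the bucketing concrete, here is a minimal base R sketch that assigns target values to buckets. The target values and the two-element bucket vector are hypothetical. How the package treats values exactly equal to a bucket value is not stated here, so the sketch assumes they fall into the lower bucket.

```r
# Hypothetical target values and bucket boundaries.
target  <- c(0, 0, 3, 12, 0, 45, 7)
buckets <- c(0, 10)

# left.open = TRUE gives intervals of the form (lower, upper], so a value equal
# to a boundary lands in the lower bucket (an assumption for this sketch).
bucket_id <- findInterval(target, buckets, left.open = TRUE)

# Bucket 0: target <= 0; bucket 1: 0 < target <= 10; bucket 2: target > 10.
print(table(bucket_id))
```

Note that two bucket values produce three subsets, since the final value splits the remaining data into a lower and an upper part.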

TargetColumnName

Supply the column name or number for the target variable

FeatureColNames

Supply the column names or numbers of the features (not including the PrimaryDateColumn)

PrimaryDateColumn

Supply a date column if the data is functionally related to it

IDcols

Includes PrimaryDateColumn and any other columns you want returned in the validation data with predictions

TransformNumericColumns

Transform numeric column inside the AutoCatBoostRegression() function

Methods

Choose transformation methods

ClassWeights

Utilize these for the classifier model

SplitRatios

Supply a vector of partition ratios. For example, c(0.70, 0.20, 0.10).
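
The following sketch shows one way a ratio vector like this can be turned into train, validation, and test assignments. It assumes simple random assignment of rows; the row count and the partition labels are hypothetical, and the package may partition differently.

```r
set.seed(42)

n            <- 1000L                   # hypothetical number of rows
split_ratios <- c(0.70, 0.20, 0.10)
labels       <- c("train", "validation", "test")

# The ratios should sum to 1 so that every row is assigned exactly once.
stopifnot(isTRUE(all.equal(sum(split_ratios), 1)))

# Build the label vector in the requested proportions, then shuffle it.
counts     <- round(n * split_ratios)
assignment <- sample(rep(labels, times = counts))

print(table(assignment))
```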

task_type

Set to "GPU" or "CPU"

ModelID

Define a character name for your models

Paths

The path to your folder where you want your model information saved

MetaDataPaths

A character string of the path to where you want your model evaluation output saved. If left NULL, all output will be saved to Paths.

SaveModelObjects

Set to TRUE to save the model objects to file in the folders listed in Paths

ReturnModelObjects

TRUE to return the models

NumOfParDepPlots

Set to pull back N number of partial dependence calibration plots.

PassInGrid

Pass in a grid for changing up the parameter settings for catboost

GridTune

Set to TRUE if you want to grid tune the models

BaselineComparison

Default is "default".

MaxModelsInGrid

Default is 1L.

MaxRunsWithoutNewWinner

Default is 20L.

MaxRunMinutes

Default is 60L*60L.

MetricPeriods

Default is 25L.

Langevin

TRUE or FALSE

DiffusionTemperature

Default 10000

Trees

Provide a named list to have different number of trees for each model. Trees = list("classifier" = seq(1000,2000,100), "regression" = seq(1000,2000,100))
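
The named-list convention can be sketched as follows: each tuning argument carries one vector of candidate values per model, and candidate combinations can be enumerated separately for the classifier and the regression models. Using expand.grid() here is an illustrative assumption and may not match how the package builds its tuning grid.

```r
# Named lists in the same shape as the function defaults.
Depth <- list(
  classifier = seq(6, 10, 1),
  regression = seq(6, 10, 1)
)
LearningRate <- list(
  classifier = seq(0.01, 0.25, 0.01),
  regression = seq(0.01, 0.25, 0.01)
)

# Enumerate every Depth x LearningRate combination for the classifier only.
classifier_grid <- expand.grid(
  Depth        = Depth$classifier,
  LearningRate = LearningRate$classifier
)

print(head(classifier_grid))
print(nrow(classifier_grid))
```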

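Depth

Provide a named list of candidate values for each model. Depth = list("classifier" = seq(6,10,1), "regression" = seq(6,10,1))

RandomStrength

Provide a named list of candidate values for each model. RandomStrength = list("classifier" = seq(1,10,1), "regression" = seq(1,10,1))

BorderCount

Provide a named list of candidate values for each model. BorderCount = list("classifier" = seq(32,256,16), "regression" = seq(32,256,16))

LearningRate

Provide a named list of candidate values for each model. LearningRate = list("classifier" = seq(0.01,0.25,0.01), "regression" = seq(0.01,0.25,0.01))

L2_Leaf_Reg

Provide a named list of candidate values for each model. L2_Leaf_Reg = list("classifier" = seq(3.0,10.0,1.0), "regression" = seq(1.0,10.0,1.0))

RSM

Provide a named list of candidate values for each model. RSM = list("classifier" = c(0.80, 0.85, 0.90, 0.95, 1.0), "regression" = c(0.80, 0.85, 0.90, 0.95, 1.0))

BootStrapType

Provide a named list of candidate values for each model. BootStrapType = list("classifier" = c("Bayesian", "Bernoulli", "Poisson", "MVS", "No"), "regression" = c("Bayesian", "Bernoulli", "Poisson", "MVS", "No"))

GrowPolicy

Provide a named list of candidate values for each model. GrowPolicy = list("classifier" = c("SymmetricTree", "Depthwise", "Lossguide"), "regression" = c("SymmetricTree", "Depthwise", "Lossguide"))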

Shuffles

Default is 2L.

Value

Returns AutoCatBoostRegression() model objects: VariableImportance.csv, Model, ValidationData.csv, EvalutionPlot.png, EvalutionBoxPlot.png, EvaluationMetrics.csv, ParDepPlots.R (a named list of features with partial dependence calibration plots), ParDepBoxPlots.R, GridCollect, and catboostgrid.

Examples

# NOT RUN {
Output <- RemixAutoML::AutoCatBoostHurdleModel(

  # Operationalization
  task_type = "GPU",
  ModelID = "ModelTest",
  SaveModelObjects = FALSE,
  ReturnModelObjects = TRUE,

  # Data related args
  data = data,
  TimeWeights = NULL,
  TrainOnFull = FALSE,
  ValidationData = NULL,
  TestData = NULL,
  Buckets = 0L,
  TargetColumnName = NULL,
  FeatureColNames = NULL,
  PrimaryDateColumn = NULL,
  IDcols = NULL,

  # Metadata args
  Paths = normalizePath("./"),
  MetaDataPaths = NULL,
  TransformNumericColumns = NULL,
  Methods =
     c("BoxCox", "Asinh", "Asin", "Log",
       "LogPlus1", "Logit", "YeoJohnson"),
  ClassWeights = NULL,
  SplitRatios = c(0.70, 0.20, 0.10),
  NumOfParDepPlots = 10L,

  # Grid tuning setup
  PassInGrid = NULL,
  GridTune = FALSE,
  BaselineComparison = "default",
  MaxModelsInGrid = 1L,
  MaxRunsWithoutNewWinner = 20L,
  MaxRunMinutes = 60L*60L,
  MetricPeriods = 25L,

  # Bandit grid args
  Langevin = FALSE,
  DiffusionTemperature = 10000,
  Trees = list("classifier" = seq(1000,2000,100),
               "regression" = seq(1000,2000,100)),
  Depth = list("classifier" = seq(6,10,1),
               "regression" = seq(6,10,1)),
  RandomStrength = list("classifier" = seq(1,10,1),
                       "regression" = seq(1,10,1)),
  BorderCount = list("classifier" = seq(32,256,16),
                     "regression" = seq(32,256,16)),
  LearningRate = list("classifier" = seq(0.01,0.25,0.01),
                     "regression" = seq(0.01,0.25,0.01)),
  L2_Leaf_Reg = list("classifier" = seq(3.0,10.0,1.0),
                  "regression" = seq(1.0,10.0,1.0)),
  RSM = list("classifier" = c(0.80, 0.85, 0.90, 0.95, 1.0),
             "regression" = c(0.80, 0.85, 0.90, 0.95, 1.0)),
  BootStrapType = list("classifier" = c("Bayesian", "Bernoulli", "Poisson", "MVS", "No"),
                       "regression" = c("Bayesian", "Bernoulli", "Poisson", "MVS", "No")),
  GrowPolicy = list("classifier" = c("SymmetricTree", "Depthwise", "Lossguide"),
                    "regression" = c("SymmetricTree", "Depthwise", "Lossguide")))
# }
