RemixAutoML (version 0.11.0)

AutoCatBoostSizeFreqDist: AutoCatBoostSizeFreqDist for building size and frequency distributions via quantile regressions

Description

AutoCatBoostSizeFreqDist builds size and frequency distributions via quantile regressions. Size (or severity) and frequency (or count) quantile regressions are built. Use this with the AutoQuantileGibbsSampler function to simulate the joint distribution.
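
As background on what a quantile regression optimizes, here is a minimal base R sketch of the standard quantile (pinball) loss. It is not code from RemixAutoML, and it is an assumption that the "Quantile" setting of EvalMetric corresponds to this definition. The names pinball_loss, y, and q90 are hypothetical.

```r
# Pinball (quantile) loss for a single quantile level tau in (0, 1).
# Under-predictions are weighted by tau, over-predictions by (1 - tau).
pinball_loss <- function(actual, predicted, tau) {
  stopifnot(length(actual) == length(predicted), tau > 0, tau < 1)
  residual <- actual - predicted
  mean(pmax(tau * residual, (tau - 1) * residual))
}

set.seed(42)
y <- rexp(1000, rate = 0.5)        # hypothetical skewed "size" values
q90 <- unname(quantile(y, 0.9))    # empirical 90th percentile

# A constant prediction at the 90th percentile scores better at tau = 0.9
# than a constant prediction at the median.
loss_q90    <- pinball_loss(y, rep(q90, length(y)), tau = 0.9)
loss_median <- pinball_loss(y, rep(median(y), length(y)), tau = 0.9)
loss_q90 < loss_median
```

Fitting one such loss per level in CountQuantiles and SizeQuantiles is what turns a set of point predictions into an approximation of a distribution.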

Usage

AutoCatBoostSizeFreqDist(CountData = NULL, SizeData = NULL,
  CountQuantiles = seq(0.1, 0.9, 0.1), SizeQuantiles = seq(0.1, 0.9,
  0.1), AutoTransform = TRUE, DataPartitionRatios = c(0.75, 0.2, 0.05),
  StratifyColumnNames = NULL, NTrees = 1500, TaskType = "GPU",
  EvalMetric = "Quantile", GridTune = FALSE, GridEvalMetric = "mae",
  CountTargetColumnName = NULL, SizeTargetColumnName = NULL,
  CountFeatureColNames = NULL, SizeFeatureColNames = NULL,
  CountIDcols = NULL, SizeIDcols = NULL, ModelIDs = c("CountModel",
  "SizeModel"), MaxModelsGrid = 5, ModelPath = NULL,
  MetaDataPath = NULL, NumOfParDepPlots = 0)

Arguments

CountData

This is your CountData generated from the IntermittentDemandBootStrapper() function

SizeData

This is your SizeData generated from the IntermittentDemandBootStrapper() function

CountQuantiles

The default is deciles, i.e. seq(0.10,0.90,0.10). A finer grid gives a more detailed distribution, but it will take longer to run.

SizeQuantiles

The default is deciles, i.e. seq(0.10,0.90,0.10). A finer grid gives a more detailed distribution, but it will take longer to run.

AutoTransform

The default is TRUE. Set to FALSE to prevent your target variables from being automatically transformed for the best normalization.

DataPartitionRatios

The default is c(0.75,0.20,0.05). With CatBoost, you should allocate a decent amount to the validation data (second input). Three inputs are required.

StratifyColumnNames

Specify grouping variables to stratify by

NTrees

The default is 1500. If the best model utilizes all trees, you should consider increasing this argument.

TaskType

The default is set to "GPU". If you do not have a GPU, set it to "CPU".

EvalMetric

Set to "Quantile". Alternative quantile methods may become available in the future.

GridTune

The default is set to FALSE. If you set it to TRUE, make sure to set MaxModelsGrid to a number greater than 1.

GridEvalMetric

The default is set to "mae". Choose from 'poisson', 'mae', 'mape', 'mse', 'msle', 'kl', 'cs', 'r2'.

CountTargetColumnName

Column names or column numbers

SizeTargetColumnName

Column names or column numbers

CountFeatureColNames

Column names or column numbers

SizeFeatureColNames

Column names or column numbers

CountIDcols

Column names or column numbers

SizeIDcols

Column names or column numbers

ModelIDs

A two-element character vector, e.g. c("CountModel","SizeModel")

MaxModelsGrid

Set to a number greater than 1 if GridTune is set to TRUE

ModelPath

This file path is where all your models will be stored. If you leave MetaDataPath NULL, the evaluation metadata will also be stored here. If you leave this NULL, the function will not run.

MetaDataPath

A separate path to store the model metadata for evaluation.

NumOfParDepPlots

The default is 0. Set to a number greater than or equal to 1 to generate partial dependence plots showing the relationships between your features and targets.
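
The quantile and partition arguments above can be sanity-checked before a long training run. The following is a minimal base R sketch: the variable names are hypothetical, and the requirement that the three ratios sum to 1 is an assumption rather than something this page states.

```r
# Deciles (the documented default) versus a finer, hypothetical 5% grid.
deciles <- seq(0.10, 0.90, 0.10)
finer   <- seq(0.05, 0.95, 0.05)
length(deciles)  # 9 quantile levels
length(finer)    # 19 quantile levels, so expect a longer run

# Quantile levels must lie strictly between 0 and 1.
stopifnot(all(finer > 0), all(finer < 1))

# Three partition shares; the second is the validation share.
# Assumption: the shares are expected to sum to 1.
ratios <- c(0.75, 0.20, 0.05)
stopifnot(
  length(ratios) == 3,
  all(ratios > 0),
  isTRUE(all.equal(sum(ratios), 1))  # tolerant floating-point comparison
)
```

all.equal() is used instead of == because sums of decimal fractions are not always exactly representable in floating point.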

Value

This function does not return anything. It only stores your models and model evaluation metadata to file.
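
Because all output goes to disk, it can help to make sure the target directories exist before calling the function. This is a minimal base R sketch. The directory names are hypothetical, and whether the function creates missing directories on its own is not stated on this page.

```r
# Hypothetical locations; replace tempdir() with your project directory.
model_path    <- file.path(tempdir(), "Models")
metadata_path <- file.path(model_path, "ModelMetaData")

# recursive = TRUE also creates missing parent directories;
# showWarnings = FALSE keeps the call quiet if they already exist.
dir.create(metadata_path, recursive = TRUE, showWarnings = FALSE)

# Fail early if either directory is still missing.
stopifnot(dir.exists(model_path), dir.exists(metadata_path))
```

The resulting values could then be passed as ModelPath = model_path and MetaDataPath = metadata_path.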

See Also

Other Automated Time Series: AutoCatBoostCARMA, AutoCatBoostFreqSizeScoring, AutoH2oDRFCARMA, AutoH2oGBMCARMA, AutoH2oGBMFreqSizeScoring, AutoH2oGBMSizeFreqDist, AutoTS, AutoXGBoostCARMA, ID_Forecast, ID_SingleLevelGibbsSampler, IntermittentDemandScoringDataGenerator

Examples

# NOT RUN {
AutoCatBoostSizeFreqDist(CountData = CountData, 
                         SizeData = SizeData,
                         CountQuantiles = seq(0.10,0.90,0.10), 
                         SizeQuantiles = seq(0.10,0.90,0.10), 
                         AutoTransform = TRUE, 
                         DataPartitionRatios = c(0.75,0.20,0.05),
                         StratifyColumnNames = NULL,
                         NTrees = 1500,
                         TaskType = "GPU",
                         EvalMetric = "Quantile",
                         GridTune = FALSE,
                         GridEvalMetric = "mae",
                         CountTargetColumnName = "Counts",
                         SizeTargetColumnName = "Target_qty",
                         CountFeatureColNames = 2:ncol(CountData),
                         SizeFeatureColNames = 2:ncol(SizeData),
                         CountIDcols = NULL,
                         SizeIDcols = NULL,
                         ModelIDs = c("CountModel","SizeModel"),
                         MaxModelsGrid = 5,
                         ModelPath = getwd(),
                         MetaDataPath = file.path(getwd(), "ModelMetaData"),
                         NumOfParDepPlots = 1)
# }