
AutoXGBoostdHurdleModel is a generalized hurdle modeling framework.
AutoXGBoostdHurdleModel(data, ValidationData = NULL, TestData = NULL,
Buckets = 0, TargetColumnName = NULL, FeatureColNames = NULL,
IDcols = NULL, TransformNumericColumns = NULL, ClassWeights = NULL,
SplitRatios = c(0.7, 0.2, 0.1), TreeMethod = "hist",
NThreads = max(1, parallel::detectCores() - 2),
ModelID = "ModelTest", Paths = NULL, SaveModelObjects = TRUE,
Trees = 1000, GridTune = TRUE, MaxModelsInGrid = 1,
NumOfParDepPlots = 10, PassInGrid = NULL)
data: Source training data. Do not include a column with the class labels for the buckets; they are created internally.
ValidationData: Source validation data. Do not include a column with the class labels for the buckets; they are created internally.
TestData: Source test data. Do not include a column with the class labels for the buckets; they are created internally.
Buckets: A numeric vector of the bucket values used for subsetting the data. NOTE: the final bucket value creates two subsets: one for data less than the value and a second for data greater than the value.
TargetColumnName: Supply the column name or number of the target variable.
FeatureColNames: Supply the column names or numbers of the features (not including the PrimaryDateColumn).
IDcols: Include the PrimaryDateColumn and any other columns you want returned in the validation data with predictions.
TransformNumericColumns: Numeric columns to transform inside the AutoXGBoostRegression() function.
ClassWeights: Class weights to use for the classifier model.
SplitRatios: Supply a vector of partition ratios. For example, c(0.70, 0.20, 0.10).
TreeMethod: Set to "hist" or "gpu_hist", depending on whether your xgboost installation supports GPU processing.
NThreads: Set to the number of threads you would like to dedicate to training.
ModelID: Define a character name for your models.
Paths: A character vector of file path strings. Supply either one file path or N file paths for N models.
SaveModelObjects: Set to TRUE to save the model objects to file in the folders listed in Paths.
Trees: Number of trees to train. The usage above sets the default to 1000.
GridTune: Set to TRUE if you want to grid tune the models.
MaxModelsInGrid: Set to a numeric value for the number of models to try in the grid tune.
NumOfParDepPlots: Set to the number of partial dependence calibration plots to return.
PassInGrid: Pass in a grid to change the parameter settings for xgboost.
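As a rough illustration of the Buckets argument described above, the sketch below assigns each target value to a bucket using base R. It is not the package's internal code. The vectors target and buckets are made-up examples, and the treatment of values exactly equal to a bucket value (placed in the lower bucket here) is an assumption, since the description only covers "less than" and "greater than".

```r
# Hypothetical illustration only; not taken from RemixAutoML internals.
target  <- c(0, 0, 0.5, 3, 12, 0, 7)  # made-up zero-inflated target
buckets <- c(0, 5)                    # made-up bucket values

# Assumption: a value equal to a bucket value goes in the lower bucket,
# and everything above the final bucket value forms one last bucket.
# findInterval() with left.open = TRUE uses intervals of the form (a, b].
bucket_id <- findInterval(target, buckets, left.open = TRUE) + 1L

# Number of rows that would land in each subset
table(bucket_id)
```

With Buckets = 0, this reduces to the classic hurdle split between zero and positive targets.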
Returns AutoXGBoostRegression() model objects: VariableImportance.csv, Model, ValidationData.csv, EvalutionPlot.png, EvalutionBoxPlot.png, EvaluationMetrics.csv, ParDepPlots.R (a named list of features with partial dependence calibration plots), ParDepBoxPlots.R, GridCollect, and the grid used.
Other Supervised Learning: AutoCatBoostClassifier, AutoCatBoostMultiClass, AutoCatBoostRegression, AutoCatBoostScoring, AutoCatBoostdHurdleModel, AutoH2OMLScoring, AutoH2OModeler, AutoH2OScoring, AutoH2oDRFClassifier, AutoH2oDRFMultiClass, AutoH2oDRFRegression, AutoH2oGBMClassifier, AutoH2oGBMMultiClass, AutoH2oGBMRegression, AutoNLS, AutoXGBoostClassifier, AutoXGBoostMultiClass, AutoXGBoostRegression, AutoXGBoostScoring
# NOT RUN {
Output <- RemixAutoML::AutoXGBoostdHurdleModel(
data,
ValidationData = NULL,
TestData = NULL,
Buckets = 1,
TargetColumnName = "Target_Variable",
FeatureColNames = 4:ncol(data),
IDcols = 1:3,
TransformNumericColumns = NULL,
ClassWeights = NULL,
SplitRatios = c(0.7, 0.2, 0.1),
TreeMethod = "hist",
NThreads = max(1, parallel::detectCores()-2),
ModelID = "ModelID",
Paths = NULL,
SaveModelObjects = TRUE,
Trees = 1000,
GridTune = FALSE,
MaxModelsInGrid = 1,
NumOfParDepPlots = 10,
PassInGrid = NULL)
# }