AutoXGBoostHurdleModel is a generalized hurdle modeling framework
AutoXGBoostHurdleModel(data, ValidationData = NULL, TestData = NULL,
Buckets = 0, TargetColumnName = NULL, FeatureColNames = NULL,
IDcols = NULL, TransformNumericColumns = NULL, SplitRatios = c(0.7,
0.2, 0.1), TreeMethod = "hist", NThreads = max(1,
parallel::detectCores() - 2), ModelID = "ModelTest", Paths = NULL,
MetaDataPaths = NULL, SaveModelObjects = TRUE, Trees = 1000,
GridTune = TRUE, MaxModelsInGrid = 1, NumOfParDepPlots = 10,
PassInGrid = NULL)
data: Source training data. Do not include a column with the class labels for the buckets; they are created internally.
ValidationData: Source validation data. Do not include a column with the class labels for the buckets; they are created internally.
TestData: Source test data. Do not include a column with the class labels for the buckets; they are created internally.
Buckets: A numeric vector of the bucket values used for subsetting the data. NOTE: the final bucket value creates two subsets: one for data less than the value and one for data greater than the value.
TargetColumnName: Supply the column name or number of the target variable.
FeatureColNames: Supply the column names or numbers of the features (not including the PrimaryDateColumn).
IDcols: Includes PrimaryDateColumn and any other columns you want returned in the validation data with predictions.
TransformNumericColumns: Transform numeric columns inside the AutoXGBoostRegression() function.
SplitRatios: Supply a vector of partition ratios. For example, c(0.70, 0.20, 0.10).
TreeMethod: Set to "hist" or "gpu_hist", depending on whether your xgboost installation is capable of GPU processing.
NThreads: Set to the number of threads you would like to dedicate to training.
ModelID: Define a character name for your models.
Paths: The path to the folder where you want your model information saved.
MetaDataPaths: A character string giving the path where you want your model evaluation output saved. If left NULL, all output will be saved to Paths.
SaveModelObjects: Set to TRUE to save the model objects to file in the folders listed in Paths.
Trees: The number of trees to train. The usage above shows a default of 1000.
GridTune: Set to TRUE if you want to grid tune the models.
MaxModelsInGrid: Set to a numeric value for the number of models to try in grid tuning.
NumOfParDepPlots: Set to the number of partial dependence calibration plots to return.
PassInGrid: Pass in a grid for changing the parameter settings for xgboost.
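To make the Buckets argument more concrete, the sketch below shows one way a vector of bucket values could split a target column into subsets. It is only an illustration in base R, not the package's internal code: the `target` and `buckets` values are made up, and the treatment of values exactly equal to a bucket value (assigned to the upper subset here) is an assumption.

```r
# Illustrative only: how a Buckets vector could partition a target column.
# `target` and `buckets` are hypothetical values, not package defaults.
target  <- c(0, 0, 0.5, 2, 7, 12, 30)
buckets <- c(1, 10)

# findInterval() gives 0 for values below the first bucket value,
# 1 for values in [1, 10), and 2 for values at or above 10.
bucket_id <- findInterval(target, buckets)

# Count the rows that fall into each subset.
bucket_counts <- table(bucket_id)
print(bucket_counts)
```

With two bucket values there are three subsets: below the first value, between the two values, and above the final value, which matches the note that the final bucket value produces a subset on each side of it.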
Returns AutoXGBoostRegression() model objects: VariableImportance.csv, Model, ValidationData.csv, EvalutionPlot.png, EvalutionBoxPlot.png, EvaluationMetrics.csv, ParDepPlots.R (a named list of features with partial dependence calibration plots), ParDepBoxPlots.R, GridCollect, and the grid used.
Other Automated Regression: AutoCatBoostHurdleModel, AutoCatBoostRegression, AutoH2oDRFHurdleModel, AutoH2oDRFRegression, AutoH2oGBMHurdleModel, AutoH2oGBMRegression, AutoNLS, AutoXGBoostRegression
# NOT RUN {
Output <- RemixAutoML::AutoXGBoostHurdleModel(
  data,
  ValidationData = NULL,
  TestData = NULL,
  Buckets = 1,
  TargetColumnName = "Target_Variable",
  FeatureColNames = 4:ncol(data),
  IDcols = 1:3,
  TransformNumericColumns = NULL,
  SplitRatios = c(0.7, 0.2, 0.1),
  TreeMethod = "hist",
  NThreads = max(1, parallel::detectCores() - 2),
  ModelID = "ModelID",
  Paths = NULL,
  MetaDataPaths = NULL,
  SaveModelObjects = TRUE,
  Trees = 1000,
  GridTune = FALSE,
  MaxModelsInGrid = 1,
  NumOfParDepPlots = 10,
  PassInGrid = NULL)
# }
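The SplitRatios value c(0.7, 0.2, 0.1) used above describes a three-way partition of the rows. As a rough sketch of what such a partition looks like, the base R snippet below randomly assigns rows to train, validation, and test sets in those proportions. It is not how the package partitions data internally; the names `n`, `split_ratios`, `sizes`, and `partition` are hypothetical and exist only for this illustration.

```r
# Illustrative only: a random 70/20/10 row partition in base R.
# None of these names come from RemixAutoML.
set.seed(42)
n <- 1000
split_ratios <- c(0.7, 0.2, 0.1)

# The ratios should sum to 1 (compared with a numeric tolerance).
stopifnot(isTRUE(all.equal(sum(split_ratios), 1)))

# Fix the first two partition sizes and give the remainder to the last,
# so the sizes always add up to n.
sizes <- round(n * split_ratios[1:2])
sizes <- c(sizes, n - sum(sizes))

# Build one label per row, then shuffle the labels across rows.
labels <- rep(c("train", "validate", "test"), times = sizes)
partition <- sample(labels)

print(table(partition))
```

Assigning the remainder to the final partition avoids rounding drift when the ratios do not divide the row count evenly.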