
mlr3learners (version 0.6.0)

mlr_learners_regr.xgboost: Extreme Gradient Boosting Regression Learner

Description

eXtreme Gradient Boosting regression. Calls xgboost::xgb.train() from package xgboost.

To compute on GPUs, you first need to compile xgboost yourself and link against CUDA. See https://xgboost.readthedocs.io/en/stable/build.html#building-with-gpu-support.
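With a GPU-enabled build, GPU training can then be requested via the learner's parameters; a minimal sketch (assuming an xgboost build compiled with CUDA support):

library(mlr3)
library(mlr3learners)

# gpu_hist only works with an xgboost build that has GPU support compiled in
learner = lrn("regr.xgboost", nrounds = 50, tree_method = "gpu_hist")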

Note that using the watchlist parameter directly will lead to problems when wrapping this Learner in a mlr3pipelines GraphLearner as the preprocessing steps will not be applied to the data in the watchlist.
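One alternative is to configure early stopping through the learner's own early_stopping_set parameter (see below) instead of a manual watchlist; a minimal sketch of wrapping the learner in a GraphLearner (assuming mlr3pipelines is installed):

library(mlr3)
library(mlr3learners)
library(mlr3pipelines)

# Preprocessing (here: scaling) is applied consistently because no manual
# watchlist is passed to xgboost
graph_learner = as_learner(po("scale") %>>% lrn("regr.xgboost", nrounds = 50))
graph_learner$train(tsk("mtcars"))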


Dictionary

This Learner can be instantiated via the dictionary mlr_learners or with the associated sugar function lrn():

mlr_learners$get("regr.xgboost")
lrn("regr.xgboost")

Meta Information

  • Task type: “regr”

  • Predict Types: “response”

  • Feature Types: “logical”, “integer”, “numeric”

  • Required Packages: mlr3, mlr3learners, xgboost

Parameters

Id | Type | Default | Levels | Range
alpha | numeric | 0 | - | \([0, \infty)\)
approxcontrib | logical | FALSE | TRUE, FALSE | -
base_score | numeric | 0.5 | - | \((-\infty, \infty)\)
booster | character | gbtree | gbtree, gblinear, dart | -
callbacks | untyped | list() | - | -
colsample_bylevel | numeric | 1 | - | \([0, 1]\)
colsample_bynode | numeric | 1 | - | \([0, 1]\)
colsample_bytree | numeric | 1 | - | \([0, 1]\)
device | untyped | cpu | - | -
disable_default_eval_metric | logical | FALSE | TRUE, FALSE | -
early_stopping_rounds | integer | NULL | - | \([1, \infty)\)
early_stopping_set | character | none | none, train, test | -
eta | numeric | 0.3 | - | \([0, 1]\)
eval_metric | untyped | rmse | - | -
feature_selector | character | cyclic | cyclic, shuffle, random, greedy, thrifty | -
feval | untyped | - | - | -
gamma | numeric | 0 | - | \([0, \infty)\)
grow_policy | character | depthwise | depthwise, lossguide | -
interaction_constraints | untyped | - | - | -
iterationrange | untyped | - | - | -
lambda | numeric | 1 | - | \([0, \infty)\)
lambda_bias | numeric | 0 | - | \([0, \infty)\)
max_bin | integer | 256 | - | \([2, \infty)\)
max_delta_step | numeric | 0 | - | \([0, \infty)\)
max_depth | integer | 6 | - | \([0, \infty)\)
max_leaves | integer | 0 | - | \([0, \infty)\)
maximize | logical | NULL | TRUE, FALSE | -
min_child_weight | numeric | 1 | - | \([0, \infty)\)
missing | numeric | NA | - | \((-\infty, \infty)\)
monotone_constraints | untyped | 0 | - | -
normalize_type | character | tree | tree, forest | -
nrounds | integer | - | - | \([1, \infty)\)
nthread | integer | 1 | - | \([1, \infty)\)
ntreelimit | integer | NULL | - | \([1, \infty)\)
num_parallel_tree | integer | 1 | - | \([1, \infty)\)
objective | untyped | reg:squarederror | - | -
one_drop | logical | FALSE | TRUE, FALSE | -
outputmargin | logical | FALSE | TRUE, FALSE | -
predcontrib | logical | FALSE | TRUE, FALSE | -
predinteraction | logical | FALSE | TRUE, FALSE | -
predleaf | logical | FALSE | TRUE, FALSE | -
print_every_n | integer | 1 | - | \([1, \infty)\)
process_type | character | default | default, update | -
rate_drop | numeric | 0 | - | \([0, 1]\)
refresh_leaf | logical | TRUE | TRUE, FALSE | -
reshape | logical | FALSE | TRUE, FALSE | -
sampling_method | character | uniform | uniform, gradient_based | -
sample_type | character | uniform | uniform, weighted | -
save_name | untyped | - | - | -
save_period | integer | NULL | - | \([0, \infty)\)
scale_pos_weight | numeric | 1 | - | \((-\infty, \infty)\)
seed_per_iteration | logical | FALSE | TRUE, FALSE | -
skip_drop | numeric | 0 | - | \([0, 1]\)
strict_shape | logical | FALSE | TRUE, FALSE | -
subsample | numeric | 1 | - | \([0, 1]\)
top_k | integer | 0 | - | \([0, \infty)\)
training | logical | FALSE | TRUE, FALSE | -
tree_method | character | auto | auto, exact, approx, hist, gpu_hist | -
tweedie_variance_power | numeric | 1.5 | - | \([1, 2]\)
updater | untyped | - | - | -
verbose | integer | 1 | - | \([0, 2]\)
watchlist | untyped | - | - | -
xgb_model | untyped | - | - | -
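Hyperparameters from the table above can be passed to lrn() at construction or changed later through the learner's param_set; a minimal sketch:

library(mlr3)
library(mlr3learners)

learner = lrn("regr.xgboost", nrounds = 100, eta = 0.1, max_depth = 4)
# ...or adjust a value after construction
learner$param_set$values$subsample = 0.8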

Early stopping

Early stopping can be used to find the optimal number of boosting rounds. The early_stopping_set parameter controls which set is used to monitor the performance. Set early_stopping_set = "test" to monitor the performance of the model on the test set while training; the test set for early stopping can be set with the "test" row role in the mlr3::Task. Additionally, set early_stopping_rounds to the number of rounds within which the performance must improve, and nrounds to the maximum number of boosting rounds. During resampling, the test set is automatically supplied by the mlr3::Resampling. Note that using the test set for early stopping can potentially bias the performance scores. See the section on early stopping in the examples.

Initial parameter values

  • nrounds:

    • Actual default: no default.

    • Adjusted default: 1.

    • Reason for change: Without a default, construction of the learner would error. The value 1 is merely a placeholder to work around this; nrounds needs to be tuned by the user.

  • nthread:

    • Actual value: Undefined, triggering auto-detection of the number of CPUs.

    • Adjusted value: 1.

    • Reason for change: The auto-detected value conflicts with parallelization via future. The sketch after this list shows how to override it.

  • verbose:

    • Actual default: 1.

    • Adjusted default: 0.

    • Reason for change: Reduce verbosity.
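If these adjusted values do not fit your setup, they can be overridden at construction; a minimal sketch restoring xgboost-internal threading (4 is an arbitrary example value):

library(mlr3)
library(mlr3learners)

# Use 4 xgboost threads and xgboost's default verbosity instead of the
# adjusted values (avoid combining nthread > 1 with future parallelization)
learner = lrn("regr.xgboost", nrounds = 100, nthread = 4, verbose = 1)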

Super classes

mlr3::Learner -> mlr3::LearnerRegr -> LearnerRegrXgboost

Methods



Method new()

Creates a new instance of this R6 class.

Usage

LearnerRegrXgboost$new()


Method importance()

The importance scores are calculated with xgboost::xgb.importance().

Usage

LearnerRegrXgboost$importance()

Returns

Named numeric().


Method clone()

The objects of this class are cloneable with this method.

Usage

LearnerRegrXgboost$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

References

Chen, Tianqi, Guestrin, Carlos (2016). “XGBoost: A Scalable Tree Boosting System.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785--794. ACM. doi:10.1145/2939672.2939785.

See Also

Other Learner: mlr_learners_classif.cv_glmnet, mlr_learners_classif.glmnet, mlr_learners_classif.kknn, mlr_learners_classif.lda, mlr_learners_classif.log_reg, mlr_learners_classif.multinom, mlr_learners_classif.naive_bayes, mlr_learners_classif.nnet, mlr_learners_classif.qda, mlr_learners_classif.ranger, mlr_learners_classif.svm, mlr_learners_classif.xgboost, mlr_learners_regr.cv_glmnet, mlr_learners_regr.glmnet, mlr_learners_regr.kknn, mlr_learners_regr.km, mlr_learners_regr.lm, mlr_learners_regr.nnet, mlr_learners_regr.ranger, mlr_learners_regr.svm

Examples

if (requireNamespace("xgboost", quietly = TRUE)) {
  library(mlr3)
  library(mlr3learners)

  # Define the Learner and set parameter values
  learner = lrn("regr.xgboost")
  print(learner)

  # Define a Task
  task = tsk("mtcars")

  # Create train and test set
  ids = partition(task)

  # Train the learner on the training ids
  learner$train(task, row_ids = ids$train)

  # Print the model
  print(learner$model)

  # Importance method
  if ("importance" %in% learner$properties) print(learner$importance())

  # Make predictions for the test rows
  predictions = learner$predict(task, row_ids = ids$test)

  # Score the predictions
  predictions$score()
}

# Train learner with early stopping on the mtcars data set
library(mlr3)
library(mlr3learners)

task = tsk("mtcars")

# Split task into training and test set
split = partition(task, ratio = 0.8)
task$set_row_roles(split$test, "test")

# Set early stopping parameters
learner = lrn("regr.xgboost",
  nrounds = 100,
  early_stopping_rounds = 10,
  early_stopping_set = "test"
)

# Train learner with early stopping
learner$train(task)
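After training with early stopping, the fitted booster can be inspected through learner$model; a sketch, assuming xgboost records the selected round in the best_iteration field when early stopping triggers:

# Number of boosting rounds selected by early stopping
learner$model$best_iteration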
