
mlr3learners (version 0.6.0)

mlr_learners_regr.xgboost: Extreme Gradient Boosting Regression Learner

Description

eXtreme Gradient Boosting regression. Calls xgboost::xgb.train() from package xgboost.

To compute on GPUs, you first need to compile xgboost yourself and link against CUDA. See https://xgboost.readthedocs.io/en/stable/build.html#building-with-gpu-support.
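With a GPU-enabled build, GPU training can then be requested via the learner's parameters; a minimal sketch (assuming an xgboost build compiled with CUDA support):

library(mlr3)
library(mlr3learners)

# gpu_hist only works with an xgboost build that has GPU support compiled in
learner = lrn("regr.xgboost", nrounds = 50, tree_method = "gpu_hist")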

Note that using the watchlist parameter directly will lead to problems when wrapping this Learner in a mlr3pipelines GraphLearner as the preprocessing steps will not be applied to the data in the watchlist.
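One alternative is to configure early stopping through the learner's own early_stopping_set parameter (see below) instead of a manual watchlist; a minimal sketch of wrapping the learner in a GraphLearner (assuming mlr3pipelines is installed):

library(mlr3)
library(mlr3learners)
library(mlr3pipelines)

# Preprocessing (here: scaling) is applied consistently because no manual
# watchlist is passed to xgboost
graph_learner = as_learner(po("scale") %>>% lrn("regr.xgboost", nrounds = 50))
graph_learner$train(tsk("mtcars"))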


Dictionary

This Learner can be instantiated via the dictionary mlr_learners or with the associated sugar function lrn():

mlr_learners$get("regr.xgboost")
lrn("regr.xgboost")

Meta Information

  • Task type: “regr”

  • Predict Types: “response”

  • Feature Types: “logical”, “integer”, “numeric”

  • Required Packages: mlr3, mlr3learners, xgboost

Parameters

Id | Type | Default | Levels | Range
alpha | numeric | 0 | - | \([0, \infty)\)
approxcontrib | logical | FALSE | TRUE, FALSE | -
base_score | numeric | 0.5 | - | \((-\infty, \infty)\)
booster | character | gbtree | gbtree, gblinear, dart | -
callbacks | untyped | list() | - | -
colsample_bylevel | numeric | 1 | - | \([0, 1]\)
colsample_bynode | numeric | 1 | - | \([0, 1]\)
colsample_bytree | numeric | 1 | - | \([0, 1]\)
device | untyped | cpu | - | -
disable_default_eval_metric | logical | FALSE | TRUE, FALSE | -
early_stopping_rounds | integer | NULL | - | \([1, \infty)\)
early_stopping_set | character | none | none, train, test | -
eta | numeric | 0.3 | - | \([0, 1]\)
eval_metric | untyped | rmse | - | -
feature_selector | character | cyclic | cyclic, shuffle, random, greedy, thrifty | -
feval | untyped | - | - | -
gamma | numeric | 0 | - | \([0, \infty)\)
grow_policy | character | depthwise | depthwise, lossguide | -
interaction_constraints | untyped | - | - | -
iterationrange | untyped | - | - | -
lambda | numeric | 1 | - | \([0, \infty)\)
lambda_bias | numeric | 0 | - | \([0, \infty)\)
max_bin | integer | 256 | - | \([2, \infty)\)
max_delta_step | numeric | 0 | - | \([0, \infty)\)
max_depth | integer | 6 | - | \([0, \infty)\)
max_leaves | integer | 0 | - | \([0, \infty)\)
maximize | logical | NULL | TRUE, FALSE | -
min_child_weight | numeric | 1 | - | \([0, \infty)\)
missing | numeric | NA | - | \((-\infty, \infty)\)
monotone_constraints | untyped | 0 | - | -
normalize_type | character | tree | tree, forest | -
nrounds | integer | - | - | \([1, \infty)\)
nthread | integer | 1 | - | \([1, \infty)\)
ntreelimit | integer | NULL | - | \([1, \infty)\)
num_parallel_tree | integer | 1 | - | \([1, \infty)\)
objective | untyped | reg:squarederror | - | -
one_drop | logical | FALSE | TRUE, FALSE | -
outputmargin | logical | FALSE | TRUE, FALSE | -
predcontrib | logical | FALSE | TRUE, FALSE | -
predinteraction | logical | FALSE | TRUE, FALSE | -
predleaf | logical | FALSE | TRUE, FALSE | -
print_every_n | integer | 1 | - | \([1, \infty)\)
process_type | character | default | default, update | -
rate_drop | numeric | 0 | - | \([0, 1]\)
refresh_leaf | logical | TRUE | TRUE, FALSE | -
reshape | logical | FALSE | TRUE, FALSE | -
sampling_method | character | uniform | uniform, gradient_based | -
sample_type | character | uniform | uniform, weighted | -
save_name | untyped | - | - | -
save_period | integer | NULL | - | \([0, \infty)\)
scale_pos_weight | numeric | 1 | - | \((-\infty, \infty)\)
seed_per_iteration | logical | FALSE | TRUE, FALSE | -
skip_drop | numeric | 0 | - | \([0, 1]\)
strict_shape | logical | FALSE | TRUE, FALSE | -
subsample | numeric | 1 | - | \([0, 1]\)
top_k | integer | 0 | - | \([0, \infty)\)
training | logical | FALSE | TRUE, FALSE | -
tree_method | character | auto | auto, exact, approx, hist, gpu_hist | -
tweedie_variance_power | numeric | 1.5 | - | \([1, 2]\)
updater | untyped | - | - | -
verbose | integer | 1 | - | \([0, 2]\)
watchlist | untyped | - | - | -
xgb_model | untyped | - | - | -
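Hyperparameters from the table above can be passed to lrn() at construction or changed later through the learner's param_set; a minimal sketch:

library(mlr3)
library(mlr3learners)

learner = lrn("regr.xgboost", nrounds = 100, eta = 0.1, max_depth = 4)
# ...or adjust a value after construction
learner$param_set$values$subsample = 0.8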

Early stopping

Early stopping can be used to find the optimal number of boosting rounds. The early_stopping_set parameter controls which set is used to monitor the performance. Set early_stopping_set = "test" to monitor the performance of the model on the test set while training; the test set for early stopping can be set with the "test" row role in the mlr3::Task. Additionally, set early_stopping_rounds to the number of rounds within which the performance must improve, and nrounds to the maximum number of boosting rounds. During resampling, the test set is automatically supplied by the mlr3::Resampling. Note that using the test set for early stopping can potentially bias the performance scores. See the section on early stopping in the examples.

Initial parameter values

  • nrounds:

    • Actual default: no default.

    • Adjusted default: 1.

    • Reason for change: Without a default, construction of the learner would error. The value 1 is merely a placeholder to work around this; nrounds needs to be tuned by the user.

  • nthread:

    • Actual value: Undefined, triggering auto-detection of the number of CPUs.

    • Adjusted value: 1.

    • Reason for change: The auto-detected value conflicts with parallelization via future. The sketch after this list shows how to override it.

  • verbose:

    • Actual default: 1.

    • Adjusted default: 0.

    • Reason for change: Reduce verbosity.
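If these adjusted values do not fit your setup, they can be overridden at construction; a minimal sketch restoring xgboost-internal threading (4 is an arbitrary example value):

library(mlr3)
library(mlr3learners)

# Use 4 xgboost threads and xgboost's default verbosity instead of the
# adjusted values (avoid combining nthread > 1 with future parallelization)
learner = lrn("regr.xgboost", nrounds = 100, nthread = 4, verbose = 1)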

Super classes

mlr3::Learner -> mlr3::LearnerRegr -> LearnerRegrXgboost

Methods



Method new()

Creates a new instance of this R6 class.

Usage

LearnerRegrXgboost$new()


Method importance()

The importance scores are calculated with xgboost::xgb.importance().

Usage

LearnerRegrXgboost$importance()

Returns

Named numeric().


Method clone()

The objects of this class are cloneable with this method.

Usage

LearnerRegrXgboost$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

References

Chen, Tianqi, Guestrin, Carlos (2016). “XGBoost: A Scalable Tree Boosting System.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785--794. ACM. doi:10.1145/2939672.2939785.

See Also

Other Learner: mlr_learners_classif.cv_glmnet, mlr_learners_classif.glmnet, mlr_learners_classif.kknn, mlr_learners_classif.lda, mlr_learners_classif.log_reg, mlr_learners_classif.multinom, mlr_learners_classif.naive_bayes, mlr_learners_classif.nnet, mlr_learners_classif.qda, mlr_learners_classif.ranger, mlr_learners_classif.svm, mlr_learners_classif.xgboost, mlr_learners_regr.cv_glmnet, mlr_learners_regr.glmnet, mlr_learners_regr.kknn, mlr_learners_regr.km, mlr_learners_regr.lm, mlr_learners_regr.nnet, mlr_learners_regr.ranger, mlr_learners_regr.svm

Examples

if (requireNamespace("xgboost", quietly = TRUE)) {
  library(mlr3)
  library(mlr3learners)

  # Define the Learner and set parameter values
  learner = lrn("regr.xgboost")
  print(learner)

  # Define a Task
  task = tsk("mtcars")

  # Create train and test set
  ids = partition(task)

  # Train the learner on the training ids
  learner$train(task, row_ids = ids$train)

  # Print the model
  print(learner$model)

  # Importance method
  if ("importance" %in% learner$properties) print(learner$importance())

  # Make predictions for the test rows
  predictions = learner$predict(task, row_ids = ids$test)

  # Score the predictions
  predictions$score()
}

# Train learner with early stopping on the mtcars data set
library(mlr3)
library(mlr3learners)

task = tsk("mtcars")

# Split task into training and test set
split = partition(task, ratio = 0.8)
task$set_row_roles(split$test, "test")

# Set early stopping parameters
learner = lrn("regr.xgboost",
  nrounds = 100,
  early_stopping_rounds = 10,
  early_stopping_set = "test"
)

# Train learner with early stopping
learner$train(task)
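After training with early stopping, the fitted booster can be inspected through learner$model; a sketch, assuming xgboost records the selected round in the best_iteration field when early stopping triggers:

# Number of boosting rounds selected by early stopping
learner$model$best_iteration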
