- param_grid
list with candidate parameters defining the grid over which a search is done
- data
a gpb.Dataset object, used for training. Some functions, such as gpb.cv, may allow you to pass
other types of data, such as a matrix, and then separately supply label as a keyword argument.
- params
list with other parameters not included in param_grid
- num_try_random
integer with the number of random trials on the parameter grid. If NULL, a deterministic search is done
- nrounds
number of boosting iterations (= number of trees). This is the most important tuning parameter for boosting. Default = 100
- gp_model
A GPModel object that contains the random effects (Gaussian process and/or grouped random effects) model
- use_gp_model_for_validation
Boolean (default = TRUE). If TRUE, the gp_model (Gaussian process and/or random effects) is used
in addition to the tree model for calculating predictions on the validation data.
If FALSE, the gp_model (random effects part) is ignored and only the tree ensemble is used
for making predictions when calculating the validation / test error.
- train_gp_model_cov_pars
Boolean (default = TRUE). If TRUE, the covariance parameters of the gp_model
(Gaussian process and/or random effects) are estimated in every boosting iteration;
otherwise the gp_model parameters are not estimated.
In the latter case, you need to either estimate them beforehand or provide the values via
the init_cov_pars parameter when creating the gp_model.
- folds
list of pre-defined CV folds (each element must be a vector with the indices of one test fold).
When folds are supplied, the nfold and stratified parameters are ignored.
- nfold
the original dataset is randomly partitioned into nfold equal-size subsamples.
- label
Vector of labels, used if data is not a gpb.Dataset
- weight
vector of weights for the observations. If not NULL, it is set on the dataset
- obj
objective function, can be a character string or a custom objective function. Examples include
regression, regression_l1, huber, binary, lambdarank, multiclass
- eval
evaluation function(s). This can be a character vector, function, or list with a mixture of
strings and functions.
a. character vector:
If you provide a character vector to this argument, it should contain strings with valid
evaluation metrics. See the "metric" section of the parameter documentation
for a list of valid metrics.
b. function:
You can provide a custom evaluation function. This should accept the keyword arguments
preds and dtrain and should return a named list with three elements:
name: A string with the name of the metric, used for printing and storing results.
value: A single number indicating the value of the metric for the given predictions and true values.
higher_better: A boolean indicating whether higher values indicate a better fit.
For example, this would be FALSE for metrics like MAE or RMSE.
c. list:
If a list is given, it should only contain character vectors and functions.
These should follow the requirements from the descriptions above.
- verbose_eval
integer. Whether to display information on the progress of tuning parameter choice.
If NULL or 0, verbose output is off.
If = 1, summary progress information is displayed for every parameter combination.
If >= 2, detailed progress is displayed at every boosting stage for every parameter combination.
- stratified
a boolean indicating whether sampling of folds should be stratified by the values of outcome labels.
- init_model
path of the model file of a gpb.Booster object; training will continue from this model
- colnames
feature names. If not NULL, these overwrite the names in the dataset
- categorical_feature
categorical features. This can either be a character vector of feature names
or an integer vector with the indices of the features
(e.g. c(1L, 10L) to say "the first and tenth columns").
- early_stopping_rounds
integer. Activates early stopping. Requires at least one validation dataset and one metric.
When this parameter is non-NULL, training will stop if the evaluation of any metric on any
validation set fails to improve for early_stopping_rounds consecutive boosting rounds.
If training stops early, the returned model will have the attribute best_iter
set to the iteration number of the best iteration.
- callbacks
List of callback functions that are applied at each iteration.
- ...
other parameters, see Parameters.rst for more information.
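
To make the relationship between param_grid and num_try_random more concrete, here is a minimal base R sketch. It assumes that a deterministic search visits every combination of the candidate values, and it only enumerates combinations; it does not reproduce how the package itself samples them. The parameter names and candidate values are illustrative choices.

```r
# Candidate values for each tuning parameter (illustrative choices)
param_grid <- list(
  learning_rate    = c(1, 0.1, 0.01),
  min_data_in_leaf = c(10, 100, 1000),
  max_depth        = c(1, 2, 3, 5, 10)
)

# Every combination an exhaustive (deterministic) search would have to cover
all_combinations <- expand.grid(param_grid)
n_combinations <- nrow(all_combinations)  # 3 * 3 * 5 = 45

# A random search only looks at a subset, here 10 of the 45 combinations
num_try_random <- 10
set.seed(1)
tried <- all_combinations[sample(n_combinations, num_try_random), ]
```

The grid size grows multiplicatively with each added parameter, which is why a random subset can be attractive for larger grids.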
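
The folds argument expects a list in which each element holds the indices of one test fold. One possible way to build such a list in base R is sketched below; the number of observations and the number of folds are hypothetical values.

```r
n <- 100      # number of observations (hypothetical)
nfold <- 4    # number of folds (hypothetical)

set.seed(1)
# Assign every observation to one of the folds in random order
fold_id <- sample(rep(seq_len(nfold), length.out = n))

# One list element per fold, each containing the indices of that test fold
folds <- split(seq_len(n), fold_id)
```

Each observation appears in exactly one element of folds, so every observation is used for validation once.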
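
As an illustration of the function form of eval, the sketch below packages the mean absolute error in the documented return format. The helper mae_metric is a hypothetical name introduced here, and the wrapper assumes that getinfo(dtrain, "label") returns the labels stored in the gpb.Dataset.

```r
# Hypothetical helper: mean absolute error in the documented return format
mae_metric <- function(preds, labels) {
  list(
    name = "mae",
    value = mean(abs(preds - labels)),
    higher_better = FALSE  # lower MAE means a better fit
  )
}

# Wrapper with the (preds, dtrain) signature expected for eval.
# Assumption: getinfo() gives access to the labels stored in the dataset.
eval_mae <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  mae_metric(preds, labels)
}

# Quick check of the helper on toy numbers: (0 + 1 + 2) / 3 = 1
toy_result <- mae_metric(c(1, 2, 4), c(1, 3, 2))
```

Keeping the metric in a separate helper makes it possible to check it on plain vectors before passing the wrapper as eval.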
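
The following sketch shows how several of the arguments above might fit together in a single call. It assumes that these arguments belong to gpb.grid.search.tune.parameters from the gpboost package, which is not named in this section, and it uses small simulated data; all data, names and parameter values are hypothetical. The call is only executed if gpboost is installed.

```r
# Simulated data: 200 observations, 2 features, 20 groups (all hypothetical)
set.seed(1)
n <- 200
X <- matrix(runif(n * 2), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
group <- rep(seq_len(20), each = 10)
y <- sin(4 * X[, 1]) + rnorm(20)[group] + rnorm(n, sd = 0.1)

# Small grid plus parameters that are held fixed during the search
param_grid <- list(learning_rate = c(0.1, 0.01), max_depth = c(1, 3))
other_params <- list(num_leaves = 2^10)

if (requireNamespace("gpboost", quietly = TRUE)) {
  library(gpboost)
  # Grouped random effects model and training dataset
  gp_model <- GPModel(group_data = group, likelihood = "gaussian")
  dataset <- gpb.Dataset(data = X, label = y)
  # Assumed function name; deterministic search because num_try_random is NULL
  opt_params <- gpb.grid.search.tune.parameters(
    param_grid = param_grid, params = other_params,
    num_try_random = NULL, nfold = 4,
    data = dataset, gp_model = gp_model,
    use_gp_model_for_validation = TRUE, verbose_eval = 1,
    nrounds = 100, early_stopping_rounds = 10
  )
}
```

With verbose_eval = 1, summary progress should be reported for each of the four parameter combinations in this grid.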