Usage

xgb.cv(params = list(), data, nrounds, nfold, label = NULL, missing = NA,
  prediction = FALSE, showsd = TRUE, metrics = list(), obj = NULL,
  feval = NULL, stratified = TRUE, folds = NULL, verbose = TRUE,
  print_every_n = 1L, early_stopping_rounds = NULL, maximize = NULL,
  callbacks = list(), ...)

Arguments

params: the list of parameters. Commonly used ones are:
  objective: objective function; common ones are
    reg:linear: linear regression
    binary:logistic: logistic regression for classification
  eta: step size of each boosting step
  max_depth: maximum depth of the tree
  nthread: number of threads used in training; if not set, all threads are used
  See xgb.train for further details. See also demo/ for a walkthrough example in R.
data: takes an xgb.DMatrix, matrix, or dgCMatrix as the input.

nrounds: the max number of boosting iterations.

nfold: the original dataset is randomly partitioned into nfold equal size subsamples.

label: vector of response values; should be provided only when data is an R matrix.

missing: only used when input is a dense matrix; by default set to NA, which means that NA values are treated as missing by the algorithm. Sometimes 0 or another extreme value is used to represent missing values.

prediction: a logical value indicating whether to return the test fold predictions from each CV model. This parameter engages the cb.cv.predict callback.

showsd: boolean, whether to show the standard deviation of the cross validation results.

metrics: list of evaluation metrics to be used in cross validation; when not specified, the evaluation metric is chosen according to the objective function. Possible options are:
  error: binary classification error rate
  rmse: root mean square error
  logloss: negative log-likelihood
  auc: area under the curve
  merror: exact matching error, used to evaluate multi-class classification
obj: customized objective function; returns gradient and second order gradient with given prediction and dtrain.

feval: customized evaluation function; returns list(metric='metric-name', value='metric-value') with given prediction and dtrain (a sketch follows this arguments list).

stratified: a boolean indicating whether sampling of folds should be stratified by the values of outcome labels.

folds: a list that provides the possibility to use pre-defined CV folds (each element must be a vector of the test fold's indices). When folds are supplied, the nfold and stratified parameters are ignored.

verbose: boolean, print the statistics during the process.

print_every_n: print evaluation messages for each n-th iteration when verbose > 0. Default is 1, which means all messages are printed. This parameter is passed to the cb.print.evaluation callback.

early_stopping_rounds: if NULL, early stopping is not triggered. If set to an integer k, training with a validation set will stop if the performance doesn't improve for k rounds. Setting this parameter engages the cb.early.stop callback.

maximize: if feval and early_stopping_rounds are set, then this parameter must be set as well. When it is TRUE, the larger the evaluation score the better. This parameter is passed to the cb.early.stop callback.

callbacks: a list of callback functions to perform various tasks during boosting; see callbacks. Some of the callbacks are automatically created depending on the parameters' values. Users can provide either existing or their own callback methods in order to customize the training process.

...: other parameters to pass to params.
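To make the feval contract and the early stopping interplay concrete, here is a minimal sketch. It is illustrative only: evalerror, the 0.5 cutoff, and the chosen parameter values are assumptions, and dtrain is built the same way as in the Examples below.

library(xgboost)
data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# Customized evaluation function following the documented
# list(metric = ..., value = ...) contract; assumes preds are
# probabilities, as produced by binary:logistic.
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- mean(as.numeric(preds > 0.5) != labels)  # illustrative 0.5 cutoff
  list(metric = "custom-error", value = err)
}

# With a custom feval and early_stopping_rounds set, maximize must be set too;
# here FALSE, because a lower error is better.
cv <- xgb.cv(params = list(objective = "binary:logistic", eta = 1, max_depth = 3),
             data = dtrain, nrounds = 20, nfold = 5,
             feval = evalerror, maximize = FALSE,
             early_stopping_rounds = 3, prediction = TRUE)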
Value

An object of class xgb.cv.synchronous with the following elements:

call: the function call.

params: parameters that were passed to the xgboost library. Note that it does not capture parameters changed by the cb.reset.parameters callback.

callbacks: callback functions that were either automatically assigned or explicitly passed.

evaluation_log: evaluation history stored as a data.table, with the first column corresponding to the iteration number and the rest corresponding to the CV-based evaluation means and standard deviations for the training and test CV-sets. It is created by the cb.evaluation.log callback.

niter: number of boosting iterations.

folds: the list of CV folds' indices, either those passed through the folds parameter or randomly generated.

best_iteration: iteration number with the best evaluation metric value (only available with early stopping).

best_ntreelimit: the ntreelimit value corresponding to the best iteration, which could further be used in the predict method (only available with early stopping).

pred: CV prediction values, available when prediction is set. It is either a vector or a matrix (see cb.cv.predict).

models: a list of the CV folds' models. It is only available with the explicit setting of the cb.cv.predict(save_models = TRUE) callback.
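A minimal sketch of inspecting these elements, assuming cv was produced by a call with prediction = TRUE and early stopping enabled:

print(cv$call)            # the function call
head(cv$evaluation_log)   # per-iteration CV means and standard deviations
cv$niter                  # number of boosting iterations
cv$best_iteration         # only present when early stopping was used
str(cv$folds)             # test-fold index vectors, one per fold
head(cv$pred)             # out-of-fold predictions (prediction = TRUE)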
Details

The original sample is randomly partitioned into nfold equal size subsamples. Of the nfold subsamples, a single subsample is retained as the validation data for testing the model, and the remaining nfold - 1 subsamples are used as training data. The cross-validation process is then repeated nfold times, with each of the nfold subsamples used exactly once as the validation data. All observations are used for both training and validation.

Adapted from http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29#k-fold_cross-validation
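For intuition only, here is a base-R sketch of such a partition. make_folds is a hypothetical helper, not part of xgboost, but the list of test-index vectors it returns has the same shape the folds argument accepts.

# Partition n observation indices into nfold roughly equal subsamples.
make_folds <- function(n, nfold) {
  idx <- sample(n)                                     # shuffle observation indices
  split(idx, cut(seq_len(n), nfold, labels = FALSE))   # nfold test-index vectors
}
str(make_folds(100, 5))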
Examples

data(agaricus.train, package='xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
cv <- xgb.cv(data = dtrain, nrounds = 3, nthread = 2, nfold = 5, metrics = list("rmse","auc"),
max_depth = 3, eta = 1, objective = "binary:logistic")
print(cv)
print(cv, verbose=TRUE)
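A further hedged sketch, retaining the per-fold models via the explicit cb.cv.predict(save_models = TRUE) callback described for the models element above:

cv2 <- xgb.cv(data = dtrain, nrounds = 3, nthread = 2, nfold = 5,
              max_depth = 3, eta = 1, objective = "binary:logistic",
              callbacks = list(cb.cv.predict(save_models = TRUE)))
length(cv2$models)  # one model per CV fold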