xgb.cv(params = list(), data, nrounds, nfold, label = NULL,
missing = NULL, prediction = FALSE, showsd = TRUE, metrics = list(),
obj = NULL, feval = NULL, stratified = TRUE, folds = NULL,
verbose = T, print.every.n = 1L, early.stop.round = NULL,
maximize = NULL, ...)
params: the list of parameters. A commonly used one is objective, the objective function; common choices are reg:linear (linear regression) and binary:logistic (logistic regression for classification).
data: takes an xgb.DMatrix or Matrix as the input.
nfold: the original dataset is randomly partitioned into nfold equal size subsamples.
label: optional; only used when data is a Matrix rather than an xgb.DMatrix.
showsd: boolean, whether to show the standard deviation of the cross-validation statistics.
metrics: list of evaluation metrics to track; options include error (binary classification error rate) and rmse (root mean square error).
feval: custom evaluation function; it must return list(metric='metric-name', value='metric-value') given the prediction and dtrain (a sketch follows this list).
stratified: boolean, whether sampling of folds should be stratified by the values of labels in data.
folds: list, provides a possibility of using a list of pre-defined CV folds (each element must be a vector of fold's indices). If folds are supplied, the nfold and stratified parameters are ignored (a sketch follows this list).
verbose: boolean, print the statistics during the process.
print.every.n: print evaluation messages every n iterations when verbose > 0. Default is 1, which means all messages are printed.
early.stop.round: if NULL, early stopping is not triggered. If set to an integer k, training with a validation set will stop if the performance keeps getting worse for k consecutive rounds.
maximize: if feval and early.stop.round are set, then maximize must be set as well. maximize = TRUE means the larger the evaluation score, the better.
...: other parameters to pass to params.
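A minimal sketch of combining a custom feval with early stopping; this is not taken from the original documentation. The accuracy metric, the 0.5 threshold, and the assumption that preds arrive as probabilities under binary:logistic are illustrative choices.

library(xgboost)
data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# Hypothetical metric: classification accuracy. feval must return
# list(metric = 'metric-name', value = 'metric-value').
accuracy <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  # assumes preds are probabilities here; with a custom obj they would be raw scores
  list(metric = "accuracy", value = mean(as.numeric(preds > 0.5) == labels))
}

# feval and early.stop.round are both set, so maximize must be given too;
# accuracy is better when larger, hence maximize = TRUE.
res <- xgb.cv(data = dtrain, nrounds = 20, nfold = 5,
              objective = "binary:logistic", eta = 1, max.depth = 3,
              feval = accuracy, early.stop.round = 3, maximize = TRUE)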
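A second sketch, also not from the original documentation, showing pre-defined CV folds; the random five-way split is arbitrary, any list of row-index vectors will do.

library(xgboost)
data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# Build 5 folds by hand: assign every row to a fold, then collect row indices per fold.
n <- nrow(agaricus.train$data)
fold_id <- sample(rep(1:5, length.out = n))
my_folds <- split(seq_len(n), fold_id)

# nfold and stratified are ignored when folds is supplied.
res <- xgb.cv(data = dtrain, nrounds = 3, nfold = 5, folds = my_folds,
              objective = "binary:logistic", eta = 1, max.depth = 3)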
If prediction = TRUE, a list with the following elements is returned:
dt: a data.table with each mean and standard deviation stat for the training set and test set.
pred: an array or matrix (for multiclass classification) with predictions for each CV-fold for the model having been trained on the data in all other folds.
If prediction = FALSE, just a data.table with each mean and standard deviation stat for the training set and test set is returned.
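As a brief illustration of the two return shapes described above (a sketch, not from the original documentation; the object name cv is arbitrary):

library(xgboost)
data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

cv <- xgb.cv(data = dtrain, nrounds = 3, nfold = 5, prediction = TRUE,
             objective = "binary:logistic", eta = 1, max.depth = 3)
cv$dt    # data.table of per-round mean/std statistics for the training and test folds
cv$pred  # out-of-fold predictions, one per observation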
The original sample is randomly partitioned into nfold equal size subsamples. Of the nfold subsamples, a single subsample is retained as the validation data for testing the model, and the remaining nfold - 1 subsamples are used as training data. The cross-validation process is then repeated nrounds times, with each of the nfold subsamples used exactly once as the validation data. All observations are used for both training and validation.
Adapted from https://en.wikipedia.org/wiki/Cross-validation_(statistics)
data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
history <- xgb.cv(data = dtrain, nrounds = 3, nthread = 2, nfold = 5,
                  metrics = list("rmse", "auc"),
                  max.depth = 3, eta = 1, objective = "binary:logistic")
print(history)