Usage

xgb.cv(params = list(), data, nrounds, nfold, label = NULL,
  missing = NULL, prediction = FALSE, showsd = TRUE, metrics = list(),
  obj = NULL, feval = NULL, stratified = TRUE, folds = NULL,
  verbose = T, print.every.n = 1L, early.stop.round = NULL,
  maximize = NULL, ...)

Arguments

params: the list of parameters. Commonly used ones include objective, the objective function; common choices are reg:linear (linear regression) and binary:logistic (logistic regression for classification). See xgb.train for further details.

data: takes an xgb.DMatrix or Matrix as the input.

nrounds: the maximum number of iterations.

nfold: the original dataset is randomly partitioned into nfold equal size subsamples.

label: optional field; used when data is a Matrix.

missing: only used when data is a dense matrix; a float value that represents missing values.

prediction: a logical value indicating whether to return the prediction vector.

showsd: boolean, whether to show the standard deviation of the cross validation statistics.

metrics: list of evaluation metrics to be used in cross validation; possible options include error (binary classification error rate) and rmse (root mean square error).

obj: customized objective function. Returns gradient and second order gradient with given prediction and dtrain.

feval: customized evaluation function. Returns list(metric='metric-name', value='metric-value') with given prediction and dtrain.

stratified: boolean, whether sampling of folds should be stratified by the values of labels in data.

folds: list providing the possibility of using pre-defined CV folds (each element must be a vector of fold's indices). If folds are supplied, the nfold and stratified parameters are ignored.

verbose: boolean, print the statistics during the process.

print.every.n: print every N progress messages when verbose > 0. Default is 1, which means all messages are printed.

early.stop.round: if NULL, the early stopping function is not triggered. If set to an integer k, training with a validation set will stop if the performance keeps getting worse consecutively for k rounds.

maximize: if feval and early.stop.round are set, then maximize must be set as well. maximize = TRUE means the larger the evaluation score the better.

...: other parameters to pass to params.

Value

If prediction = TRUE, a list with the following elements is returned:

dt: a data.table with each mean and standard deviation stat for the training set and test set.

pred: an array or matrix (for multiclass classification) with predictions for each CV-fold for the model having been trained on the data in all other folds.

If prediction = FALSE, just a data.table with each mean and standard deviation stat for the training set and test set is returned.
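A minimal sketch of retrieving both return components, assuming the xgboost package and its bundled agaricus dataset are available:

```r
library(xgboost)
data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# With prediction = TRUE the result is a list: $dt holds the per-round
# CV statistics, $pred holds the out-of-fold predictions, one per row
# of the training data.
res <- xgb.cv(params = list(objective = "binary:logistic",
                            max.depth = 2, eta = 1),
              data = dtrain, nrounds = 2, nfold = 5,
              prediction = TRUE)
head(res$dt)      # mean/std of train and test metrics per round
length(res$pred)  # one out-of-fold prediction per training row
```

With prediction = FALSE (the default), the result is just the data.table of statistics, so no $ indexing is needed.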
Details

The original sample is randomly partitioned into nfold equal size subsamples. Of the nfold subsamples, a single subsample is retained as the validation data for testing the model, and the remaining nfold - 1 subsamples are used as training data. The cross-validation process is then repeated nrounds times, with each of the nfold subsamples used exactly once as the validation data. All observations are used for both training and validation.
(Adapted from the Wikipedia article on cross-validation (statistics).)
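The partitioning can also be controlled explicitly through the folds argument. A sketch, assuming the xgboost package and its bundled agaricus data; fold construction via sample/split is illustrative, not prescribed by the API:

```r
library(xgboost)
data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# Build three roughly equal folds by hand: each list element is the
# vector of row indices held out as validation data for that fold.
n <- nrow(agaricus.train$data)
idx <- sample(n)
custom_folds <- split(idx, rep(1:3, length.out = n))

# When folds is supplied, the nfold and stratified arguments are ignored.
xgb.cv(params = list(objective = "binary:logistic",
                     max.depth = 2, eta = 1),
       data = dtrain, nrounds = 2, folds = custom_folds)
```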
Examples

data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
history <- xgb.cv(data = dtrain, nrounds = 3, nthread = 2, nfold = 5,
                  metrics = list("rmse", "auc"),
                  max.depth = 3, eta = 1, objective = "binary:logistic")
print(history)
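The early stopping arguments described above can be combined with the same data. A hedged sketch (the round counts and parameter values are arbitrary choices, not defaults): since auc is a metric where larger is better, maximize must be TRUE.

```r
# Stop cross validation if test AUC fails to improve for 3 consecutive
# rounds; maximize = TRUE because higher AUC is better.
history <- xgb.cv(data = dtrain, nrounds = 50, nfold = 5,
                  metrics = list("auc"),
                  max.depth = 3, eta = 1, objective = "binary:logistic",
                  early.stop.round = 3, maximize = TRUE)
```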