xgb.cv
From xgboost v0.4-2
by Tong He
Cross Validation
The cross validation function of xgboost
Usage
xgb.cv(params = list(), data, nrounds, nfold, label = NULL,
missing = NULL, prediction = FALSE, showsd = TRUE, metrics = list(),
obj = NULL, feval = NULL, stratified = TRUE, folds = NULL,
verbose = T, print.every.n = 1L, early.stop.round = NULL,
maximize = NULL, ...)
Arguments
- params
- the list of parameters. Commonly used ones are:
  objective: objective function; common ones are reg:linear (linear regression) and binary:logistic (logistic regression for classification)
- data
- takes an xgb.DMatrix or Matrix as the input.
- nrounds
- the max number of iterations
- nfold
- the original dataset is randomly partitioned into nfold equal-size subsamples.
- label
- optional field; the label vector, used only when data is a Matrix rather than an xgb.DMatrix.
- missing
- only used when the input is a dense matrix; a float value that represents missing values. Sometimes a dataset uses 0 or another extreme value to represent missing values.
- prediction
- A logical value indicating whether to return the prediction vector.
- showsd
- boolean, whether to show the standard deviation of the cross validation results
- metrics
- list of evaluation metrics to be used in cross validation; when it is not specified, the evaluation metric is chosen according to the objective function. Possible options are:
  error: binary classification error rate
  rmse: root mean square error
  logloss: negative log-likelihood
  auc: area under the ROC curve
- obj
- customized objective function; returns gradient and second order gradient (hessian) with given prediction and dtrain. A sketch follows this list.
- feval
- customized evaluation function; returns list(metric='metric-name', value='metric-value') with given prediction and dtrain. A sketch follows this list.
- stratified
- boolean, whether sampling of folds should be stratified by the values of labels in data
- folds
- list; provides the possibility of using a list of pre-defined CV folds (each element must be a vector of fold indices). If folds are supplied, the nfold and stratified parameters are ignored.
- verbose
- boolean, print the statistics during the process
- print.every.n
- print every N progress messages when verbose > 0. Default is 1, which means all messages are printed.
- early.stop.round
- if NULL, early stopping is not triggered. If set to an integer k, training with a validation set will stop if the performance keeps getting worse for k consecutive rounds.
- maximize
- if feval and early.stop.round are set, then maximize must be set as well. maximize = TRUE means the larger the evaluation score the better.
- ...
- other parameters to pass to params.
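As an illustration of obj and feval, here is a minimal sketch of a customized logistic objective and error metric, adapted in spirit from the package's custom-objective demo; the names logregobj and evalerror are illustrative, and dtrain is assumed to be an xgb.DMatrix as in the Examples section.

logregobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  preds <- 1 / (1 + exp(-preds))   # transform raw scores to probabilities
  grad <- preds - labels           # first order gradient
  hess <- preds * (1 - preds)      # second order gradient (hessian)
  list(grad = grad, hess = hess)
}

evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- mean(as.numeric(preds > 0) != labels)  # classification error on raw scores
  list(metric = "error", value = err)
}

history <- xgb.cv(data = dtrain, nrounds = 3, nfold = 5,
                  obj = logregobj, feval = evalerror,
                  max.depth = 3, eta = 1)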
Details
The original sample is randomly partitioned into nfold equal-size subsamples. Of the nfold subsamples, a single subsample is retained as the validation data for testing the model, and the remaining nfold - 1 subsamples are used as training data. The cross-validation process is then repeated nrounds times, with each of the nfold subsamples used exactly once as the validation data. All observations are used for both training and validation.
Adapted from http://en.wikipedia.org/wiki/Cross-validation_(statistics)
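For the folds argument, a minimal sketch of supplying pre-defined CV folds; the random assignment below is illustrative, and dtrain is assumed to be an xgb.DMatrix as in the Examples section.

n <- length(getinfo(dtrain, "label"))                     # number of rows in dtrain
fold_id <- sample(rep(1:5, length.out = n))               # assign each row to one of 5 folds
my_folds <- lapply(1:5, function(k) which(fold_id == k))  # list of index vectors
history <- xgb.cv(data = dtrain, nrounds = 3, folds = my_folds,
                  max.depth = 3, eta = 1, objective = "binary:logistic")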
Value
If prediction = TRUE, a list with the following elements is returned:
- dt
- a data.table with the mean and standard deviation of each evaluation statistic for the training set and test set
- pred
- an array or matrix (for multiclass classification) with predictions for each CV-fold for the model having been trained on the data in all other folds.
If prediction = FALSE, just a data.table with the mean and standard deviation of each evaluation statistic for the training set and test set is returned.
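A minimal sketch of retrieving the out-of-fold predictions; dtrain is assumed to be an xgb.DMatrix as in the Examples section.

res <- xgb.cv(data = dtrain, nrounds = 3, nfold = 5, prediction = TRUE,
              max.depth = 3, eta = 1, objective = "binary:logistic")
head(res$dt)    # per-round mean/std of the evaluation statistics
head(res$pred)  # out-of-fold predictions, one per training row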
Examples
data(agaricus.train, package='xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
history <- xgb.cv(data = dtrain, nrounds = 3, nthread = 2, nfold = 5,
                  metrics = list("rmse", "auc"), max.depth = 3, eta = 1,
                  objective = "binary:logistic")
print(history)
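A hedged sketch of early stopping with these arguments: stop when the test AUC has not improved for 3 consecutive rounds (maximize = TRUE because a larger AUC is better); the parameter values are illustrative.

history <- xgb.cv(data = dtrain, nrounds = 100, nfold = 5,
                  metrics = list("auc"), early.stop.round = 3, maximize = TRUE,
                  max.depth = 3, eta = 1, objective = "binary:logistic")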