xgb.cv
From xgboost v0.4-4
by Tong He
Cross Validation
The cross validation function of xgboost
Usage
xgb.cv(params = list(), data, nrounds, nfold, label = NULL, missing = NULL,
  prediction = FALSE, showsd = TRUE, metrics = list(), obj = NULL,
  feval = NULL, stratified = TRUE, folds = NULL, verbose = T,
  print.every.n = 1L, early.stop.round = NULL, maximize = NULL, ...)
Arguments
 params
 the list of parameters. Commonly used ones are:

   objective        objective function, common ones are
     reg:linear         linear regression
     binary:logistic    logistic regression for classification
   eta              step size of each boosting step
   max.depth        maximum depth of the tree
   nthread          number of threads used in training; if not set, all threads are used

 See xgb.train for further details. See also demo/ for a walkthrough example in R.

 data
 takes an xgb.DMatrix or Matrix as the input.

 nrounds
 the max number of iterations

 nfold
 the original dataset is randomly partitioned into nfold equal-size subsamples.

 label
 optional field, used when data is a Matrix

 missing
 only used when the input is a dense matrix; pick a float value that represents missing values. Sometimes a dataset uses 0 or another extreme value to represent missing values.
 prediction
 A logical value indicating whether to return the prediction vector.
 showsd
 boolean, whether to show the standard deviation of the cross validation metrics

 metrics
 list of evaluation metrics to be used in cross validation; when it is not specified, the evaluation metric is chosen according to the objective function. Possible options are:

   error      binary classification error rate
   rmse       root mean square error
   logloss    negative log-likelihood
   auc        area under the curve
   merror     exact matching error, used to evaluate multi-class classification
 obj
 customized objective function. Returns gradient and second order gradient with given prediction and dtrain.
 feval
 customized evaluation function. Returns list(metric='metricname', value='metricvalue') with given prediction and dtrain (see the sketch after this list).

 stratified
 boolean, whether sampling of folds should be stratified by the values of labels in data

 folds
 list, provides a possibility of using a list of pre-defined CV folds (each element must be a vector of fold's indices). If folds are supplied, the nfold and stratified parameters would be ignored.

 verbose
 boolean, print the statistics during the process

 print.every.n
 Print every N progress messages when verbose > 0. Default is 1, which means all messages are printed.

 early.stop.round
 If NULL, the early stopping function is not triggered. If set to an integer k, training with a validation set will stop if the performance keeps getting worse consecutively for k rounds.

 maximize
 If feval and early.stop.round are set, then maximize must be set as well. maximize = TRUE means the larger the evaluation score the better.

 ...
 other parameters to pass to params.
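
A minimal sketch of a customized evaluation function used with early stopping (not taken from the package documentation; the exact scale of preds passed to feval may depend on the objective and package version):

library(xgboost)
data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# Custom evaluation: plain classification error.
# Assumes preds are probabilities for binary:logistic; some versions may pass margins instead.
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- mean(as.numeric(preds > 0.5) != labels)
  list(metric = "custom-error", value = err)
}

res <- xgb.cv(params = list(objective = "binary:logistic", max.depth = 2, eta = 1),
              data = dtrain, nrounds = 20, nfold = 5,
              feval = evalerror,
              early.stop.round = 3,   # stop if the metric worsens 3 rounds in a row
              maximize = FALSE)       # lower error is better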
Details
The original sample is randomly partitioned into nfold equal-size subsamples.
Of the nfold subsamples, a single subsample is retained as the validation data for testing the model, and the remaining nfold - 1 subsamples are used as training data.
The cross-validation process is then repeated nrounds times, with each of the nfold subsamples used exactly once as the validation data.
All observations are used for both training and validation.
Adapted from http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29#k-fold_cross-validation
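
The partitioning can also be specified explicitly through the folds argument. A minimal sketch (names such as fold_id and my_folds are illustrative only):

library(xgboost)
data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# Randomly assign each row to one of 5 folds, then pass the index vectors to xgb.cv.
n <- nrow(agaricus.train$data)
fold_id <- sample(rep(1:5, length.out = n))
my_folds <- lapply(1:5, function(k) which(fold_id == k))

res <- xgb.cv(data = dtrain, nrounds = 3, folds = my_folds,
              max.depth = 3, eta = 1, objective = "binary:logistic")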
Value

If prediction = TRUE, a list with the following elements is returned:

 dt
 a data.table with each mean and standard deviation stat for training set and test set

 pred
 an array or matrix (for multi-class classification) with predictions for each CV fold for the model having been trained on the data in all other folds.

If prediction = FALSE, just a data.table with each mean and standard deviation stat for training set and test set is returned.
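
For example, a minimal sketch of retrieving both elements when prediction = TRUE (assuming dtrain is built as in the Examples below):

res <- xgb.cv(data = dtrain, nrounds = 3, nfold = 5, prediction = TRUE,
              max.depth = 3, eta = 1, objective = "binary:logistic")
print(res$dt)    # per-round mean and standard deviation for train and test
head(res$pred)   # out-of-fold predictions, one value per row of dtrain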
Examples
data(agaricus.train, package='xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
history <- xgb.cv(data = dtrain, nround=3, nthread = 2, nfold = 5, metrics=list("rmse","auc"),
max.depth =3, eta = 1, objective = "binary:logistic")
print(history)