xgboost (version 0.3-3)

xgb.cv: Cross Validation

Description

The cross validation function of xgboost.

Usage

xgb.cv(params = list(), data, nrounds, nfold, label = NULL,
  missing = NULL, prediction = FALSE, showsd = TRUE, metrics = list(),
  obj = NULL, feval = NULL, verbose = T, ...)

Arguments

params
the list of parameters. Commonly used ones are:
  • objective: objective function, common ones are
    • reg:linear: linear regression
    • binary:logistic: logistic regression for classification
data
takes an xgb.DMatrix as the input.
nrounds
the max number of iterations
nfold
number of folds used
label
optional field, used when data is a Matrix
missing
Only used when the input is a dense matrix. Pick a float value that represents a missing value. Some datasets use 0 or another extreme value to represent missing values.
prediction
A logical value indicating whether to return the prediction vector.
showsd
boolean, whether to show the standard deviation of cross validation
metrics
list of evaluation metrics to be used in cross validation. When it is not specified, the evaluation metric is chosen according to the objective function. Possible options are:
  • error: binary classification error rate
  • rmse
obj
customized objective function. Returns the gradient and second order gradient given the prediction and dtrain.
feval
customized evaluation function. Returns list(metric='metric-name', value='metric-value') given the prediction and dtrain.
verbose
boolean, whether to print the statistics during the process.
...
other parameters to pass to params.
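
The obj and feval arguments above describe functions of (prediction, dtrain). The following is a minimal sketch of that shape for a logistic loss, not the package's own implementation. The helper names logistic_grad_hess and margin_error_rate are hypothetical, and the sketch assumes that predictions arrive as untransformed margin scores when a custom objective is supplied.

```r
# Hypothetical helper: gradient and second order gradient of the logistic
# loss, given raw margin scores and 0/1 labels.
logistic_grad_hess <- function(margin, labels) {
  p <- 1 / (1 + exp(-margin))          # sigmoid of the margin
  list(grad = p - labels, hess = p * (1 - p))
}

# Hypothetical helper: error rate when a positive margin counts as class 1.
margin_error_rate <- function(margin, labels) {
  mean(labels != as.numeric(margin > 0))
}

# Customized objective in the shape described for `obj`:
# returns the gradient and second order gradient given preds and dtrain.
logregobj <- function(preds, dtrain) {
  labels <- xgboost::getinfo(dtrain, "label")
  logistic_grad_hess(preds, labels)
}

# Customized evaluation function in the shape described for `feval`:
# returns list(metric = 'metric-name', value = 'metric-value').
evalerror <- function(preds, dtrain) {
  labels <- xgboost::getinfo(dtrain, "label")
  list(metric = "error", value = margin_error_rate(preds, labels))
}
```

These would then be supplied as xgb.cv(..., obj = logregobj, feval = evalerror). Keeping the arithmetic in separate helpers makes it possible to check the gradient code on plain vectors without building an xgb.DMatrix.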

Value

  • A data.table with each mean and standard deviation stat for training set and test set.

Details

This is the cross validation function for xgboost.

Parallelization is automatically enabled if OpenMP is present. The number of threads can also be specified manually via the "nthread" parameter.

This function only accepts an xgb.DMatrix object as the input.
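
Because only an xgb.DMatrix is accepted, a dense matrix has to be wrapped first. The sketch below shows one way this might look when a sentinel value marks missing entries. The toy data and the sentinel -999 are made up for illustration, and the xgb.DMatrix call is guarded so the block still runs where the package is not installed.

```r
# Toy dense matrix; -999 is an illustrative sentinel for "missing".
x <- matrix(c(1.0, -999, 0.5, 2.0,  1.5, 0.3,
              0.2,  0.7, -999, 1.1, 0.9, 0.4),
            nrow = 6, ncol = 2)
y <- c(0, 1, 0, 1, 0, 1)

dtrain <- NULL
if (requireNamespace("xgboost", quietly = TRUE)) {
  # Wrap the dense matrix and tell xgboost which value encodes missing data.
  dtrain <- xgboost::xgb.DMatrix(x, label = y, missing = -999)
}
```

The resulting dtrain is what would be passed as the data argument of xgb.cv.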

Examples

data(agaricus.train, package='xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
history <- xgb.cv(data = dtrain, nrounds = 3, nthread = 2, nfold = 5, metrics = list("rmse", "auc"),
                  "max.depth"=3, "eta"=1, "objective"="binary:logistic")
print(history)
