##### Cross Validation

The cross-validation function of xgboost.

##### Usage

```
xgb.cv(params = list(), data, nrounds, nfold, label = NULL,
missing = NULL, prediction = FALSE, showsd = TRUE, metrics = list(),
obj = NULL, feval = NULL, stratified = TRUE, folds = NULL,
verbose = T, print.every.n = 1L, early.stop.round = NULL,
maximize = NULL, ...)
```

##### Arguments

- `params`: the list of parameters. Commonly used ones are:
    - `objective`: objective function; common ones are
        - `reg:linear`: linear regression
        - `binary:logistic`: logistic regression for classification
- `data`: takes an `xgb.DMatrix` or `Matrix` as the input.
- `nrounds`: the maximum number of iterations.
- `nfold`: the original dataset is randomly partitioned into `nfold` equal-size subsamples.
- `label`: optional field; the label vector to use when `data` is a `Matrix`.
- `missing`: only used when the input is a dense matrix. Pick a float value that represents a missing value; some datasets use 0 or another extreme value to represent missing values.
- `prediction`: a logical value indicating whether to return the prediction vector.
- `showsd`: `boolean`, whether to show the standard deviation of cross validation.
- `metrics`: list of evaluation metrics to be used in cross validation. When it is not specified, the evaluation metric is chosen according to the objective function. Possible options are:
    - `error`: binary classification error rate
    - `rmse`: root mean square error
- `obj`: customized objective function. Returns the gradient and second-order gradient for the given predictions and `dtrain`.
- `feval`: customized evaluation function. Returns `list(metric='metric-name', value='metric-value')` for the given predictions and `dtrain`.
- `stratified`: `boolean`, whether sampling of folds should be stratified by the values of labels in `data`.
- `folds`: `list` providing a possibility of using a list of pre-defined CV folds (each element must be a vector of the indices in that fold). If `folds` are supplied, the `nfold` and `stratified` parameters are ignored.
- `verbose`: `boolean`, print the statistics during the process.
- `print.every.n`: print every N progress messages when `verbose > 0`. Default is 1, which means all messages are printed.
- `early.stop.round`: if `NULL`, the early stopping function is not triggered. If set to an integer `k`, training with a validation set will stop if the performance keeps getting worse for `k` consecutive rounds.
- `maximize`: if `feval` and `early.stop.round` are set, then `maximize` must be set as well. `maximize = TRUE` means the larger the evaluation score, the better.
- `...`: other parameters to pass to `params`.
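A custom `obj`/`feval` pair for binary logistic loss can be sketched as follows. It assumes the predictions handed to both functions are raw margins (scores before the logistic transformation), which is the usual convention when a custom objective is supplied. `getinfo(dtrain, "label")` is the xgboost R accessor for the label vector; the helpers `logistic_grad_hess` and `margin_error` are hypothetical names, split out so the arithmetic can be checked without building an `xgb.DMatrix`.

```r
# Gradient and Hessian of the logistic loss with respect to the raw margin.
# Hypothetical helper: takes margins and 0/1 labels directly.
logistic_grad_hess <- function(margin, labels) {
  p <- 1 / (1 + exp(-margin))   # sigmoid: margin -> probability
  list(grad = p - labels,       # first-order gradient
       hess = p * (1 - p))      # second-order gradient
}

# Classification error when margins are cut off at 0 (probability 0.5).
# Hypothetical helper.
margin_error <- function(margin, labels) {
  mean(labels != (margin > 0))
}

# Wrappers with the (preds, dtrain) shape that obj and feval expect.
# getinfo() comes from the xgboost package and is only looked up when called.
logregobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  logistic_grad_hess(preds, labels)
}

evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  list(metric = "error", value = margin_error(preds, labels))
}
```

These could then be passed as `xgb.cv(params = list(max.depth = 2, eta = 1), data = dtrain, nrounds = 2, nfold = 5, obj = logregobj, feval = evalerror)`. Because `feval` is custom, `maximize = FALSE` would also be required if `early.stop.round` were set.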
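The `early.stop.round` rule can be illustrated on a vector of per-round validation scores. This base-R sketch assumes one common reading of the rule: training stops once `k` consecutive rounds have passed without beating the best score seen so far, with `maximize` deciding which direction counts as better. The function `early_stop_round` is hypothetical and is not part of xgboost.

```r
# Returns the round at which training would stop (length(scores) if it never
# stops early) and the best round seen. Hypothetical helper, not xgboost API.
early_stop_round <- function(scores, k, maximize = FALSE) {
  best_score <- if (maximize) -Inf else Inf
  best_round <- 0L
  for (i in seq_along(scores)) {
    improved <- if (maximize) scores[i] > best_score else scores[i] < best_score
    if (improved) {
      best_score <- scores[i]
      best_round <- i
    } else if (i - best_round >= k) {
      # k rounds have passed since the last improvement: stop here
      return(list(stop_round = i, best_round = best_round))
    }
  }
  list(stop_round = length(scores), best_round = best_round)
}
```

With `scores <- c(0.30, 0.20, 0.22, 0.25, 0.21, 0.19)` and `k = 3`, the sketch stops at round 5 and reports round 2 as the best, even though round 6 would have improved; any rule based on a fixed number of non-improving rounds makes that trade-off.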

##### Details

The original sample is randomly partitioned into `nfold` equal-size subsamples.

Of the `nfold` subsamples, a single subsample is retained as the validation data for testing the model, and the remaining `nfold - 1` subsamples are used as training data.

The cross-validation process is then repeated `nfold` times, with each of the `nfold` subsamples used exactly once as the validation data. In `xgb.cv`, each of these `nfold` models is trained for up to `nrounds` boosting iterations, and the evaluation metrics are averaged across the folds after every iteration.

All observations are used for both training and validation.

Adapted from http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29#k-fold_cross-validation
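The partitioning described above can be sketched in base R. The hypothetical helper `make_folds` shuffles the row indices and deals them round-robin into `nfold` groups of nearly equal size; when `labels` is given, it deals each label's indices separately, which is one simple way to stratify. The result is a list of index vectors, the same shape the `folds` argument accepts. This is an illustration, not the exact sampling code used by `xgb.cv`.

```r
# Randomly partition 1..n into nfold groups of nearly equal size.
# If labels is supplied, each class is dealt out separately (stratified).
# Hypothetical helper, not part of xgboost.
make_folds <- function(n, nfold, labels = NULL) {
  fold_id <- integer(n)
  groups <- if (is.null(labels)) list(seq_len(n)) else split(seq_len(n), labels)
  offset <- 0L
  for (idx in groups) {
    idx <- idx[sample.int(length(idx))]   # shuffle within the group
    # deal round-robin, continuing the rotation across groups to keep sizes even
    fold_id[idx] <- ((seq_along(idx) + offset - 1L) %% nfold) + 1L
    offset <- offset + length(idx)
  }
  # one vector of row indices per fold
  lapply(seq_len(nfold), function(k) which(fold_id == k))
}

set.seed(42)
y <- c(rep(0, 6), rep(1, 4))
folds <- make_folds(length(y), nfold = 2, labels = y)   # each fold: 3 zeros, 2 ones
for (k in seq_along(folds)) {
  valid <- folds[[k]]                      # held-out fold
  train <- setdiff(seq_along(y), valid)    # the remaining nfold - 1 folds
  cat("fold", k, ": train", length(train), "rows, validate", length(valid), "rows\n")
}
```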

##### Value

- If `prediction = TRUE`, a list with the following elements is returned:
    - `dt`: a `data.table` with the mean and standard deviation of each evaluation metric for the training and test sets, one row per boosting round
    - `pred`: an array, or a matrix for multiclass classification, holding the predictions for each CV fold from the model trained on the data in all other folds
- If `prediction = FALSE`, just a `data.table` with the mean and standard deviation of each evaluation metric for the training and test sets is returned.
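The `pred` element holds out-of-fold predictions: each row's prediction comes from the model that did not see that row during training. The base-R sketch below assembles such a vector using `lm()` as a stand-in learner instead of boosted trees, with a simple interleaved fold assignment for brevity; `oof_predict` is a hypothetical helper, not an xgboost function.

```r
# Out-of-fold predictions: fit on all folds but one, predict the held-out fold.
# lm() stands in for the boosted model; oof_predict is a hypothetical helper.
oof_predict <- function(df, formula, folds) {
  pred <- numeric(nrow(df))
  for (valid in folds) {
    fit <- lm(formula, data = df[-valid, , drop = FALSE])   # train on the other folds
    pred[valid] <- predict(fit, newdata = df[valid, , drop = FALSE])
  }
  pred
}

set.seed(3)
df <- data.frame(x = 1:20)
df$y <- 2 * df$x + rnorm(20, sd = 0.1)
folds <- split(seq_len(nrow(df)), rep(1:5, length.out = nrow(df)))   # 5 interleaved folds
p <- oof_predict(df, y ~ x, folds)
```

Because every row is predicted exactly once by a model that never trained on it, `p` can be compared with `df$y` to estimate generalization error without a separate hold-out set.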

##### Examples

```
data(agaricus.train, package='xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
history <- xgb.cv(data = dtrain, nrounds = 3, nthread = 2, nfold = 5, metrics = list("rmse", "auc"),
                  max.depth = 3, eta = 1, objective = "binary:logistic")
print(history)
```

