xgboost (version 0.4-2)

xgb.train: eXtreme Gradient Boosting Training

Description

An advanced interface for training an xgboost model. See the xgboost function for a simpler interface.

Usage

xgb.train(params = list(), data, nrounds, watchlist = list(), obj = NULL,
  feval = NULL, verbose = 1, print.every.n = 1L,
  early.stop.round = NULL, maximize = NULL, ...)

Arguments

params
the list of parameters.

1. General Parameters

  • booster: which booster to use, can be gbtree or gblinear. Default: gbtree
  • silent: 0 means printing running messages, 1 means silent mode. Default: 0
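
For illustration, a params list combining these general parameters with a few common booster parameters might look like the following sketch (the values shown are arbitrary choices, not package defaults):

params <- list(booster = "gbtree",             # tree-based booster
               silent = 1,                     # run quietly
               eta = 0.3,                      # learning rate (illustrative value)
               max.depth = 2,                  # maximum tree depth (illustrative value)
               objective = "binary:logistic")  # binary classification objective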

data
takes an xgb.DMatrix as the input.
nrounds
the max number of iterations
watchlist
what information should be printed when verbose=1 or verbose=2. The watchlist is used to specify validation set monitoring during training. For example, a user can specify watchlist=list(validation1=mat1, validation2=mat2) to watch the performance of each round's model on mat1 and mat2.
obj
customized objective function. It returns the gradient and second order gradient for the given prediction and dtrain.
feval
customized evaluation function. It returns list(metric='metric-name', value='metric-value') for the given prediction and dtrain.
verbose
If 0, xgboost will stay silent. If 1, xgboost will print information of performance. If 2, xgboost will print information of both performance and construction progress.
print.every.n
Print every N progress messages when verbose>0. Default is 1, which means all messages are printed.
early.stop.round
If NULL, the early stopping function is not triggered. If set to an integer k, training with a validation set will stop if the performance keeps getting worse consecutively for k rounds.
maximize
If feval and early.stop.round are set, then maximize must be set as well. maximize=TRUE means the larger the evaluation score the better (see the sketch after this argument list).
...
other parameters to pass to params.
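
As an illustration of how watchlist, feval, early.stop.round and maximize fit together, here is a minimal sketch; dvalid is an assumed separate validation xgb.DMatrix and evalerror is the custom metric defined in the Examples below:

bst <- xgb.train(params = list(max.depth = 2, eta = 1, silent = 1,
                               objective = "binary:logistic"),
                 data = dtrain, nrounds = 50,
                 watchlist = list(train = dtrain, eval = dvalid),  # monitor both sets
                 feval = evalerror,          # custom metric (see Examples)
                 early.stop.round = 3,       # stop after 3 rounds without improvement
                 maximize = FALSE)           # lower error is better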

Details

This is the training function for xgboost.

It supports advanced features such as watchlist and customized objective (obj) and evaluation (feval) functions, and is therefore more flexible than the xgboost function.

Parallelization is automatically enabled if OpenMP is present. Number of threads can also be manually specified via nthread parameter.
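
For example, to fix the number of threads explicitly rather than rely on the OpenMP default (a sketch; 2 threads chosen arbitrarily):

param <- list(max.depth = 2, eta = 1, silent = 1,
              objective = "binary:logistic",
              nthread = 2)   # use exactly 2 threads
bst <- xgb.train(param, dtrain, nrounds = 2)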

The eval_metric parameter (not listed above) is set automatically by xgboost but can be overridden via params. Below is the list of the different evaluation metrics optimized by xgboost, to help you understand how it works internally or to use them with the watchlist parameter (see the sketch after this list).

  • rmse: root mean square error. http://en.wikipedia.org/wiki/Root_mean_square_error
  • logloss: negative log-likelihood. http://en.wikipedia.org/wiki/Log-likelihood
  • error: Binary classification error rate. It is calculated as (wrong cases) / (all cases). For the predictions, the evaluation will regard the instances with prediction value larger than 0.5 as positive instances, and the others as negative instances.
  • merror: Multiclass classification error rate. It is calculated as (wrong cases) / (all cases).
  • auc: Area under the curve, for ranking evaluation. http://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_curve
  • ndcg: Normalized Discounted Cumulative Gain (for ranking task). http://en.wikipedia.org/wiki/NDCG
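
As a sketch of overriding the default metric and monitoring it through watchlist (dtrain and dtest are xgb.DMatrix objects as in the Examples):

param <- list(max.depth = 2, eta = 1, silent = 1,
              objective = "binary:logistic",
              eval_metric = "auc")   # override the automatically chosen metric
bst <- xgb.train(param, dtrain, nrounds = 2,
                 watchlist = list(train = dtrain, eval = dtest))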

The full list of parameters is available in the Wiki: https://github.com/dmlc/xgboost/wiki/Parameters.

This function only accepts an xgb.DMatrix object as the input.

Examples

library(xgboost)

data(agaricus.train, package='xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
# For illustration, reuse the training data as the evaluation set.
dtest <- dtrain
watchlist <- list(eval = dtest, train = dtrain)

# Customized objective: logistic regression returning gradient and hessian.
logregobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  preds <- 1 / (1 + exp(-preds))
  grad <- preds - labels
  hess <- preds * (1 - preds)
  return(list(grad = grad, hess = hess))
}

# Customized evaluation metric: binary classification error rate.
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- as.numeric(sum(labels != (preds > 0))) / length(labels)
  return(list(metric = "error", value = err))
}

param <- list(max.depth = 2, eta = 1, silent = 1,
              objective = logregobj, eval_metric = evalerror)
bst <- xgb.train(param, dtrain, nrounds = 2, watchlist = watchlist, nthread = 2)
