eXtreme Gradient Boosting Training

An advanced interface for training an xgboost model. See the xgboost function for a simpler interface.

xgb.train(params = list(), data, nrounds, watchlist = list(), obj = NULL,
  feval = NULL, verbose = 1, print.every.n = 1L,
  early.stop.round = NULL, maximize = NULL, ...)
Arguments

params
the list of parameters.

1. General Parameters

  • booster: which booster to use, can be gbtree or gblinear. Default: gbtree
  • silent: 0 means printing running messages, 1 means silent mode.

data
takes an xgb.DMatrix as the input.

nrounds
the max number of iterations.

watchlist
what information should be printed when verbose = 1 or verbose = 2. watchlist is used to specify a validation set to monitor during training. For example, the user can specify watchlist = list(validation1 = mat1, validation2 = mat2) to watch the performance of each round's model on mat1 and mat2.

obj
customized objective function. Returns the gradient and second order gradient with given prediction and dtrain.

feval
customized evaluation function. Returns list(metric = 'metric-name', value = 'metric-value') with given prediction and dtrain.

verbose
If 0, xgboost will stay silent. If 1, xgboost will print information about performance. If 2, xgboost will print information about both performance and construction progress.

print.every.n
Print every N progress messages when verbose > 0. Default is 1, which means all messages are printed.

early.stop.round
If NULL, early stopping is not triggered. If set to an integer k, training with a validation set will stop if the performance keeps getting worse for k consecutive rounds.

maximize
If feval and early.stop.round are set, then maximize must be set as well. maximize = TRUE means the larger the evaluation score the better.

...
other parameters to pass to params.
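As a minimal sketch of how early.stop.round and maximize interact (using the agaricus data shipped with the package; the round at which training actually stops depends on the data):

```r
library(xgboost)

data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# Stop if the watchlist error has not improved for 3 consecutive rounds.
# maximize = FALSE because for the "error" metric, lower is better.
bst <- xgb.train(params = list(objective = "binary:logistic",
                               max.depth = 2, eta = 1),
                 data = dtrain, nrounds = 50,
                 watchlist = list(eval = dtrain),
                 early.stop.round = 3, maximize = FALSE)
```

Note that early stopping only makes sense with a non-empty watchlist, since the stopping decision is based on the monitored evaluation score.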

This is the training function for xgboost.

It supports advanced features such as watchlist and customized objective and evaluation functions (obj, feval), and is therefore more flexible than the xgboost function.

Parallelization is automatically enabled if OpenMP is present. The number of threads can also be specified manually via the nthread parameter.
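For example, nthread can be supplied through params (or via ..., which is merged into params); a sketch assuming the agaricus data bundled with the package:

```r
library(xgboost)

data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# Restrict training to 2 threads; without nthread, xgboost uses all available cores.
param <- list(objective = "binary:logistic", nthread = 2)
bst <- xgb.train(param, dtrain, nrounds = 2)
```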

The eval_metric parameter (not listed above) is set automatically by xgboost but can be overridden by the user. Below is the list of the different metrics optimized by xgboost, to help you understand how it works internally or to use them with the watchlist parameter.

  • rmse: root mean square error. http://en.wikipedia.org/wiki/Root_mean_square_error
  • logloss: negative log-likelihood. http://en.wikipedia.org/wiki/Log-likelihood
  • error: Binary classification error rate. It is calculated as (wrong cases) / (all cases). For the predictions, the evaluation will regard the instances with prediction value larger than 0.5 as positive instances, and the others as negative instances.
  • merror: Multiclass classification error rate. It is calculated as (wrong cases) / (all cases).
  • auc: Area under the curve, for ranking evaluation. http://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_curve
  • ndcg: Normalized Discounted Cumulative Gain (for ranking task). http://en.wikipedia.org/wiki/NDCG
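To monitor one of these built-in metrics during training, put eval_metric in params together with a watchlist. A sketch (an R list may contain eval_metric more than once, and xgboost reports each listed metric every round):

```r
library(xgboost)

data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# Report both the classification error and the AUC on the training set each round.
param <- list(objective = "binary:logistic", max.depth = 2, eta = 1,
              eval_metric = "error", eval_metric = "auc")
bst <- xgb.train(param, dtrain, nrounds = 2, watchlist = list(train = dtrain))
```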

Full list of parameters is available in the Wiki https://github.com/dmlc/xgboost/wiki/Parameters.

This function only accepts an xgb.DMatrix object as the input.
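A plain matrix or data frame must therefore be wrapped first; a minimal sketch with synthetic data:

```r
library(xgboost)

# Build an xgb.DMatrix from an ordinary dense matrix and a 0/1 label vector.
set.seed(1)
x <- matrix(rnorm(100 * 4), nrow = 100)
y <- as.numeric(x[, 1] > 0)
dtrain <- xgb.DMatrix(data = x, label = y)

bst <- xgb.train(params = list(objective = "binary:logistic"),
                 data = dtrain, nrounds = 2)
```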

Aliases

  • xgb.train

Examples
data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest <- dtrain
watchlist <- list(eval = dtest, train = dtrain)
# Customized objective: logistic regression, returning gradient and hessian.
logregobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  preds <- 1 / (1 + exp(-preds))
  grad <- preds - labels
  hess <- preds * (1 - preds)
  return(list(grad = grad, hess = hess))
}
# Customized evaluation: binary classification error rate.
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- as.numeric(sum(labels != (preds > 0))) / length(labels)
  return(list(metric = "error", value = err))
}
param <- list(max.depth = 2, eta = 1, silent = 1,
              objective = logregobj, eval_metric = evalerror)
bst <- xgb.train(param, dtrain, nthread = 2, nrounds = 2, watchlist)
Documentation reproduced from package xgboost, version 0.4-2, License: Apache License (== 2.0) | file LICENSE
