xgboost (version 0.4-2)

xgb.train: eXtreme Gradient Boosting Training

Description

An advanced interface for training an xgboost model. See the xgboost function for a simpler interface.

Usage

xgb.train(params = list(), data, nrounds, watchlist = list(), obj = NULL,
  feval = NULL, verbose = 1, print.every.n = 1L,
  early.stop.round = NULL, maximize = NULL, ...)

Arguments

params
the list of parameters.

1. General Parameters

  • booster: which booster to use, can be gbtree or gblinear. Default: gbtree
  • silent: 0 means printing running messages, 1 means silent mode. Default: 0
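
For illustration, a params list combining these general parameters with a few common booster parameters might look like the following sketch (the values shown are arbitrary choices, not package defaults):

params <- list(booster = "gbtree",             # tree-based booster
               silent = 1,                     # run quietly
               eta = 0.3,                      # learning rate (illustrative value)
               max.depth = 2,                  # maximum tree depth (illustrative value)
               objective = "binary:logistic")  # binary classification objective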

data
takes an xgb.DMatrix as the input.
nrounds
the max number of iterations
watchlist
what information should be printed when verbose=1 or verbose=2. The watchlist is used to specify validation set monitoring during training. For example, a user can specify watchlist=list(validation1=mat1, validation2=mat2) to watch the performance of each round's model on mat1 and mat2.
obj
customized objective function. It returns the gradient and second order gradient for the given prediction and dtrain.
feval
customized evaluation function. It returns list(metric='metric-name', value='metric-value') for the given prediction and dtrain.
verbose
If 0, xgboost will stay silent. If 1, xgboost will print information of performance. If 2, xgboost will print information of both performance and construction progress.
print.every.n
Print every N progress messages when verbose>0. Default is 1, which means all messages are printed.
early.stop.round
If NULL, the early stopping function is not triggered. If set to an integer k, training with a validation set will stop if the performance keeps getting worse consecutively for k rounds.
maximize
If feval and early.stop.round are set, then maximize must be set as well. maximize=TRUE means the larger the evaluation score the better (see the sketch after this argument list).
...
other parameters to pass to params.
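
As an illustration of how watchlist, feval, early.stop.round and maximize fit together, here is a minimal sketch; dvalid is an assumed separate validation xgb.DMatrix and evalerror is the custom metric defined in the Examples below:

bst <- xgb.train(params = list(max.depth = 2, eta = 1, silent = 1,
                               objective = "binary:logistic"),
                 data = dtrain, nrounds = 50,
                 watchlist = list(train = dtrain, eval = dvalid),  # monitor both sets
                 feval = evalerror,          # custom metric (see Examples)
                 early.stop.round = 3,       # stop after 3 rounds without improvement
                 maximize = FALSE)           # lower error is better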

Details

This is the training function for xgboost.

It supports advanced features such as watchlist and customized objective (obj) and evaluation (feval) functions, and is therefore more flexible than the xgboost function.

Parallelization is automatically enabled if OpenMP is present. Number of threads can also be manually specified via nthread parameter.
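
For example, to fix the number of threads explicitly rather than rely on the OpenMP default (a sketch; 2 threads chosen arbitrarily):

param <- list(max.depth = 2, eta = 1, silent = 1,
              objective = "binary:logistic",
              nthread = 2)   # use exactly 2 threads
bst <- xgb.train(param, dtrain, nrounds = 2)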

The eval_metric parameter (not listed above) is set automatically by xgboost but can be overridden via params. Below is the list of the different evaluation metrics optimized by xgboost, to help you understand how it works internally or to use them with the watchlist parameter (see the sketch after this list).

  • rmse: root mean square error. http://en.wikipedia.org/wiki/Root_mean_square_error
  • logloss: negative log-likelihood. http://en.wikipedia.org/wiki/Log-likelihood
  • error: Binary classification error rate. It is calculated as (wrong cases) / (all cases). For the predictions, the evaluation will regard the instances with prediction value larger than 0.5 as positive instances, and the others as negative instances.
  • merror: Multiclass classification error rate. It is calculated as (wrong cases) / (all cases).
  • auc: Area under the curve, for ranking evaluation. http://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_curve
  • ndcg: Normalized Discounted Cumulative Gain (for ranking task). http://en.wikipedia.org/wiki/NDCG
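
As a sketch of overriding the default metric and monitoring it through watchlist (dtrain and dtest are xgb.DMatrix objects as in the Examples):

param <- list(max.depth = 2, eta = 1, silent = 1,
              objective = "binary:logistic",
              eval_metric = "auc")   # override the automatically chosen metric
bst <- xgb.train(param, dtrain, nrounds = 2,
                 watchlist = list(train = dtrain, eval = dtest))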

The full list of parameters is available in the Wiki: https://github.com/dmlc/xgboost/wiki/Parameters.

This function only accepts an xgb.DMatrix object as the input.

Examples

library(xgboost)

data(agaricus.train, package='xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
# For illustration, reuse the training data as the evaluation set.
dtest <- dtrain
watchlist <- list(eval = dtest, train = dtrain)

# Customized objective: logistic regression returning gradient and hessian.
logregobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  preds <- 1 / (1 + exp(-preds))
  grad <- preds - labels
  hess <- preds * (1 - preds)
  return(list(grad = grad, hess = hess))
}

# Customized evaluation metric: binary classification error rate.
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- as.numeric(sum(labels != (preds > 0))) / length(labels)
  return(list(metric = "error", value = err))
}

param <- list(max.depth = 2, eta = 1, silent = 1,
              objective = logregobj, eval_metric = evalerror)
bst <- xgb.train(param, dtrain, nrounds = 2, watchlist = watchlist, nthread = 2)
