xgboost (version 0.4-4)

xgboost: eXtreme Gradient Boosting (Tree) library


A simple interface for training an xgboost model. See the xgb.train function for a more advanced interface.


xgboost(data = NULL, label = NULL, missing = NULL, params = list(), nrounds, verbose = 1, print.every.n = 1L, early.stop.round = NULL, maximize = NULL, ...)


data: a matrix, dgCMatrix, local data file, or xgb.DMatrix.
label: the response variable. Do not set this field if data is a local data file or an xgb.DMatrix.
missing: only used when the input is a dense matrix. Pick a float value that represents missing values; some data sets use 0 or another extreme value for this purpose.
params: the list of parameters.

Commonly used ones are:

  • objective: the objective function; common ones are
    • reg:linear: linear regression
    • binary:logistic: logistic regression for classification

  • eta: step size of each boosting step
  • max.depth: maximum depth of the tree
  • nthread: number of threads used in training; if not set, all threads are used

See xgb.train for a more complete list of parameters, or https://github.com/dmlc/xgboost/wiki/Parameters for the full list.

See also demo/ for walkthrough examples in R.

nrounds: the maximum number of iterations.
verbose: if 0, xgboost stays silent. If 1, xgboost prints performance information. If 2, xgboost prints both performance and construction progress information.
print.every.n: print every N-th progress message when verbose > 0. The default is 1, which means all messages are printed.
early.stop.round: if NULL, early stopping is not triggered. If set to an integer k, training with a validation set stops if performance keeps getting worse for k consecutive rounds.
maximize: if feval and early.stop.round are set, then maximize must be set as well. maximize = TRUE means that a larger evaluation score is better.
...: other parameters to pass to params.
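
The argument descriptions above say that settings can be supplied through the params list or passed directly, in which case they are collected through `...`. The sketch below trains the same model both ways on the bundled agaricus data. It assumes the xgboost() signature shown on this page (version 0.4-4); the variable names (param, bst_list, bst_dots) are illustrative only, and nthread = 1 is chosen merely to keep the two runs as comparable as possible.

```r
library(xgboost)

# Assumes the xgboost() signature documented on this page (version 0.4-4).
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")

# Route 1: collect the settings in a list and hand it to params.
param <- list(objective = "binary:logistic", eta = 1, max.depth = 2, nthread = 1)
bst_list <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
                    params = param, nrounds = 2, verbose = 0)

# Route 2: pass the same settings directly; they are collected through `...`.
bst_dots <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
                    objective = "binary:logistic", eta = 1, max.depth = 2,
                    nthread = 1, nrounds = 2, verbose = 0)

pred_list <- predict(bst_list, agaricus.test$data)
pred_dots <- predict(bst_dots, agaricus.test$data)

# Compare the predictions produced by the two routes.
print(all.equal(pred_list, pred_dots))
```

Keeping the settings in a list is convenient when the same configuration is reused across several calls; passing them directly is shorter for one-off experiments.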


This is the modeling function for xgboost.

Parallelization is automatically enabled if OpenMP is present. The number of threads can also be specified manually via the nthread parameter.


    data(agaricus.train, package='xgboost')
    data(agaricus.test, package='xgboost')
    train <- agaricus.train
    test <- agaricus.test
    bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
                   eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
    pred <- predict(bst, test$data)
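
As a further illustration, the missing argument described earlier can be exercised with a dense matrix that uses a sentinel value. This is a minimal sketch on synthetic data: the sentinel -999, the data itself, and the variable names are all assumptions made for the example, and predict is assumed to accept the same missing argument as xgboost().

```r
library(xgboost)

# Synthetic dense data; everything here is illustrative, not from the package.
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 3), ncol = 3)
y <- as.numeric(x[, 1] > 0)

# Hypothetical convention: -999 marks a missing entry in column 2.
sentinel <- -999
x[sample(n, 20), 2] <- sentinel

# Tell xgboost which float value stands for "missing" in the dense matrix.
bst_missing <- xgboost(data = x, label = y, missing = sentinel,
                       max.depth = 2, eta = 1, nthread = 1, nrounds = 2,
                       objective = "binary:logistic", verbose = 0)

# Use the same sentinel at prediction time so both steps treat it consistently.
pred_missing <- predict(bst_missing, x, missing = sentinel)
print(head(pred_missing))
```

Without the missing argument, the sentinel would be treated as an ordinary (and extreme) feature value rather than as an absent entry.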