xgboost (version 0.4-4)

xgboost: eXtreme Gradient Boosting (Tree) library

Description

A simple interface for training an xgboost model. See the xgb.train function for a more advanced interface.

Usage

xgboost(data = NULL, label = NULL, missing = NULL, params = list(), nrounds, verbose = 1, print.every.n = 1L, early.stop.round = NULL, maximize = NULL, ...)

Arguments

data
takes a matrix, dgCMatrix, local data file, or xgb.DMatrix.
label
the response variable. The user should not set this field if data is a local data file or xgb.DMatrix, since the label is then read from the data itself.
missing
used only when the input is a dense matrix; pick a float value to represent missing values. Some data sets use 0 or another extreme value to represent missing values (see the sketch after this argument list).
params
the list of parameters.

Commonly used ones are:

  • objective: objective function; common ones are
    • reg:linear: linear regression
    • binary:logistic: logistic regression for classification

  • eta: step size of each boosting step
  • max.depth: maximum depth of the tree
  • nthread: number of threads used in training; if not set, all threads are used

Look at xgb.train for a more complete list of parameters, or https://github.com/dmlc/xgboost/wiki/Parameters for the full list.

See also demo/ for walkthrough examples in R.

nrounds
the max number of iterations.
verbose
If 0, xgboost will stay silent. If 1, xgboost will print performance information. If 2, xgboost will print both performance and construction progress information.
print.every.n
Print every N-th progress message when verbose > 0. Default is 1, which means all messages are printed.
early.stop.round
If NULL, early stopping is not triggered. If set to an integer k, training with a validation set will stop if the performance keeps getting worse for k consecutive rounds.
maximize
If feval and early.stop.round are set, then maximize must be set as well. maximize = TRUE means the larger the evaluation score the better.
...
other parameters to pass to params.
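As an illustration of the accepted input types, here is a minimal sketch (not part of the original page); the 999 missing-value sentinel is an invented placeholder, and the agaricus data ships with the package:

library(xgboost)
data(agaricus.train, package = 'xgboost')
train <- agaricus.train

# Dense matrix input: 'missing' flags which value encodes a missing cell
# (the 999 sentinel is purely illustrative).
dense <- as.matrix(train$data)
bst.dense <- xgboost(data = dense, label = train$label, missing = 999,
                     nrounds = 2, objective = "binary:logistic")

# xgb.DMatrix input: the label is stored inside the DMatrix,
# so the label argument is left unset.
dtrain <- xgb.DMatrix(train$data, label = train$label)
bst.dmat <- xgboost(data = dtrain, nrounds = 2,
                    objective = "binary:logistic")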

Details

This is the modeling function for xgboost.

Parallelization is automatically enabled if OpenMP is present.

The number of threads can also be manually specified via the nthread parameter, as sketched below.
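As a rough sketch of that thread control (assuming an OpenMP-enabled build; the timing comparison is illustrative, not from the original page):

library(xgboost)
data(agaricus.train, package = 'xgboost')
train <- agaricus.train

# The same fit with one thread versus two; with OpenMP present the
# second call should be faster on multi-core hardware.
system.time(xgboost(data = train$data, label = train$label, nthread = 1,
                    nrounds = 10, objective = "binary:logistic", verbose = 0))
system.time(xgboost(data = train$data, label = train$label, nthread = 2,
                    nrounds = 10, objective = "binary:logistic", verbose = 0))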

Examples

data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
               eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
pred <- predict(bst, test$data)
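Two further hedged sketches extending the example above (not part of the original page): binary:logistic returns probabilities, so thresholding at 0.5 gives a test error; and early.stop.round can be exercised through this simple interface, which this sketch assumes watches only the training-set metric:

# Threshold the predicted probabilities at 0.5 to compute a test error.
err <- mean(as.numeric(pred > 0.5) != test$label)
print(paste("test-error =", err))

# Early stopping: halt if the monitored error fails to improve for
# 3 consecutive rounds; maximize = FALSE because lower error is better.
bst.es <- xgboost(data = train$data, label = train$label, max.depth = 2,
                  eta = 1, nthread = 2, nrounds = 50,
                  objective = "binary:logistic",
                  early.stop.round = 3, maximize = FALSE)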
    
