xgboost: eXtreme Gradient Boosting (Tree) library

Description

A simple interface for training an xgboost model. See the xgb.train function for a more advanced interface.

Usage

xgboost(data = NULL, label = NULL, missing = NULL, params = list(), nrounds, verbose = 1, print.every.n = 1L, early.stop.round = NULL, maximize = NULL, ...)

Arguments

data

The input data: a matrix, a dgCMatrix, the path to a local data file, or an xgb.DMatrix.

label

The response variable. Do not set this argument if data is a local data file or an xgb.DMatrix.

missing

Only used when the input is a dense matrix. Pick a float value that represents a missing value; some data sets use 0 or another extreme value to represent missing values.
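As a hedged illustration of the missing argument, the sketch below builds a small synthetic dense matrix in which the value -999 stands in for missing entries. The data, the sentinel value, and the variable names are assumptions made for this example; the call uses the argument names documented on this page.

```r
library(xgboost)

# Synthetic dense matrix: 50 rows, 4 features (illustrative data only)
set.seed(1)
x <- matrix(rnorm(200), nrow = 50, ncol = 4)
y <- as.numeric(x[, 1] > 0)

# Overwrite 20 randomly chosen cells with the sentinel value -999
x[sample(length(x), 20)] <- -999

# Tell xgboost that -999 marks a missing value in the dense input
bst <- xgboost(data = x, label = y, missing = -999, nrounds = 2,
               objective = "binary:logistic", verbose = 0)
```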

params

A list of training parameters.

Commonly used ones are:

objective the objective function; common choices are
- reg:linear linear regression
- binary:logistic logistic regression for classification

eta step size of each boosting step

max.depth maximum depth of the tree

nthread number of threads used in training; if not set, all threads are used

See xgb.train for a more complete list of parameters, or https://github.com/dmlc/xgboost/wiki/Parameters for the full list.

See also demo/ for a walkthrough example in R.
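A minimal sketch of collecting the commonly used parameters in a list and passing it through params, rather than through the trailing arguments. The particular values (eta = 0.3, max.depth = 3, five rounds) are arbitrary choices for illustration, not recommendations.

```r
library(xgboost)
data(agaricus.train, package = 'xgboost')
train <- agaricus.train

# Commonly used parameters, gathered in one list
params <- list(objective = "binary:logistic",  # logistic regression for classification
               eta       = 0.3,                # step size of each boosting step
               max.depth = 3,                  # maximum depth of each tree
               nthread   = 2)                  # number of threads used in training

bst <- xgboost(data = train$data, label = train$label,
               params = params, nrounds = 5, verbose = 0)
```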

nrounds

The maximum number of boosting iterations.

verbose

If 0, xgboost stays silent. If 1, xgboost prints performance information. If 2, xgboost prints both performance information and construction progress.

print.every.n

Print every N-th progress message when verbose > 0. The default is 1, which means all messages are printed.

early.stop.round

If NULL, early stopping is not used. If set to an integer k, training with a validation set stops once performance has kept getting worse for k consecutive rounds.

maximize

If feval and early.stop.round are set, then maximize must be set as well. maximize = TRUE means that a larger evaluation score is better.
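The following sketch shows how early.stop.round and maximize might be combined, assuming the version of the package documented on this page (later releases use different argument names). The metric "error" is passed as eval_metric through the trailing arguments, and since a lower error is better, maximize is set to FALSE. Which data set is monitored depends on the package version; xgb.train gives explicit control through its watchlist argument.

```r
library(xgboost)
data(agaricus.train, package = 'xgboost')
train <- agaricus.train

# Allow up to 50 rounds, but stop if the monitored error keeps
# getting worse for 3 consecutive rounds
bst <- xgboost(data = train$data, label = train$label,
               max.depth = 2, eta = 0.3, nthread = 2, nrounds = 50,
               objective = "binary:logistic",
               eval_metric = "error",   # assumption: classification error as the metric
               early.stop.round = 3,
               maximize = FALSE,        # lower error is better
               verbose = 1,
               print.every.n = 5)       # print every 5th progress message
```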

...

Other parameters to pass to params.

Details

This is the modeling function for xgboost.

Parallelization is automatically enabled if OpenMP is present.

The number of threads can also be specified manually via the nthread parameter.

Examples

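# Load the agaricus (mushroom) training and test sets shipped with xgboost
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test
# Train a binary classifier for two boosting rounds
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
               eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
# Predict probabilities for the test set
pred <- predict(bst, test$data)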
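
With the binary:logistic objective, predict returns a probability for each row. As a hedged follow-up, the sketch below turns those probabilities into class labels and computes the test error. The 0.5 cutoff and the names pred_label and err are choices made for this illustration.

```r
library(xgboost)
data(agaricus.train, package = 'xgboost')
data(agaricus.test, package = 'xgboost')
train <- agaricus.train
test <- agaricus.test

bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
               eta = 1, nthread = 2, nrounds = 2,
               objective = "binary:logistic", verbose = 0)
pred <- predict(bst, test$data)

# Threshold the predicted probabilities at 0.5 (an illustrative cutoff)
pred_label <- as.numeric(pred > 0.5)

# Fraction of test rows whose predicted label differs from the true label
err <- mean(pred_label != test$label)
print(paste("test-error =", err))
```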
