# predict.xgb.Booster

##### Predict method for eXtreme Gradient Boosting model

Predicted values based on either an xgboost model or a model handle object.

##### Usage

```
# S3 method for xgb.Booster
predict(object, newdata, missing = NA,
  outputmargin = FALSE, ntreelimit = NULL, predleaf = FALSE,
  reshape = FALSE, ...)

# S3 method for xgb.Booster.handle
predict(object, ...)
```

##### Arguments

- object
Object of class `xgb.Booster` or `xgb.Booster.handle`.

- newdata
Takes a `matrix`, a `dgCMatrix`, a local data file, or an `xgb.DMatrix`.

- missing
Only used when the input is a dense matrix. Pick a float value that represents missing values in the data (e.g., sometimes 0 or some other extreme value is used).

- outputmargin
Whether the prediction should be returned in the form of the original untransformed sum of predictions from the boosting iterations' results. E.g., setting `outputmargin=TRUE` for logistic regression would result in predictions for log-odds instead of probabilities.

- ntreelimit
Limits the number of the model's trees or boosting iterations used in prediction (see Details). All trees are used by default (`NULL` value).

- predleaf
Whether to predict the leaf index instead.

- reshape
Whether to reshape the vector of predictions to a matrix form when there are several prediction outputs per case. This option has no effect when `predleaf = TRUE`.

- ...
Parameters passed to `predict.xgb.Booster`.
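
The relationship between margin output and probabilities that `outputmargin` refers to can be sketched in base R. This is a minimal illustration, assuming a logistic objective; the `margin` values are made up and merely stand in for what a call with `outputmargin = TRUE` might return.

```r
# Hypothetical log-odds values (not produced by a real model); they stand in
# for the untransformed margin of a logistic-regression booster.
margin <- c(-2.0, 0.0, 1.5)

# The logistic (inverse-logit) transform maps log-odds to probabilities.
prob <- 1 / (1 + exp(-margin))

# Base R's plogis() computes the same transform.
same_as_plogis <- isTRUE(all.equal(prob, plogis(margin)))
```

A margin of 0 corresponds to a probability of 0.5, which is why thresholding probabilities at 0.5 is equivalent to thresholding log-odds at 0.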

##### Details

Note that `ntreelimit` is not necessarily equal to the number of boosting iterations, and it is not necessarily equal to the number of trees in a model. E.g., in a random forest-like model, `ntreelimit` would limit the number of trees. For multiclass classification, however, there are multiple trees per iteration, and `ntreelimit` limits the number of boosting iterations.

Also note that `ntreelimit` would currently do nothing for predictions from gblinear, since gblinear doesn't keep its boosting history.
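
The distinction between trees and boosting iterations can be made concrete with a little arithmetic. The helper below is hypothetical (it is not part of xgboost) and assumes that a tree booster grows `num_parallel_tree` trees per class in every iteration; it only counts trees and does not call the library.

```r
# Hypothetical helper, not an xgboost function: total number of trees in a
# tree-booster model, assuming num_parallel_tree trees per class per iteration.
total_trees <- function(nrounds, num_parallel_tree = 1, num_class = 1) {
  nrounds * num_parallel_tree * num_class
}

# Multiclass setup as in the iris example: 10 iterations, 3 classes.
trees_multiclass <- total_trees(nrounds = 10, num_class = 3)

# Random forest-like setup: a single iteration with 25 parallel trees.
trees_forest <- total_trees(nrounds = 1, num_parallel_tree = 25)
```

Under this assumption the multiclass model holds more trees than it has iterations, while the forest-like model holds many trees in a single iteration, which is why `ntreelimit` cannot be read as a plain tree count in both cases.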

One possible practical application of the `predleaf` option is to use the model as a generator of new features which capture non-linearity and interactions, e.g., as implemented in `xgb.create.features`.
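
To illustrate the idea of leaf indices as features, the sketch below one-hot encodes a made-up leaf-index matrix in base R. The matrix `leaf` is hypothetical, standing in for `predleaf = TRUE` output, and the encoding shown is one plausible approach; it is not claimed to match what `xgb.create.features` does internally.

```r
# Hypothetical leaf-index matrix: 4 cases (rows) by 2 trees (columns).
leaf <- matrix(c(1, 2, 1, 2,
                 3, 3, 4, 4), ncol = 2)

# One indicator column per (tree, leaf) pair that occurs in the data.
one_hot <- do.call(cbind, lapply(seq_len(ncol(leaf)), function(j) {
  lv <- sort(unique(leaf[, j]))          # leaves reached in tree j
  m <- outer(leaf[, j], lv, "==") * 1    # 0/1 indicator matrix
  colnames(m) <- paste0("tree", j, "_leaf", lv)
  m
}))
```

Each case falls into exactly one leaf per tree, so every row of `one_hot` sums to the number of trees. The resulting columns could be appended to the original features before fitting another model.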

##### Value

For regression or binary classification, it returns a vector of length `nrow(newdata)`. For multiclass classification, either a vector of length `num_class * nrow(newdata)` or a matrix of dimension `(nrow(newdata), num_class)` is returned, depending on the `reshape` value.
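
The two multiclass layouts can be related with a short base R sketch. The flat vector `pred` below is made up; it assumes the per-case ordering used in the Examples section, where each case contributes `num_class` consecutive values.

```r
num_class <- 3

# Hypothetical flat prediction vector for 2 cases, num_class values per case.
pred <- c(0.7, 0.2, 0.1,
          0.1, 0.3, 0.6)

# Fill row by row so that each row holds the class probabilities of one case.
pred_mat <- matrix(pred, ncol = num_class, byrow = TRUE)

# Zero-based label of the most probable class for each case.
pred_labels <- max.col(pred_mat) - 1
```

Filling with `byrow = TRUE` matters here: the default column-wise fill would mix values from different cases within a row.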

When `predleaf = TRUE`, the output is a matrix object with the number of columns corresponding to the number of trees.

##### Examples

```
library(xgboost)

## binary classification:
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test

bst <- xgboost(data = train$data, label = train$label, max_depth = 2,
               eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
# use all trees by default
pred <- predict(bst, test$data)
# use only the 1st tree
pred <- predict(bst, test$data, ntreelimit = 1)

## multiclass classification in iris dataset:
lb <- as.numeric(iris$Species) - 1
num_class <- 3
set.seed(11)
bst <- xgboost(data = as.matrix(iris[, -5]), label = lb,
               max_depth = 4, eta = 0.5, nthread = 2, nrounds = 10, subsample = 0.5,
               objective = "multi:softprob", num_class = num_class)
# predict for softprob returns num_class probability numbers per case:
pred <- predict(bst, as.matrix(iris[, -5]))
str(pred)
# reshape it to a num_class-columns matrix
pred <- matrix(pred, ncol=num_class, byrow=TRUE)
# convert the probabilities to softmax labels
pred_labels <- max.col(pred) - 1
# the following should result in the same error as seen in the last iteration
sum(pred_labels != lb)/length(lb)

# compare that to the predictions from softmax:
set.seed(11)
bst <- xgboost(data = as.matrix(iris[, -5]), label = lb,
               max_depth = 4, eta = 0.5, nthread = 2, nrounds = 10, subsample = 0.5,
               objective = "multi:softmax", num_class = num_class)
pred <- predict(bst, as.matrix(iris[, -5]))
str(pred)
all.equal(pred, pred_labels)
# prediction from using only 5 iterations should result
# in the same error as seen in iteration 5:
pred5 <- predict(bst, as.matrix(iris[, -5]), ntreelimit=5)
sum(pred5 != lb)/length(lb)

## random forest-like model of 25 trees for binary classification:
set.seed(11)
bst <- xgboost(data = train$data, label = train$label, max_depth = 5,
               nthread = 2, nrounds = 1, objective = "binary:logistic",
               num_parallel_tree = 25, subsample = 0.6, colsample_bytree = 0.1)
# Inspect the prediction error vs number of trees:
lb <- test$label
dtest <- xgb.DMatrix(test$data, label=lb)
err <- sapply(1:25, function(n) {
  pred <- predict(bst, dtest, ntreelimit=n)
  sum((pred > 0.5) != lb)/length(lb)
})
plot(err, type='l', ylim=c(0,0.1), xlab='#trees')
```

*Documentation reproduced from package xgboost, version 0.6-4, License: Apache License (== 2.0) | file LICENSE*