Gradient boosting for optimizing arbitrary loss functions where regression trees are utilized as base-learners.
blackboost(formula, data = list(),
weights = NULL, na.action = na.pass,
offset = NULL, family = Gaussian(),
control = boost_control(),
oobweights = NULL,
tree_controls = partykit::ctree_control(
teststat = "quad",
testtype = "Teststatistic",
mincriterion = 0,
minsplit = 10,
minbucket = 4,
maxdepth = 2,
saveinfo = FALSE),
...)
An object of class mboost with print and predict methods being available.
formula: a symbolic description of the model to be fit.
data: a data frame containing the variables in the model.
weights: an optional vector of weights to be used in the fitting process.
na.action: a function which indicates what should happen when the data contain NAs.
offset: a numeric vector to be used as offset (optional).
family: a Family object.
control: a list of parameters controlling the algorithm. For more details see boost_control.
oobweights: an additional vector of out-of-bag weights, which is used for the out-of-bag risk (i.e., if boost_control(risk = "oobag")). This argument is also used internally by cvrisk.
tree_controls: an object of class "TreeControl", which can be obtained using ctree_control. Defines hyper-parameters for the trees which are used as base-learners. It is wise to make sure you understand the consequences of altering any of its arguments. By default, two-way interactions (but not deeper trees) are fitted; see the sketch after this argument list for changing the tree depth.
...: additional arguments passed to mboost_fit, including weights, offset, family and control. For default values see mboost_fit.
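As an illustration (a sketch added here, not part of the original documentation), supplying a different tree_controls object changes the complexity of each base-learner; maxdepth = 1 below is an assumed choice that yields boosted stumps instead of the default two-way interactions:

### boosted stumps on the cars data; maxdepth = 1 is illustrative only
stump.gb <- blackboost(dist ~ speed, data = cars,
                       tree_controls = partykit::ctree_control(
                           teststat = "quad", testtype = "Teststatistic",
                           mincriterion = 0, maxdepth = 1, saveinfo = FALSE),
                       control = boost_control(mstop = 50))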
This function implements the 'classical' gradient boosting algorithm utilizing regression trees as base-learners. Essentially, the same algorithm is implemented in package gbm. The main difference is that arbitrary loss functions to be optimized can be specified via the family argument to blackboost, whereas gbm uses hard-coded loss functions. Moreover, the base-learners (conditional inference trees, see ctree) are a little more flexible.
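As a brief illustration (a sketch, not taken from the original page), replacing the default Gaussian() family by Laplace() switches the fit from squared-error to absolute-error (median) regression without any other change to the call; mstop = 50 is an arbitrary choice:

### L1 (median) regression on the cars data
cars.lad <- blackboost(dist ~ speed, data = cars,
                       family = Laplace(),
                       control = boost_control(mstop = 50))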
The regression fit is a black box prediction machine and thus hardly interpretable.
Partial dependency plots are not yet available; see example section for plotting of additive tree models.
Peter Buehlmann and Torsten Hothorn (2007), Boosting algorithms: regularization, prediction and model fitting. Statistical Science, 22(4), 477--505.
Torsten Hothorn, Kurt Hornik and Achim Zeileis (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 651--674.
Yoav Freund and Robert E. Schapire (1996), Experiments with a new boosting algorithm. In Machine Learning: Proc. Thirteenth International Conference, 148--156.
Jerome H. Friedman (2001), Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29, 1189--1232.
Greg Ridgeway (1999), The state of boosting. Computing Science and Statistics, 31, 172--181.
See mboost_fit for the generic boosting function, glmboost for boosted linear models, and gamboost for boosted additive models. See baselearners for possible base-learners. See cvrisk for the cross-validated stopping iteration (a sketch is included in the examples below). Furthermore see boost_control, Family and methods.
### a simple two-dimensional example: cars data
cars.gb <- blackboost(dist ~ speed, data = cars,
control = boost_control(mstop = 50))
cars.gb
### plot fit
plot(dist ~ speed, data = cars)
lines(cars$speed, predict(cars.gb), col = "red")
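### sketch (not part of the original examples): pick the stopping iteration
### by cross-validation; the bootstrap fold setup is an illustrative choice
cvm <- cvrisk(cars.gb, folds = cv(model.weights(cars.gb), type = "bootstrap"))
plot(cvm)
cars.gb[mstop(cvm)]   ### set the model to the cross-validated mstop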
### set up and plot additive tree model
if (require("partykit")) {
ctrl <- ctree_control(maxdepth = 3)
viris <- subset(iris, Species != "setosa")
viris$Species <- viris$Species[, drop = TRUE]
imod <- mboost(Species ~ btree(Sepal.Length, tree_controls = ctrl) +
btree(Sepal.Width, tree_controls = ctrl) +
btree(Petal.Length, tree_controls = ctrl) +
btree(Petal.Width, tree_controls = ctrl),
data = viris, family = Binomial())[500]
layout(matrix(1:4, ncol = 2))
plot(imod)
}