# mboost_fit

##### Model-based Gradient Boosting

Work-horse function for gradient boosting, optimizing arbitrary loss functions with component-wise models as base-learners. Usually, this function is not called directly by the user.

##### Usage

```
mboost_fit(blg, response, weights = rep(1, NROW(response)), offset = NULL,
           family = Gaussian(), control = boost_control(),
           oobweights = as.numeric(weights == 0))
```

##### Arguments

- blg
  a list of objects of class `blg`, as returned by the base-learners.

- response
  the response variable.

- weights
  (optional) a numeric vector of weights to be used in the fitting process.

- offset
  a numeric vector to be used as offset (optional).

- family
  a `Family` object.

- control
  a list of parameters controlling the algorithm. For more details see `boost_control`.

- oobweights
  an additional vector of out-of-bag weights, which is used for the out-of-bag risk (i.e., if `boost_control(risk = "oobag")`). This argument is also used internally by `cvrisk`.

- ...
  additional arguments passed to `mboost_fit`; currently none.
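
To make the argument list concrete, here is a minimal sketch of a direct call of the kind normally issued by the higher-level interfaces; the simulated data and names (`df`, `x1`, `x2`, `w`) are made up purely for illustration:

```
library("mboost")
set.seed(29)
df <- data.frame(x1 = runif(100), x2 = runif(100))
df$y <- 2 * df$x1 + sin(3 * df$x2) + rnorm(100, sd = 0.1)

## hold out every fifth observation via a zero weight; the default
## oobweights = as.numeric(weights == 0) then marks these observations
## as out-of-bag, and risk = "oobag" evaluates the risk on them
w <- rep(c(1, 1, 1, 1, 0), 20)
mod <- with(df,
            mboost_fit(list(bols(x1), bbs(x2)), response = y,
                       weights = w, family = Gaussian(),
                       control = boost_control(mstop = 200, risk = "oobag")))
## risk() extracts the stored risk path (see the `methods' help page)
tail(risk(mod), 1)  # out-of-bag empirical risk at the final iteration
```

With `risk = "oobag"`, the stored risk path is evaluated on the zero-weight observations, which is exactly the mechanism that `cvrisk` exploits.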

##### Details

The function implements component-wise functional gradient boosting in a generic way. It is the main work horse and serves as a unified back-end for all boosting algorithms in the package. Usually, this function is not called directly. Note that the more convenient modelling interfaces `gamboost`, `glmboost` and `blackboost` all call `mboost_fit`.

Basically, the algorithm is initialized with a function for computing the negative gradient of the loss function (via its `family` argument) and one or more base-learners (given as `blg`). Usually, `blg` and `response` are computed in the functions `gamboost`, `glmboost`, `blackboost` or `mboost`. See there for details on the specification of base-learners.
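
Because the loss enters the algorithm only through its negative gradient, user-defined losses can be supplied via the `Family` constructor. A minimal sketch for the absolute-error loss follows; it is purely illustrative, as mboost already ships this loss as `Laplace()`:

```
library("mboost")

## user-specified family: absolute-error loss |y - f|, whose
## negative gradient is sign(y - f); the default offset minimizes
## the implied risk over the range of the response
AbsLoss <- Family(ngradient = function(y, f, w = 1) sign(y - f),
                  loss = function(y, f) abs(y - f),
                  name = "Absolute error (illustration)")

set.seed(29)
x <- runif(100)
y <- sin(3 * x) + rnorm(100, sd = 0.1)
mod <- mboost_fit(list(bbs(x)), response = y, family = AbsLoss)
```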

The algorithm minimizes the in-sample empirical risk, defined as the weighted sum (by `weights`) of the loss function (corresponding to the negative gradient) evaluated at the data.
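
As a sketch of this definition, the stored risk path of a fitted model can be recomputed by hand from the family's `loss` slot (slot access via `@` is an internal detail here, so treat this as illustrative only):

```
library("mboost")
set.seed(29)
x <- runif(100)
y <- sin(3 * x) + rnorm(100, sd = 0.1)
w <- rep(1, 100)
mod <- mboost_fit(list(bbs(x)), response = y, weights = w)

## weighted sum of losses at the current fit ...
by_hand <- sum(w * Gaussian()@loss(y, fitted(mod)))
## ... matches the last entry of the stored risk path
all.equal(by_hand, unname(tail(risk(mod), 1)))
```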

The structure of the model is determined by the structure of the base-learners. If more than one base-learner is given, the model is additive in these components.

Base-learners can be specified via a formula interface (function `mboost`) or as a list of objects of class `bl`; see, e.g., `bols`.

`oobweights` is a vector used internally by `cvrisk`. When carrying out cross-validation to determine the optimal stopping iteration of a boosting model, the default value of `oobweights` (out-of-bag weights) assures that the cross-validated risk is computed using the same observation weights as those used for fitting the boosting model. It is strongly recommended to leave this argument unspecified.
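
A sketch of the intended workflow, leaving `oobweights` at its default and letting `cvrisk` handle the resampling (the model is refitted here so the snippet is self-contained):

```
library("mboost")
data("bodyfat", package = "TH.data")
mod <- mboost(DEXfat ~ bols(waistcirc) + bbs(hipcirc), data = bodyfat)

## 10-fold cross-validation of the empirical risk over the iterations
cvr <- cvrisk(mod, folds = cv(model.weights(mod), type = "kfold"))
mstop(cvr)       # cross-validated optimal stopping iteration
mod[mstop(cvr)]  # set the model to this iteration
```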

##### Value

An object of class `mboost` with `print`, `AIC`, `plot` and `predict` methods available.
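
A brief, purely illustrative sketch of these methods on simulated data:

```
library("mboost")
set.seed(29)
x <- runif(100)
y <- sin(3 * x) + rnorm(100, sd = 0.1)
mod <- mboost_fit(list(bbs(x)), response = y)

print(mod)          # family, number of boosting iterations, step size
AIC(mod)            # corrected AIC, with a suggested stopping iteration
plot(mod)           # partial effect of the bbs(x) base-learner
head(predict(mod))  # fitted values for the training observations
```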

##### References

Peter Buehlmann and Bin Yu (2003),
Boosting with the L2 loss: regression and classification.
*Journal of the American Statistical Association*, **98**,
324--339.

Peter Buehlmann and Torsten Hothorn (2007),
Boosting algorithms: regularization, prediction and model fitting.
*Statistical Science*, **22**(4), 477--505.

Torsten Hothorn, Peter Buehlmann, Thomas Kneib, Matthias Schmid and
Benjamin Hofner (2010), Model-based Boosting 2.0. *Journal of
Machine Learning Research*, **11**, 2109--2113.

Yoav Freund and Robert E. Schapire (1996),
Experiments with a new boosting algorithm.
In *Machine Learning: Proc. Thirteenth International Conference*,
148--156.

Jerome H. Friedman (2001),
Greedy function approximation: A gradient boosting machine.
*The Annals of Statistics*, **29**, 1189--1232.

Benjamin Hofner, Andreas Mayr, Nikolay Robinzonov and Matthias Schmid
(2014). Model-based Boosting in R: A Hands-on Tutorial Using the R
Package mboost. *Computational Statistics*, **29**, 3--35.
http://dx.doi.org/10.1007/s00180-012-0382-5

Available as vignette via: `vignette(package = "mboost", "mboost_tutorial")`

##### See Also

`glmboost` for boosted linear models and `blackboost` for boosted trees. See, e.g., `bbs` for possible base-learners. See `cvrisk` for the cross-validated stopping iteration. Furthermore see `boost_control`, `Family` and `methods`.

##### Examples

```
# NOT RUN {
data("bodyfat", package = "TH.data")

### formula interface: additive Gaussian model with
### a non-linear step function in `age', a linear function in `waistcirc'
### and a smooth non-linear function in `hipcirc'
mod <- mboost(DEXfat ~ btree(age) + bols(waistcirc) + bbs(hipcirc),
              data = bodyfat)
layout(matrix(1:6, ncol = 3, byrow = TRUE))
plot(mod, main = "formula")

### the same model, fitted via the work-horse directly; assign outside
### with() so that `mod' persists in the calling environment
mod <- with(bodyfat,
            mboost_fit(list(btree(age), bols(waistcirc), bbs(hipcirc)),
                       response = DEXfat))
plot(mod, main = "base-learner")
# }
```

*Documentation reproduced from package mboost, version 2.9-1, License: GPL-2*