mboost (version 2.1-0)

mboost: Model-based Gradient Boosting

Description

Gradient boosting for optimizing arbitrary loss functions, where component-wise models are utilized as base-learners.

Usage

mboost(formula, data = list(),
       baselearner = c("bbs", "bols", "btree", "bss", "bns"), ...)
mboost_fit(blg, response, weights = rep(1, NROW(response)), offset = NULL,
           family = Gaussian(), control = boost_control(), oobweights =
           as.numeric(weights == 0))

Arguments

formula
a symbolic description of the model to be fit.
data
a data frame containing the variables in the model.
baselearner
a character specifying the component-wise base-learner to be used: bbs means P-splines with a B-spline basis (see Schmid and Hothorn 2008), bols linear models and btree boosts stumps. Note that bss and bns are deprecated; use bbs instead.
blg
a list of objects of class blg, as returned by the base-learners (e.g., bols, bbs or btree).
response
the response variable.
weights
a numeric vector of weights (optional).
offset
a numeric vector to be used as offset (optional).
family
a Family object.
control
a list of parameters controlling the algorithm. For more details see boost_control.
oobweights
an additional vector of out-of-bag weights (used internally by cvrisk).
...
additional arguments passed to mboost_fit, including weights, offset, family and control.

Value

An object of class mboost with print, AIC, plot and predict methods being available.

Details

The function implements component-wise functional gradient boosting in a generic way. Basically, the algorithm is initialized with a function for computing the negative gradient of the loss function (via its family argument) and one or more base-learners (given as blg). Usually blg and response are computed in the functions gamboost, glmboost, blackboost or mboost.
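
As an illustration of the family mechanism, here is a minimal sketch of a user-defined absolute-error loss written as a Family object; mboost already provides an equivalent built-in family, Laplace(), and the name myL1 is invented here for exposition only:

  ### sketch: absolute-error (L1) loss and its negative gradient
  myL1 <- Family(ngradient = function(y, f, w = 1) sign(y - f),
                 loss = function(y, f) abs(y - f),
                 offset = function(y, w) median(y),  ### L1 risk minimizer
                 name = "Absolute error (illustration)")
  ### e.g., mboost(DEXfat ~ bbs(hipcirc), data = bodyfat, family = myL1)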

The algorithm minimizes the in-sample empirical risk, defined as the weighted sum (using weights) of the loss function (corresponding to the negative gradient) evaluated at the data.
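
In symbols, the criterion is sum_i w_i * rho(y_i, f(x_i)) for a loss rho. A direct transcription as a sketch only; squared error stands in for the family's actual loss, and emp_risk is a name invented here:

  ### sketch: weighted empirical risk for a generic loss rho
  emp_risk <- function(y, f, w = rep(1, length(y)),
                       rho = function(y, f) (y - f)^2)
    sum(w * rho(y, f))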

The structure of the model is determined by the structure of the base-learners. If more than one base-learner is given, the model is additive in these components.
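
For instance, the contribution of each component to the additive predictor can be inspected separately via the which argument of predict(); a sketch, assuming the model mod fitted on the bodyfat data in the Examples below:

  ### sketch: component-wise contributions of an additive fit
  head(predict(mod, which = "hipcirc"))   ### one base-learner only
  head(predict(mod))                      ### the full additive predictor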

Base-learners can be specified via a formula interface (function mboost) or as a list of objects of class blg (see, e.g., bols). The oobweights vector is used internally by cvrisk: when carrying out cross-validation to determine the optimal stopping iteration of a boosting model, the default value of oobweights (out-of-bag weights) ensures that the cross-validated risk is computed using the same observation weights as those used for fitting the boosting model. It is strongly recommended to leave this argument unspecified.
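
A typical cross-validation call, sketched for a fitted model mod (cv() constructs the fold weights; bootstrap is one of several resampling types, and cvm is an arbitrary name):

  ### sketch: cross-validated choice of the stopping iteration
  cvm <- cvrisk(mod, folds = cv(model.weights(mod), type = "bootstrap"))
  mstop(cvm)        ### optimal number of boosting iterations
  mod[mstop(cvm)]   ### set the model to this iteration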

Note that the more convenient modelling interfaces gamboost, glmboost and blackboost all call mboost_fit directly.
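
For example, the following two calls fit the same additive model (a sketch; bbs is the default base-learner of gamboost, and the object names m1 and m2 are arbitrary):

  m1 <- mboost(DEXfat ~ bbs(hipcirc), data = bodyfat)
  m2 <- gamboost(DEXfat ~ hipcirc, data = bodyfat)   ### bbs added by default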

References

Peter Buehlmann and Bin Yu (2003), Boosting with the L2 loss: regression and classification. Journal of the American Statistical Association, 98, 324--339.

Peter Buehlmann and Torsten Hothorn (2007), Boosting algorithms: regularization, prediction and model fitting. Statistical Science, 22(4), 477--505.

Yoav Freund and Robert E. Schapire (1996), Experiments with a new boosting algorithm. In Machine Learning: Proc. Thirteenth International Conference, 148--156.

Jerome H. Friedman (2001), Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29, 1189--1232.

Matthias Schmid and Torsten Hothorn (2008), Boosting additive models using component-wise P-splines. Computational Statistics & Data Analysis, 53(2), 298--311.

See Also

glmboost for boosted linear models and blackboost for boosted trees. See e.g. bbs for possible base-learners. See cvrisk for cross-validated stopping iteration. Furthermore see boost_control, Family and methods.

Examples

data("bodyfat", package = "mboost")

  ### formula interface: additive Gaussian model with
  ### a non-linear step-function in `age', a linear function in `waistcirc'
  ### and a smooth non-linear function in `hipcirc'
  mod <- mboost(DEXfat ~ btree(age) + bols(waistcirc) + bbs(hipcirc),
                data = bodyfat)
  layout(matrix(1:6, ncol = 3, byrow = TRUE))
  plot(mod, ask = FALSE, main = "formula")

  ### the same
  ### assign outside with() so that `mod' is available afterwards
  mod <- with(bodyfat,
              mboost_fit(list(btree(age), bols(waistcirc), bbs(hipcirc)),
                         response = DEXfat))
  plot(mod, ask = FALSE, main = "base-learner")
