mboost (version 2.9-2)

mboost: Gradient Boosting for Additive Models

Description

Gradient boosting for optimizing arbitrary loss functions, where component-wise base-learners, e.g., smoothing procedures, are utilized as additive model components.

Usage

mboost(formula, data = list(), na.action = na.omit, weights = NULL, 
       offset = NULL, family = Gaussian(), control = boost_control(),
       oobweights = NULL, baselearner = c("bbs", "bols", "btree", "bss", "bns"), 
       ...)

gamboost(formula, data = list(), na.action = na.omit, weights = NULL,
         offset = NULL, family = Gaussian(), control = boost_control(),
         oobweights = NULL, baselearner = c("bbs", "bols", "btree", "bss", "bns"),
         dfbase = 4, ...)

Arguments

formula

a symbolic description of the model to be fit.

data

a data frame containing the variables in the model.

na.action

a function which indicates what should happen when the data contain NAs.

weights

(optional) a numeric vector of weights to be used in the fitting process.

offset

a numeric vector to be used as offset (optional).

family

a Family object.

control

a list of parameters controlling the algorithm. For more details see boost_control.

oobweights

an additional vector of out-of-bag weights, which is used for the out-of-bag risk (i.e., if boost_control(risk = "oobag")). This argument is also used internally by cvrisk.
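
For illustration, a minimal sketch (not part of the original examples) of an out-of-bag risk evaluation on the cars data, where zero weights hold out every second observation:

w <- rep(c(1, 0), 25)   # in-bag (1) vs. held-out (0) observations
oob.gb <- gamboost(dist ~ speed, data = cars, weights = w,
                   oobweights = 1 - w,
                   control = boost_control(mstop = 50, risk = "oobag"))
risk(oob.gb)            # risk path evaluated on the held-out observations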

baselearner

a character specifying the component-wise base-learner to be used: "bbs" means P-splines with a B-spline basis (see Schmid and Hothorn 2008), "bols" means linear models, and "btree" boosts stumps; "bss" and "bns" are deprecated. Component-wise smoothing splines were considered by Buehlmann and Yu (2003), and Schmid and Hothorn (2008) investigate P-splines with a B-spline basis. Kneib, Hothorn and Tutz (2009) also utilize P-splines with a B-spline basis, supplement them with a bivariate tensor-product version to estimate interaction surfaces and spatial effects, and additionally consider random-effects base-learners.

dfbase

a single integer giving the degrees of freedom for P-spline base-learners (bbs) globally.

...

additional arguments passed to mboost_fit; currently none.

Value

An object of class mboost with print, AIC, plot and predict methods being available.
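
A minimal sketch of these methods in action (the object name m is illustrative; see also the Examples below):

m <- gamboost(dist ~ speed, data = cars)
print(m)           # family, coefficients and number of boosting iterations
AIC(m)             # corrected AIC
head(predict(m))   # fitted values for the training data
plot(m)            # partial effect of speed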

Details

A (generalized) additive model is fitted using a boosting algorithm based on component-wise base-learners.

The base-learners can either be specified via the formula object or via the baselearner argument. The latter is the default base-learner, which is used for all variables in the formula without an explicit base-learner specification (i.e., for variables whose base-learners are explicitly specified in formula, the baselearner argument is ignored).
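
For example, a sketch with hypothetical variables x1 and x2: in the first call the base-learners are specified explicitly in the formula, in the second call both variables fall back to the default base-learner:

set.seed(29)
dat <- data.frame(x1 = runif(100), x2 = runif(100))
dat$y <- 2 * dat$x1 + sin(6 * dat$x2) + rnorm(100, sd = 0.3)

## explicit base-learners: x1 enters linearly, x2 as a P-spline
mod1 <- mboost(y ~ bols(x1) + bbs(x2), data = dat)

## no explicit base-learners: "bbs" is used for both variables
mod2 <- mboost(y ~ x1 + x2, data = dat, baselearner = "bbs")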

Of note, "bss" and "bns" are deprecated and remain in the list only for backward compatibility.

Note that more base-learners (i.e., in addition to the ones provided via baselearner) can be specified in formula. See baselearners for details.

The only difference between mboost and gamboost is that the latter allows one to specify default degrees of freedom for smooth effects defined via baselearner = "bbs". In all other cases, degrees of freedom need to be set manually via a specific definition of the corresponding base-learner.
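
That is, the following two calls are essentially equivalent (a sketch using the cars data):

## global degrees of freedom via dfbase in gamboost ...
m1 <- gamboost(dist ~ speed, data = cars, dfbase = 4)

## ... versus setting df manually in the base-learner definition
m2 <- mboost(dist ~ bbs(speed, df = 4), data = cars)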

References

Peter Buehlmann and Bin Yu (2003), Boosting with the L2 loss: regression and classification. Journal of the American Statistical Association, 98, 324--339.

Peter Buehlmann and Torsten Hothorn (2007), Boosting algorithms: regularization, prediction and model fitting. Statistical Science, 22(4), 477--505.

Thomas Kneib, Torsten Hothorn and Gerhard Tutz (2009), Variable selection and model choice in geoadditive regression models, Biometrics, 65(2), 626--634.

Matthias Schmid and Torsten Hothorn (2008), Boosting additive models using component-wise P-splines as base-learners. Computational Statistics & Data Analysis, 53(2), 298--311.

Torsten Hothorn, Peter Buehlmann, Thomas Kneib, Matthias Schmid and Benjamin Hofner (2010), Model-based Boosting 2.0. Journal of Machine Learning Research, 11, 2109--2113.

Benjamin Hofner, Andreas Mayr, Nikolay Robinzonov and Matthias Schmid (2014). Model-based Boosting in R: A Hands-on Tutorial Using the R Package mboost. Computational Statistics, 29, 3--35. http://dx.doi.org/10.1007/s00180-012-0382-5

Available as vignette via: vignette(package = "mboost", "mboost_tutorial")

See Also

See mboost_fit for the generic boosting function, glmboost for boosted linear models, and blackboost for boosted trees.

See baselearners for possible base-learners.

See cvrisk for cross-validated stopping iteration.

Furthermore, see boost_control, Family and methods.

Examples

    ### a simple two-dimensional example: cars data
    cars.gb <- gamboost(dist ~ speed, data = cars, dfbase = 4,
                        control = boost_control(mstop = 50))
    cars.gb
    AIC(cars.gb, method = "corrected")

    ### plot fit for mstop = 1, ..., 50
    ### (cars.gb[i] restricts the model to the first i boosting iterations)
    plot(dist ~ speed, data = cars)
    tmp <- sapply(1:mstop(AIC(cars.gb)), function(i)
        lines(cars$speed, predict(cars.gb[i]), col = "red"))
    lines(cars$speed, predict(smooth.spline(cars$speed, cars$dist),
                              cars$speed)$y, col = "green")

    ### artificial example: sine transformation
    x <- sort(runif(100)) * 10
    y <- sin(x) + rnorm(length(x), sd = 0.25)
    plot(x, y)
    ### linear model
    lines(x, fitted(lm(y ~ sin(x) - 1)), col = "red")
    ### GAM
    lines(x, fitted(gamboost(y ~ x,
                    control = boost_control(mstop = 500))),
          col = "green")
