mboost (version 0.5-7)

glmboost: Gradient Boosting with Component-wise Linear Models

Description

Gradient boosting for optimizing arbitrary loss functions where component-wise linear models are utilized as base learners.

Usage

## S3 method for class 'formula':
glmboost(formula, data = list(), weights = NULL, 
             contrasts.arg = NULL, ...)
## S3 method for class 'matrix':
glmboost(x, y, weights = NULL, ...)
glmboost_fit(object, family = GaussReg(), 
             control = boost_control(), weights = NULL)
## S3 method for class 'glmboost':
plot(x, main = deparse(x$call), col = NULL, ...)

Arguments

formula
a symbolic description of the model to be fit.
data
a data frame containing the variables in the model.
weights
an optional vector of weights to be used in the fitting process.
contrasts.arg
a list, whose entries are contrasts suitable for input to the contrasts replacement function and whose names are the names of columns of data containing factors. See model.matrix.default.
x
design matrix or an object of class glmboost for plotting.
y
vector of responses.
object
an object of class boost_data, see boost_dpp.
family
an object of class boost_family, implementing the negative gradient corresponding to the loss function to be optimized; by default, squared error loss. A user-defined family can be supplied as well (see the sketch after this list).
control
an object of class boost_control.
main
a title for the plot.
col
(a vector of) colors for plotting the lines representing the coefficient paths.
...
additional arguments passed to callees.
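
Beyond the built-in families, a user-defined loss can be passed through the Family() constructor. The following is a minimal sketch of a family equivalent to GaussReg(); the exact argument list of Family() may differ across mboost versions, see ?Family:

    myGauss <- Family(ngradient = function(y, f, w = 1) y - f,  ## negative gradient of the loss
                      loss = function(y, f) (y - f)^2)          ## squared error loss
    ## usable like any built-in family:
    ## glmboost(dist ~ speed, data = cars, family = myGauss)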

Value

  • An object of class glmboost with print, coef, AIC and predict methods available (see the sketch below). For inputs with long variable names, you might want to change par("mai") before calling the plot method, which visualizes the coefficient paths of glmboost objects.
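
For example, the stopping iteration can be chosen via the AIC method. This is a hedged sketch using the cars.gb model from the Examples below; the mstop() extractor is assumed to be available in your mboost version:

    aic <- AIC(cars.gb)   ## AIC along the boosting path
    mstop(aic)            ## AIC-optimal number of boosting iterations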

Details

A (generalized) linear model is fitted using a boosting algorithm based on component-wise univariate linear models. The fit, i.e., the regression coefficients, can be interpreted in the usual way. The methodology is described in Bühlmann and Yu (2003) and Bühlmann (2006).

The function glmboost_fit provides access to the fitting procedure without data pre-processing, e.g., for cross-validation.
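
A hedged sketch of this interface, following the Usage above (boost_dpp() prepares an object of class boost_data from a formula and data; its exact argument list may differ, see ?boost_dpp):

    dpp <- boost_dpp(dist ~ speed, data = cars)
    fit <- glmboost_fit(dpp, family = GaussReg(),
                        control = boost_control(mstop = 100))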

References

Peter Bühlmann and Bin Yu (2003), Boosting with the L2 loss: regression and classification. Journal of the American Statistical Association, 98, 324--339.

Peter Bühlmann (2006), Boosting for high-dimensional linear models. The Annals of Statistics, 34(2), 559--583.

Peter Bühlmann and Torsten Hothorn (2007), Boosting algorithms: regularization, prediction and model fitting. Statistical Science, accepted. ftp://ftp.stat.math.ethz.ch/Research-Reports/Other-Manuscripts/buhlmann/BuehlmannHothorn_Boosting-rev.pdf

Examples

### a simple two-dimensional example: cars data
    cars.gb <- glmboost(dist ~ speed, data = cars, 
                        control = boost_control(mstop = 5000))
    cars.gb

    ### coefficients should coincide
    coef(cars.gb) + c(cars.gb$offset, 0)
    coef(lm(dist ~ speed, data = cars))

    ### plot fit
    layout(matrix(1:2, ncol = 2))
    plot(dist ~ speed, data = cars)
    lines(cars$speed, predict(cars.gb), col = "red")

    ### alternative loss function: absolute loss
    cars.gbl <- glmboost(dist ~ speed, data = cars, 
                         control = boost_control(mstop = 5000), 
                         family = Laplace())
    cars.gbl

    coef(cars.gbl) + c(cars.gbl$offset, 0)
    lines(cars$speed, predict(cars.gbl), col = "green")

    ### Huber loss with adaptive choice of delta
    cars.gbh <- glmboost(dist ~ speed, data = cars, 
                         control = boost_control(mstop = 5000), 
                         family = Huber())

    lines(cars$speed, predict(cars.gbh), col = "blue")
    legend("topleft", col = c("red", "green", "blue"), lty = 1,
           legend = c("Gaussian", "Laplace", "Huber"), bty = "n")

    ### plot coefficient path of glmboost
    par(mai = par("mai") * c(1, 1, 1, 2.5))
    plot(cars.gb)
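
    ### matrix interface (a hedged sketch): supply the design matrix
    ### directly, e.g. as built by model.matrix()
    X <- model.matrix(~ speed, data = cars)
    cars.mx <- glmboost(X, cars$dist,
                        control = boost_control(mstop = 5000))
    coef(cars.mx)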
