mboost (version 2.2-3)

glmboost: Gradient Boosting with Component-wise Linear Models

Description

Gradient boosting for optimizing arbitrary loss functions where component-wise linear models are utilized as base-learners.

Usage

## S3 method for class 'formula':
glmboost(formula, data = list(), weights = NULL,
          na.action = na.pass, contrasts.arg = NULL,
          center = TRUE, control = boost_control(), ...)
## S3 method for class 'matrix':
glmboost(x, y, center = TRUE, control = boost_control(), ...)
## S3 method for class 'default':
glmboost(x, ...)
## S3 method for class 'glmboost':
plot(x, main = deparse(x$call), col = NULL,
     off2int = FALSE, ...)

Arguments

formula
a symbolic description of the model to be fit.
data
a data frame containing the variables in the model.
weights
an optional vector of weights to be used in the fitting process.
contrasts.arg
a list, whose entries are contrasts suitable for input to the contrasts replacement function and whose names are the names of columns of data containing factors. See model.matrix.default.
na.action
a function which indicates what should happen when the data contain NAs.
center
logical indicating whether the predictor variables should be centered before fitting.
control
a list of parameters controlling the algorithm.
x
design matrix or an object of class glmboost for plotting. Sparse matrices of class Matrix can be used as well.
y
vector of responses.
main
a title for the plot.
col
(a vector of) colors for plotting the lines representing the coefficient paths.
off2int
logical indicating whether the offset should be added to the intercept (if there is any) or neglected for plotting (default).
...
additional arguments passed to mboost_fit, including weights, offset, family and control. For default values see mboost_fit.

Value

  • An object of class glmboost with print, coef, AIC and predict methods available. For inputs with long variable names, you might want to enlarge the plot margins via par("mai") before calling the plot method, which visualizes the coefficient paths.
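
A brief sketch of how the available methods are typically used (mod here is assumed to be a fitted glmboost object, e.g. one of the models from the Examples below):

    print(mod)                    ## model summary including offset and coefficients
    coef(mod, off2int = TRUE)     ## coefficients with the offset added to the intercept
    AIC(mod)                      ## (corrected) AIC, can help to choose mstop
    predict(mod)                  ## fitted values for the training data
    par(mai = par("mai") * c(1, 1, 1, 2))  ## enlarge right margin for long variable names
    plot(mod)                     ## coefficient paths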

Details

A (generalized) linear model is fitted using a boosting algorithm based on component-wise univariate linear models. The fit, i.e., the regression coefficients, can be interpreted in the usual way. The methodology is described in Buehlmann and Yu (2003), Buehlmann (2006), and Buehlmann and Hothorn (2007).
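
The component-wise fitting scheme can be sketched in a few lines of plain R. The following is only an illustrative sketch of L2Boosting with squared-error loss (the function name cwb, the step length nu and the default mstop are assumptions for illustration, not part of the package):

    ## illustrative sketch of component-wise L2 boosting (not the package code)
    cwb <- function(X, y, mstop = 100, nu = 0.1) {
        X <- scale(X, center = TRUE, scale = FALSE)  ## centered design matrix
        beta <- numeric(ncol(X))
        f <- rep(mean(y), length(y))                 ## offset: mean of the response
        for (m in seq_len(mstop)) {
            u <- y - f                               ## negative gradient (residuals)
            b <- colSums(X * u) / colSums(X^2)       ## univariate OLS fit per column
            rss <- colSums((u - t(t(X) * b))^2)      ## residual sum of squares per column
            j <- which.min(rss)                      ## select the best-fitting covariate
            beta[j] <- beta[j] + nu * b[j]           ## update only that coefficient
            f <- f + nu * X[, j] * b[j]
        }
        beta
    }
    cwb(as.matrix(cars["speed"]), cars$dist)  ## roughly the slope of lm(dist ~ speed, data = cars)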

References

Peter Buehlmann and Bin Yu (2003), Boosting with the L2 loss: regression and classification. Journal of the American Statistical Association, 98, 324--339.

Peter Buehlmann (2006), Boosting for high-dimensional linear models. The Annals of Statistics, 34(2), 559--583.

Peter Buehlmann and Torsten Hothorn (2007), Boosting algorithms: regularization, prediction and model fitting. Statistical Science, 22(4), 477--505.

Torsten Hothorn, Peter Buehlmann, Thomas Kneib, Matthias Schmid and Benjamin Hofner (2010), Model-based Boosting 2.0. Journal of Machine Learning Research, 11, 2109--2113.

Benjamin Hofner, Andreas Mayr, Nikolay Robinzonov and Matthias Schmid (2012). Model-based Boosting in R: A Hands-on Tutorial Using the R Package mboost. Department of Statistics, Technical Report No. 120. http://epub.ub.uni-muenchen.de/12754/

Available as vignette via: vignette(package = "mboost", "mboost_tutorial")

See Also

mboost for the generic boosting function, gamboost for boosted additive models, and blackboost for boosted trees. See cvrisk for determining the stopping iteration by cross-validation. Furthermore see boost_control, Family and methods.

Examples

### a simple two-dimensional example: cars data
    cars.gb <- glmboost(dist ~ speed, data = cars,
                        control = boost_control(mstop = 2000),
                        center = FALSE)
    cars.gb

    ### coefficients should coincide
    cf <- coef(cars.gb, off2int = TRUE)     ## add offset to intercept
    coef(cars.gb) + c(cars.gb$offset, 0)    ## add offset to intercept (by hand)
    signif(cf, 3)
    signif(coef(lm(dist ~ speed, data = cars)), 3)
    ## almost converged. With higher mstop the results get even better
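
    ### sketch (not part of the original example): the number of boosting
    ### iterations can be increased later via the subset operator; note that
    ### mboost models use reference semantics, so this also updates cars.gb
    cars.gb <- cars.gb[5000]
    signif(coef(cars.gb, off2int = TRUE), 3)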
 
    ### now we center the design matrix for
    ### much quicker "convergence"
    cars.gb_centered <- glmboost(dist ~ speed, data = cars,
                                 control = boost_control(mstop = 2000),
                                 center = TRUE)

    ## plot coefficient paths of glmboost
    par(mfrow = c(1, 2), mai = par("mai") * c(1, 1, 1, 2.5))
    plot(cars.gb, main = "without centering")
    plot(cars.gb_centered, main = "with centering")

    ### alternative loss function: absolute loss
    cars.gbl <- glmboost(dist ~ speed, data = cars,
                         control = boost_control(mstop = 1000),
                         family = Laplace())
    cars.gbl
    coef(cars.gbl, off2int = TRUE)

    ### plot fit
    par(mfrow = c(1,1))
    plot(dist ~ speed, data = cars)
    lines(cars$speed, predict(cars.gb), col = "red")     ## quadratic loss
    lines(cars$speed, predict(cars.gbl), col = "green")  ## absolute loss

    ### Huber loss with adaptive choice of delta
    cars.gbh <- glmboost(dist ~ speed, data = cars,
                         control = boost_control(mstop = 1000),
                         family = Huber())

    lines(cars$speed, predict(cars.gbh), col = "blue")   ## Huber loss
    legend("topleft", col = c("red", "green", "blue"), lty = 1,
           legend = c("Gaussian", "Laplace", "Huber"), bty = "n")

    ### sparse high-dimensional example that makes use of the matrix
    ### interface of glmboost and uses the matrix representation from
    ### package Matrix
    library("Matrix")
    n <- 100
    p <- 10000
    ptrue <- 10
    X <- Matrix(0, nrow = n, ncol = p)
    X[sample(1:(n * p), floor(n * p / 20))] <- runif(floor(n * p / 20))
    beta <- numeric(p)
    beta[sample(1:p, ptrue)] <- 10
    y <- drop(X %*% beta + rnorm(n, sd = 0.1))
    mod <- glmboost(y = y, x = X, center = TRUE) ### mstop needs tuning
    coef(mod, which = which(beta > 0))
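
    ### sketch (not part of the original example): tune mstop via
    ### cross-validated empirical risk; by default cvrisk uses 25
    ### bootstrap samples, which may take a while for this example
    cvm <- cvrisk(mod)
    mstop(cvm)       ## estimated optimal number of boosting iterations
    mod[mstop(cvm)]  ## set mstop (mboost models use reference semantics)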
