mboost (version 2.0-1)

methods: Methods for Gradient Boosting Objects

Description

Methods for models fitted by boosting algorithms.

Usage

## S3 method for class 'glmboost':
print(x, ...)
## S3 method for class 'mboost':
print(x, ...)

## S3 method for class 'mboost':
summary(object, ...)

## S3 method for class 'glmboost':
coef(object, which = NULL, aggregate = c("sum", "cumsum", "none"), ...)
## S3 method for class 'gamboost':
coef(object, which = NULL, aggregate = c("sum", "cumsum", "none"), ...)

## S3 method for class 'mboost':
[(x, i, return = TRUE, ...)

## S3 method for class 'mboost':
AIC(object, method = c("corrected", "classical", "gMDL"),
    df = c("trace", "actset"), ..., k = 2)

## S3 method for class 'mboost':
mstop(object, ...)
## S3 method for class 'gbAIC':
mstop(object, ...)
## S3 method for class 'cvrisk':
mstop(object, ...)

## S3 method for class 'mboost':
predict(object, newdata = NULL, type = c("link", "response", "class"),
        which = NULL, aggregate = c("sum", "cumsum", "none"), ...)
## S3 method for class 'glmboost':
predict(object, newdata = NULL, type = c("link", "response", "class"),
        which = NULL, aggregate = c("sum", "cumsum", "none"), ...)

## S3 method for class 'mboost':
fitted(object, ...)
## S3 method for class 'mboost':
logLik(object, ...)
## S3 method for class 'gamboost':
hatvalues(model, ...)
## S3 method for class 'glmboost':
hatvalues(model, ...)

## S3 method for class 'mboost':
selected(object)

## S3 method for class 'mboost':
nuisance(object)

Arguments

object
objects of class glmboost, gamboost, blackboost or gbAIC.
x
objects of class glmboost or gamboost.
model
objects of class mboost.
newdata
optionally, a data frame in which to look for variables with which to predict. If the model was fitted using the matrix interface to glmboost, newdata must be a matrix as well.
which
a subset of base-learners to take into account when computing predictions or coefficients. If which is given (as an integer vector or characters corresponding to base-learners), a list is returned.
type
the type of prediction required. The default is on the scale of the predictors; the alternative "response" is on the scale of the response variable. Thus, for a binomial model the default predictions are on the log-odds scale (probabilities on the logit scale), type = "response" gives the predicted probabilities, and type = "class" returns predicted class labels.
aggregate
a character specifying how to aggregate predictions of single base-learners. The default returns the prediction for the final number of boosting iterations. "cumsum" returns a matrix of cumulative predictions over the boosting iterations (one column per iteration), and "none" returns the individual contributions of each iteration.
i
integer. Index specifying the model to extract. If i is smaller than the initial mstop, a subset of the boosting iterations is used. If i is larger than the initial mstop, additional boosting steps are fitted until iteration i is reached (see Details).
return
a logical indicating whether the changed object is returned.
method
a character specifying whether the corrected AIC criterion, the classical AIC (-2 logLik + k * df), or the gMDL criterion should be computed.
df
a character specifying how degrees of freedom should be computed: "trace" defines degrees of freedom via the trace of the boosting hat matrix, and "actset" uses the number of non-zero coefficients for each boosting iteration.
k
numeric, the penalty per parameter to be used; the default k = 2 is the classical AIC. Only used when method = "classical".
...
additional arguments passed to callees.

Warning

The coefficients resulting from boosting with family Binomial are 1/2 of the coefficients of a logit model obtained via glm. This is due to the internal recoding of the response to -1 and +1 (see Binomial).
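
A minimal sketch of this relationship (using a small simulated binary outcome; the data and settings below are illustrative and not part of the package examples):

  library("mboost")
  set.seed(29)
  d <- data.frame(x = rnorm(500))
  d$y <- factor(rbinom(500, size = 1, prob = plogis(1 + 0.5 * d$x)))
  gb <- glmboost(y ~ x, data = d, family = Binomial(),
                 control = boost_control(mstop = 1000))
  coef(gb)[["x"]]                                             # boosting estimate
  coef(glm(y ~ x, data = d, family = binomial()))[["x"]] / 2  # roughly the same value

(The intercepts are not directly comparable because the boosting fit additionally starts from an offset.)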

Details

These functions can be used to extract details from fitted models. print shows a dense representation of the model fit and summary gives a more detailed representation.

The function coef extracts the regression coefficients of a linear model fitted using the glmboost function or an additive model fitted using gamboost. By default, only coefficients of selected base-learners are returned. However, any desired coefficient can be extracted using the which argument (see examples for details).

The predict function can be used to predict the status of the response variable for new observations whereas fitted extracts the regression fit for the observations in the learning sample.
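
A minimal sketch of these two functions (fitting a small glmboost model on the cars data, similar to the Examples below):

  library("mboost")
  gb <- glmboost(dist ~ speed, data = cars,
                 control = boost_control(mstop = 100))
  head(fitted(gb))                                  # fit for the learning sample
  predict(gb, newdata = data.frame(speed = 10:15))  # predictions for new observations
  dim(predict(gb, aggregate = "cumsum"))            # one column per boosting iteration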

The [.mboost function can be used to enhance or restrict a given boosting model to the specified boosting iteration i. Note that in both cases the original x will be changed to reduce the memory footprint. If the boosting model is enhanced by specifying an index that is larger than the initial mstop, only the missing i - mstop steps are fitted. If the model is restricted, the spare steps are not dropped, i.e., if i is increased again, these boosting steps are immediately available.

The IDs of the base-learners selected during the fitting process can be extracted using selected(). The nuisance() method extracts nuisance parameters from the fit that are handled internally by the corresponding family object (see boost_family).
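
A minimal sketch (reusing a small glmboost fit on the cars data; note that nuisance() may simply return NA for families without nuisance parameters):

  library("mboost")
  gb <- glmboost(dist ~ speed, data = cars,
                 control = boost_control(mstop = 100))
  table(selected(gb))   # how often each base-learner was selected
  nuisance(gb)          # nuisance parameters handled by the family (if any)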

For (generalized) linear and additive models, the AIC function can be used to compute both the classical and corrected AIC (Hurvich et al., 1998; only available when family = GaussReg() was used), which is useful for determining the optimal number of boosting iterations to be applied (this number can be extracted via mstop). The degrees of freedom are either computed via the trace of the boosting hat matrix (which is rather slow even for moderate sample sizes) or via the number of variables (non-zero coefficients) that entered the model so far (faster, but only meaningful for linear models fitted via glmboost; see Hastie, 2007).

In addition, the general Minimum Description Length criterion (Buehlmann and Yu, 2006) can be computed using function AIC.

Note that logLik and AIC only make sense when the corresponding Family implements the appropriate loss function.
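
A minimal sketch of the information criteria (assuming the default Gaussian family, for which the corrected AIC and gMDL criteria are available):

  library("mboost")
  gb <- glmboost(dist ~ speed, data = cars,
                 control = boost_control(mstop = 100))
  AIC(gb, method = "corrected", df = "trace")   # corrected AIC (Hurvich et al., 1998)
  AIC(gb, method = "gMDL")                      # gMDL criterion (Buehlmann and Yu, 2006)
  logLik(gb)                                    # log-likelihood at the current mstop
  mstop(AIC(gb, method = "corrected"))          # criterion-optimal number of iterations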

References

Clifford M. Hurvich, Jeffrey S. Simonoff and Chih-Ling Tsai (1998), Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. Journal of the Royal Statistical Society, Series B, 60(2), 271--293.

Peter Buehlmann and Torsten Hothorn (2007), Boosting algorithms: regularization, prediction and model fitting. Statistical Science, 22(4), 477--505.

Trevor Hastie (2007), Discussion of "Boosting algorithms: Regularization, prediction and model fitting" by Peter Buehlmann and Torsten Hothorn. Statistical Science, 22(4), 505.

Peter Buehlmann and Bin Yu (2006), Sparse boosting. Journal of Machine Learning Research, 7, 1001--1024.

See Also

gamboost, glmboost and blackboost for model fitting. See cvrisk for cross-validated stopping iteration.

Examples

### a simple two-dimensional example: cars data
  cars.gb <- glmboost(dist ~ speed, data = cars,
                      control = boost_control(mstop = 2000))
  cars.gb

  ### initial number of boosting iterations
  mstop(cars.gb)

  ### AIC criterion
  aic <- AIC(cars.gb, method = "corrected")
  aic

  ### enhance or restrict model
  cars.gb <- gamboost(dist ~ speed, data = cars,
                      control = boost_control(mstop = 100, trace = TRUE))
  cars.gb[10]
  cars.gb[100, return = FALSE] # no refitting required
  cars.gb[150, return = FALSE] # only iterations 101 to 150 
                               # are newly fitted

  ### coefficients for optimal number of boosting iterations
  coef(cars.gb[mstop(aic)])
  plot(cars$dist, predict(cars.gb[mstop(aic)]),
       ylim = range(cars$dist))
  abline(a = 0, b = 1)

  ### example for extraction of coefficients and predictions
  set.seed(1907)
  n <- 100
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  x3 <- rnorm(n)
  x4 <- rnorm(n)
  int <- rep(1, n)
  y <- 3 * x1^2 - 0.5 * x2 + rnorm(n, sd = 0.1)
  df <- data.frame(y = y, int = int, x1 = x1, x2 = x2, x3 = x3, x4 = x4)

  model <- gamboost(y ~ bols(int, intercept = FALSE) + 
                        bbs(x1, center = TRUE, df = 1) +
                        bols(x1, intercept = FALSE) +
                        bols(x2, intercept = FALSE) + 
                        bols(x3, intercept = FALSE) +
                        bols(x4, intercept = FALSE), 
                    data = df, control = boost_control(mstop = 500))
  coef(model) # standard output (only selected base-learners)
  coef(model, 
       which = 1:length(variable.names(model))) # all base-learners
  coef(model, which = "x1") # shows all base-learners for x1
