methods: Methods for Gradient Boosting Objects

Description

Methods for models fitted by boosting algorithms.

Usage

# S3 method for glmboost
print(x, ...)
# S3 method for mboost
print(x, ...)
# S3 method for mboost
summary(object, ...)
# S3 method for mboost
coef(object, which = NULL,
    aggregate = c("sum", "cumsum", "none"), ...)
# S3 method for glmboost
coef(object, which = NULL,
     aggregate = c("sum", "cumsum", "none"), off2int = FALSE, ...)
# S3 method for mboost
x[i, return = TRUE, ...]
mstop(x) <- value
# S3 method for mboost
AIC(object, method = c("corrected", "classical", "gMDL"),
    df = c("trace", "actset"), ..., k = 2)
# S3 method for mboost
mstop(object, ...)
# S3 method for gbAIC
mstop(object, ...)
# S3 method for cvrisk
mstop(object, ...)
# S3 method for mboost
predict(object, newdata = NULL,
        type = c("link", "response", "class"), which = NULL,
        aggregate = c("sum", "cumsum", "none"), ...)
# S3 method for glmboost
predict(object, newdata = NULL,
        type = c("link", "response", "class"), which = NULL,
        aggregate = c("sum", "cumsum", "none"), ...)
# S3 method for mboost
fitted(object, ...)
# S3 method for mboost
residuals(object, ...)
# S3 method for mboost
resid(object, ...)
# S3 method for glmboost
variable.names(object, which = NULL, usedonly = FALSE, ...)
# S3 method for mboost
variable.names(object, which = NULL, usedonly = FALSE, ...)
# S3 method for mboost
extract(object, what = c("design", "penalty", "lambda", "df",
                         "coefficients", "residuals",
                         "variable.names", "bnames", "offset",
                         "nuisance", "weights", "index", "control"),
        which = NULL, ...)
# S3 method for glmboost
extract(object, what = c("design", "coefficients", "residuals",
                         "variable.names", "offset",
                         "nuisance", "weights", "control"),
        which = NULL, asmatrix = FALSE, ...)
# S3 method for blg
extract(object, what = c("design", "penalty", "index"),
        asmatrix = FALSE, expand = FALSE, ...)
# S3 method for mboost
logLik(object, ...)
# S3 method for gamboost
hatvalues(model, ...)
# S3 method for glmboost
hatvalues(model, ...)
# S3 method for mboost
selected(object, ...)
# S3 method for mboost
risk(object, ...)
# S3 method for mboost
nuisance(object)
downstream.test(object, ...)

Arguments

object

objects of class glmboost, gamboost, blackboost or gbAIC.

x

objects of class glmboost or gamboost.

model

objects of class mboost

newdata

optionally, a data frame in which to look for variables with which to predict. In case the model was fitted using the matrix interface to glmboost, newdata must be a matrix as well (an error is given otherwise).

which

a subset of base-learners to take into account for computing predictions or coefficients. If which is given (as an integer vector or characters corresponding to base-learners) a list or matrix is returned.

usedonly

logical. Indicating whether all variable names should be returned or only those selected in the boosting algorithm.

type

the type of prediction required. The default is on the scale of the linear predictor; the alternative "response" is on the scale of the response variable. Thus for a binomial model the default predictions are of log-odds (probabilities on logit scale) and type = "response" gives the predicted probabilities. The "class" option returns predicted classes.

aggregate

a character specifying how to aggregate predictions or coefficients of single base-learners. The default returns the prediction or coefficient for the final number of boosting iterations. "cumsum" returns a list with matrices (one per base-learner) with the cumulative coefficients for all iterations simultaneously (in columns). "none" returns a list of matrices where the \(j\)th column of the respective matrix contains the predictions of the base-learner of the \(j\)th boosting iteration (and zero if the base-learner is not selected in this iteration).

off2int

logical. Indicating whether the offset should be added to the intercept (if there is any) or if the offset is returned as attribute of the coefficient (default).

i

integer. Index specifying the model to extract. If i = 0, the offset model is returned. If i is smaller than the initial mstop, a subset is used. If i is larger than the initial mstop, additional boosting steps are performed until step i is reached. See details for more information.

value

integer. See i.

return

a logical indicating whether the changed object is returned.

method

a character specifying whether the corrected AIC criterion, a classical AIC (-2 logLik + k * df) or the gMDL criterion should be computed.

df

a character specifying how degrees of freedom should be computed: trace defines degrees of freedom by the trace of the boosting hat matrix and actset uses the number of non-zero coefficients for each boosting iteration.

k

numeric, the penalty per parameter to be used; the default k = 2 is the classical AIC. Only used when method = "classical".

what

a character specifying the quantities to extract. Depending on object this can be a subset of "design" (default; design matrix), "penalty" (penalty matrix), "lambda" (smoothing parameter), "df" (degrees of freedom), "coefficients", "residuals", "variable.names", "bnames" (names of the base-learners), "offset", "nuisance", "weights", "index" (index of ties used to expand the design matrix) and "control". In future versions additional extractors might be specified.

asmatrix

a logical indicating whether the returned matrix should be coerced to class matrix or whether the returned object stays as it is (default; i.e., potentially a sparse matrix). This option is only applicable if extract returns matrices, i.e., what = "design" or what = "penalty".

expand

a logical indicating whether the design matrix should be expanded (default: FALSE). This is useful if ties were taken into account either manually (via argument index in a base-learner) or automatically for data sets with many observations. expand = TRUE is equivalent to extract(B)[extract(B, what = "index"),] for a base-learner B.

…

additional arguments passed to callees.

Warning

The coefficients resulting from boosting with family Binomial(link = "logit") are \(1/2\) of the coefficients of a logit model obtained via glm (see Binomial).
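The factor of one half can be checked numerically. The following is a minimal sketch on simulated data; the variable names, the sample size and the number of boosting iterations are illustrative assumptions, and the comparison only holds approximately, once enough iterations have been run for the boosted fit to approach the maximum likelihood solution.

```r
library(mboost)

## simulated binary data (illustrative assumption, not from the package)
set.seed(1)
n <- 200
x <- rnorm(n)
d <- data.frame(x = x, y = factor(rbinom(n, 1, plogis(0.5 + x))))

## maximum likelihood logit model
fit_glm <- glm(y ~ x, data = d, family = binomial())

## boosted logit model, run for many iterations so that it is close
## to convergence
fit_gb <- glmboost(y ~ x, data = d, family = Binomial(),
                   center = TRUE,
                   control = boost_control(mstop = 2000))

cf_glm <- coef(fit_glm)
cf_gb  <- coef(fit_gb, off2int = TRUE)   # offset added to the intercept

## twice the boosting coefficients should be close to the glm coefficients
rbind(glm = as.numeric(cf_glm), twice_boost = 2 * as.numeric(cf_gb))
```

Using off2int = TRUE matters here: otherwise the offset is only attached as an attribute and the intercepts are not comparable.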

Details

These functions can be used to extract details from fitted models. print shows a dense representation of the model fit and summary gives a more detailed representation.

The function coef extracts the regression coefficients of a linear model fitted using the glmboost function or an additive model fitted using gamboost. By default, only coefficients of selected base-learners are returned. However, any desired coefficient can be extracted using the which argument (see examples for details). By default, the coefficient of the final iteration is returned (aggregate = "sum"), but it is also possible to return the coefficients from all iterations simultaneously (aggregate = "cumsum"). If aggregate = "none" is specified, the coefficients of the selected base-learners are returned (see examples below). For models fitted via glmboost with option center = TRUE the intercept is rarely selected. However, it is implicitly estimated through the centering of the design matrix. In this case the intercept is always returned unless which is specified such that the intercept is not selected. See examples below.

The predict function can be used to predict the status of the response variable for new observations, whereas fitted extracts the regression fit for the observations in the learning sample. For predict, newdata can be specified; otherwise the fitted values are returned. If which is specified, marginal effects of the corresponding base-learner(s) are returned. The argument type can be used to make predictions on the scale of the link (i.e., the linear predictor \(X\beta\)), the response (i.e. \(h(X\beta)\), where h is the response function) or the class (in case of classification). Furthermore, the predictions can be aggregated analogously to coef by setting aggregate to either sum (default; predictions of the final iteration are given), cumsum (predictions of all iterations are returned simultaneously) or none (change of prediction in each iteration). If applicable, the offset is added to the predictions. If marginal predictions are requested, the offset is attached to the object via attr(..., "offset"), as adding the offset to one of the marginal predictions is not meaningful.
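The relation between predict, fitted and marginal predictions can be sketched as follows. This is a small illustration on the cars data; the new speed values and the number of iterations are arbitrary choices made for the example.

```r
library(mboost)

fit <- glmboost(dist ~ speed, data = cars,
                control = boost_control(mstop = 100))

## without newdata, predict() returns the fitted values
p_in <- predict(fit)
same_as_fitted <- all.equal(as.numeric(p_in), as.numeric(fitted(fit)))
same_as_fitted

## predictions for new observations (speeds chosen for illustration)
nd <- data.frame(speed = c(10, 20))
p_new <- predict(fit, newdata = nd)
p_new

## marginal prediction for one base-learner; the offset is not added
## but kept as an attribute
p_speed <- predict(fit, which = "speed")
attr(p_speed, "offset")
```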

The [.mboost function can be used to enhance or restrict a given boosting model to the specified boosting iteration i. Note that in both cases the original x will be changed to reduce the memory footprint. If the boosting model is enhanced by specifying an index that is larger than the initial mstop, only the missing i - mstop steps are fitted. If the model is restricted, the spare steps are not dropped, i.e., if we increase i again, these boosting steps are immediately available. Alternatively, the same operation can be done by mstop(x) <- i.
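A short sketch of this in-place behaviour, again on the cars data (the iteration numbers are arbitrary): restricting and enhancing a model changes the object itself, so no re-assignment is needed.

```r
library(mboost)

fit <- glmboost(dist ~ speed, data = cars,
                control = boost_control(mstop = 100))
m0 <- mstop(fit)            # initial number of iterations

## restrict the model; 'fit' itself is modified
fit[20, return = FALSE]
m1 <- mstop(fit)

## enhance the model via the replacement function; only the
## iterations beyond the initial mstop have to be fitted
mstop(fit) <- 150
m2 <- mstop(fit)

c(initial = m0, restricted = m1, enhanced = m2)
```

Because the original object is changed, a model that should be kept at its current mstop has to be refitted or handled with care before subsetting.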

The residuals function can be used to extract the residuals (i.e., the negative gradient of the current iteration). resid is an alias for residuals.

Variable names (including those of interaction effects specified via by in a base-learner) can be extracted using the generic function variable.names, which has special methods for boosting objects.

The generic extract function can be used to extract various characteristics of a fitted model or a base-learner. Note that sometimes a penalty function is returned (e.g. by extract(bols(x), what = "penalty")) even if the estimation is unpenalized. However, in this case the penalty parameter lambda is set to zero. If a matrix is returned by extract, one can set asmatrix = TRUE if the returned matrix should be coerced to class matrix. If asmatrix = FALSE one might get a sparse matrix as implemented in package Matrix. If one requests the design matrix (what = "design"), expand = TRUE expands the resulting matrix by taking the duplicates handled via index into account.

The ids of base-learners selected during the fitting process can be extracted using selected(). The nuisance() method extracts nuisance parameters from the fit that are handled internally by the corresponding family object, see boost_family. The risk() function can be used to extract the computed risk (either the "inbag" risk or the "oobag" risk, depending on the control argument; see boost_control).
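As an illustration, the selection path and the risk of a small model can be inspected as sketched below. The simulated data and variable names are assumptions made for this example only.

```r
library(mboost)

## simulated data: x3 carries no signal (illustrative assumption)
set.seed(3)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 2 * d$x1 - d$x2 + rnorm(n, sd = 0.5)

fit <- glmboost(y ~ x1 + x2 + x3, data = d,
                control = boost_control(mstop = 50))

## one base-learner id per boosting iteration
sel <- selected(fit)
table(sel)

## names of the variables that were actually selected
variable.names(fit, usedonly = TRUE)

## risk as computed during fitting (the kind of risk depends on the
## risk setting in boost_control)
r <- risk(fit)
head(r)
```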

For (generalized) linear and additive models, the AIC function can be used to compute both the classical AIC (only available for family = Binomial() and family = Poisson()) and corrected AIC (Hurvich et al., 1998, only available when family = Gaussian() was used). Details on the used approximations for the hat matrix can be found in Buehlmann and Hothorn (2007). The AIC is useful for the determination of the optimal number of boosting iterations to be applied (which can be extracted via mstop). The degrees of freedom are either computed via the trace of the boosting hat matrix (which is rather slow even for moderate sample sizes) or the number of variables (non-zero coefficients) that entered the model so far (faster but only meaningful for linear models fitted via glmboost (see Hastie, 2007)). For a discussion of the use of AIC based stopping see also Mayr, Hofner and Schmid (2012).

In addition, the general Minimum Description Length criterion (Buehlmann and Yu, 2006) can be computed using function AIC.
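The different criteria and definitions of the degrees of freedom can be compared as in the following sketch. It uses a Gaussian glmboost fit to the cars data; the number of iterations is an arbitrary choice for illustration.

```r
library(mboost)

fit <- glmboost(dist ~ speed, data = cars,
                control = boost_control(mstop = 500),
                center = FALSE)

## corrected AIC with degrees of freedom from the trace of the hat matrix
aic_trace <- AIC(fit, method = "corrected", df = "trace")

## corrected AIC with degrees of freedom from the active set
aic_actset <- AIC(fit, method = "corrected", df = "actset")

## gMDL criterion
gmdl <- AIC(fit, method = "gMDL")

## stopping iterations suggested by each criterion
c(trace = mstop(aic_trace), actset = mstop(aic_actset),
  gMDL = mstop(gmdl))

## restrict the model to the iteration chosen by the trace-based AIC
m_opt <- mstop(aic_trace)
fit[m_opt, return = FALSE]
```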

Note that logLik and AIC only make sense when the corresponding Family implements the appropriate loss function.

downstream.test computes tests for linear models fitted via glmboost with a likelihood-based loss function. It is only suitable without early stopping, i.e., if the likelihood-based model converged. In order to work, the Fisher matrix must be implemented in the Family; currently this is only the case for family RCG.

References

Benjamin Hofner, Andreas Mayr, Nikolay Robinzonov and Matthias Schmid (2014). Model-based Boosting in R: A Hands-on Tutorial Using the R Package mboost. Computational Statistics, 29, 3--35. http://dx.doi.org/10.1007/s00180-012-0382-5

Clifford M. Hurvich, Jeffrey S. Simonoff and Chih-Ling Tsai (1998), Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. Journal of the Royal Statistical Society, Series B, 60(2), 271--293.

Peter Buehlmann and Torsten Hothorn (2007), Boosting algorithms: regularization, prediction and model fitting. Statistical Science, 22(4), 477--505.

Trevor Hastie (2007), Discussion of ``Boosting algorithms: Regularization, prediction and model fitting'' by Peter Buehlmann and Torsten Hothorn. Statistical Science, 22(4), 505.

Peter Buehlmann and Bin Yu (2006), Sparse boosting. Journal of Machine Learning Research, 7, 1001--1024.

Andreas Mayr, Benjamin Hofner, and Matthias Schmid (2012). The importance of knowing when to stop - a sequential stopping rule for component-wise gradient boosting. Methods of Information in Medicine, 51, 178--186. DOI: http://dx.doi.org/10.3414/ME11-02-0030

Examples

# NOT RUN {
  ### a simple two-dimensional example: cars data
  cars.gb <- glmboost(dist ~ speed, data = cars,
                      control = boost_control(mstop = 2000),
                      center = FALSE)
  cars.gb

  ### initial number of boosting iterations
  mstop(cars.gb)

  ### AIC criterion
  aic <- AIC(cars.gb, method = "corrected")
  aic

  ### extract coefficients for glmboost
  coef(cars.gb)
  coef(cars.gb, off2int = TRUE)        # offset added to intercept
  coef(lm(dist ~ speed, data = cars))  # directly comparable

  cars.gb_centered <- glmboost(dist ~ speed, data = cars,
                               center = TRUE)
  selected(cars.gb_centered)           # intercept never selected
  coef(cars.gb_centered)               # intercept implicitly estimated
                                       # and thus returned
  ## intercept is internally corrected for mean-centering
  - mean(cars$speed) * coef(cars.gb_centered, which="speed") # = intercept
  # not asked for intercept thus not returned
  coef(cars.gb_centered, which="speed")
  # explicitly asked for intercept
  coef(cars.gb_centered, which=c("Intercept", "speed"))

  ### enhance or restrict model
  cars.gb <- gamboost(dist ~ speed, data = cars,
                      control = boost_control(mstop = 100, trace = TRUE))
  cars.gb[10]
  cars.gb[100, return = FALSE] # no refitting required
  cars.gb[150, return = FALSE] # only iterations 101 to 150
                               # are newly fitted

  ### coefficients for optimal number of boosting iterations
  coef(cars.gb[mstop(aic)])
  plot(cars$dist, predict(cars.gb[mstop(aic)]),
       ylim = range(cars$dist))
  abline(a = 0, b = 1)

  ### example for extraction of coefficients
  set.seed(1907)
  n <- 100
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  x3 <- rnorm(n)
  x4 <- rnorm(n)
  int <- rep(1, n)
  y <- 3 * x1^2 - 0.5 * x2 + rnorm(n, sd = 0.1)
  data <- data.frame(y = y, int = int, x1 = x1, x2 = x2, x3 = x3, x4 = x4)

  model <- gamboost(y ~ bols(int, intercept = FALSE) +
                        bbs(x1, center = TRUE, df = 1) +
                        bols(x1, intercept = FALSE) +
                        bols(x2, intercept = FALSE) +
                        bols(x3, intercept = FALSE) +
                        bols(x4, intercept = FALSE),
                    data = data, control = boost_control(mstop = 500))

  coef(model) # standard output (only selected base-learners)
  coef(model,
       which = 1:length(variable.names(model))) # all base-learners
  coef(model, which = "x1") # shows all base-learners for x1

  cf1 <- coef(model, which = c(1,3,4), aggregate = "cumsum")
  tmp <- sapply(cf1, function(x) x)
  matplot(tmp, type = "l", main = "Coefficient Paths")

  cf1_all <- coef(model, aggregate = "cumsum")
  cf1_all <- lapply(cf1_all, function(x) x[, ncol(x)]) # last element
  ## same as coef(model)

  cf2 <- coef(model, aggregate = "none")
  cf2 <- lapply(cf2, rowSums) # same as coef(model)

  ### example continued for extraction of predictions

  yhat <- predict(model) # standard prediction; here same as fitted(model)
  p1 <- predict(model, which = "x1") # marginal effects of x1
  orderX <- order(data$x1)
  ## rowSums needed as p1 is a matrix
  plot(data$x1[orderX], rowSums(p1)[orderX], type = "b")

  ## better: predictions on an equidistant grid
  new_data <- data.frame(x1 = seq(min(data$x1), max(data$x1), length.out = 100))
  p2 <- predict(model, newdata = new_data, which = "x1")
  lines(new_data$x1, rowSums(p2), col = "red")

  ### extraction of model characteristics
  extract(model, which = "x1")  # design matrices for x1
  extract(model, what = "penalty", which = "x1") # penalty matrices for x1
  extract(model, what = "lambda", which = "x1") # df and corresponding lambda for x1
       ## note that bols(x1, intercept = FALSE) is unpenalized

  extract(model, what = "bnames")  ## name of complete base-learner
  extract(model, what = "variable.names") ## only variable names
  variable.names(model)            ## the same

  ### extract from base-learners
  extract(bbs(x1), what = "design")
  extract(bbs(x1), what = "penalty")
  ## weights and lambda can only be extracted after using dpp
  weights <- rep(1, length(x1))
  extract(bbs(x1)$dpp(weights), what = "lambda")
# }
