boost: Gradient Boosting BAMLSS

Description

Optimizer function for gradient boosting with bamlss. In each boosting iteration the function selects the model term with the largest contribution to the log-likelihood.
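The selection step described above can be illustrated with a minimal base R sketch. It is not the bamlss implementation: it assumes a Gaussian response with known standard deviation 1, uses only linear terms, and all object names (X, loglik, cand, selected) are made up for illustration. In each iteration every candidate term is fitted to the current residuals, and only the term with the largest log-likelihood contribution is updated by the step size nu.

```r
## Minimal sketch of componentwise gradient boosting (Gaussian, sd = 1).
## Illustrative only; this is not the code used inside boost().
set.seed(1)
n <- 200
X <- cbind(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- 2 * X[, "x1"] - 1 * X[, "x3"] + rnorm(n)

loglik <- function(y, eta) sum(dnorm(y, mean = eta, sd = 1, log = TRUE))

nu <- 0.1                          # step size
maxit <- 200                       # number of boosting iterations
beta <- setNames(numeric(ncol(X)), colnames(X))
eta <- rep(mean(y), n)             # initialize the intercept
selected <- character(maxit)
ll <- numeric(maxit)

for (i in seq_len(maxit)) {
  u <- y - eta                     # negative gradient (residuals)
  ll0 <- loglik(y, eta)
  ## Fit each candidate term to the gradient and record its contribution.
  cand <- sapply(colnames(X), function(j) {
    b <- sum(X[, j] * u) / sum(X[, j]^2)
    c(b = b, contrib = loglik(y, eta + nu * b * X[, j]) - ll0)
  })
  ## Update only the term with the largest log-likelihood contribution.
  best <- names(which.max(cand["contrib", ]))
  beta[best] <- beta[best] + nu * cand["b", best]
  eta <- eta + nu * cand["b", best] * X[, best]
  selected[i] <- best
  ll[i] <- loglik(y, eta)
}

table(selected)                    # selection frequencies
round(beta, 3)                     # coefficients after maxit iterations
```

The vector ll plays the role of a log-likelihood path, and table(selected) gives selection frequencies of the kind a boosting summary reports.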

Usage

## Gradient boosting optimizer.
boost(x, y, family,
  nu = 0.1, df = 4, maxit = 400, mstop = NULL,
  verbose = TRUE, digits = 4, flush = TRUE,
  eps = .Machine$double.eps^0.25, nback = NULL,
  plot = TRUE, initialize = TRUE, ...)
## Boosting summary extractor.
boost.summary(object, ...)
## Plot all boosting paths.
boost.plot(x, which = c("loglik", "loglik.contrib", "parameters"),
  intercept = TRUE, spar = TRUE, mstop = NULL, name = NULL,
  labels = NULL, color = NULL, ...)
## Boosting summary printing and plotting.
# S3 method for boost.summary
print(x, summary = TRUE, plot = TRUE,
  which = c("loglik", "loglik.contrib"), intercept = TRUE,
  spar = TRUE, ...)
# S3 method for boost.summary
plot(x, ...)

Arguments

x

For function boost(), the x list, as returned by function bamlss.frame, holding all model matrices and other information used for fitting the model. For the plotting function, the corresponding bamlss object fitted with the boost() optimizer.

y

The model response, as returned by function bamlss.frame.

family

A bamlss family object, see family.bamlss.

nu

Numeric, in the interval [0, 1]. Controls the step size, i.e., the amount that is added to the model term parameters in each iteration.

df

Integer, defines the initial degrees of freedom that should be assigned to each smooth model term. May also be a named vector, in which case the names must match the model term labels, e.g., as provided in summary.bamlss.

maxit

Integer, the maximum number of boosting iterations.

mstop

Integer. For convenience, overrides maxit if supplied.

name

Character, the name of the coefficient (group) that should be plotted. Note that the string provided in name will be removed from the labels on the 4th axis (the right side of the plot).

labels

A character vector of labels that should be used on the 4th axis.

color

Colors or color function that creates colors for the (group) paths.

verbose

Print information during runtime of the algorithm.

digits

Set the digits for printing when verbose = TRUE.

flush

Use flush.console for displaying the current output in the console.

eps

The tolerance used as stopping mechanism, see argument nback.

nback

Integer. If nback is not NULL, the algorithm stops if the change in the log-likelihood over the last nback iterations is smaller than or equal to eps. If maxit = NULL, the maximum number of iterations is set to 10000.

plot

Should the boosting summary be printed and plotted?

initialize

Logical, should intercepts be initialized?

object

A bamlss object that was fitted using boost().

summary

Should the summary be printed?

which

Which of the three provided plots should be created?

intercept

Should the coefficient paths of intercepts be dropped in the plot?

spar

Should graphical parameters be set with par?

…

For function boost(), arguments passed to bamlss.engine.setup. For function boost.summary(), arguments passed to function print.boost.summary().
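The interplay of nback and eps can be sketched as follows. This is one plausible reading of the stopping rule, not the check used inside boost(): the helper stop_early() and the vector ll_path are hypothetical, and the rule is assumed to compare the current log-likelihood with the value nback iterations earlier.

```r
## Hypothetical helper: TRUE if the log-likelihood changed by at most eps
## over the last nback iterations. Assumed reading of the rule, not bamlss code.
stop_early <- function(ll, nback, eps = .Machine$double.eps^0.25) {
  ## No early stopping if nback is NULL or too few iterations have run.
  if (is.null(nback) || length(ll) <= nback) return(FALSE)
  recent <- tail(ll, nback + 1)
  abs(recent[length(recent)] - recent[1]) <= eps
}

## Made-up log-likelihood path that flattens out after a few iterations.
ll_path <- c(-120, -100, -95, -94.99999, -94.99999, -94.99999)

stop_early(ll_path, nback = 2)        # path is flat over the last 2 steps
stop_early(ll_path[1:3], nback = 2)   # still improving
stop_early(ll_path, nback = NULL)     # rule disabled
```

With the default eps of .Machine$double.eps^0.25 (roughly 1.2e-4), only changes that are negligible on the log-likelihood scale trigger the stop.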

Value

For function boost.summary() a list containing information on selection frequencies etc. For function boost() a list containing the following objects:

fitted.values

A named list of the fitted values based on the last boosting iteration of the modeled parameters of the selected distribution.

parameters

A matrix, each row corresponds to the parameter values of one boosting iteration.

boost.summary

The boosting summary which can be printed and plotted.

WARNINGS

The function does not take care of variable scaling for the linear parts! This must be done by the user, e.g., one option is to use argument scale.d in function bamlss.frame, which uses scale.
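As a small illustration of what such scaling does, the following base R sketch centers and scales two covariates with scale(). The data frame and its column names are invented for the example; within bamlss the scale.d argument mentioned above would normally handle this step.

```r
## Made-up covariates on very different scales.
set.seed(1)
d <- data.frame(income = rnorm(100, mean = 50000, sd = 10000),
                age = rnorm(100, mean = 40, sd = 10))

## Center each column to mean 0 and scale it to standard deviation 1.
Xs <- scale(as.matrix(d))

round(colMeans(Xs), 10)         # column means after scaling
apply(Xs, 2, sd)                # column standard deviations after scaling

## The centers and scales are kept as attributes, so new data can be
## transformed in the same way before prediction.
attr(Xs, "scaled:center")
attr(Xs, "scaled:scale")
```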

Function boost() does not select the optimum stopping iteration!
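One common way to choose a stopping iteration is to monitor the log-likelihood on held-out data and pick the iteration where it peaks. The base R sketch below shows the idea under strong simplifying assumptions (Gaussian response with sd = 1, linear terms only, a single train/validation split); it is not part of bamlss, and all names in it are illustrative.

```r
## Sketch: pick the stopping iteration by validation log-likelihood.
set.seed(2)
n <- 300
x <- matrix(rnorm(n * 5), n, 5)
y <- 1.5 * x[, 1] + rnorm(n)
train <- seq_len(200)
valid <- 201:n

nu <- 0.1
maxit <- 300
beta <- numeric(5)
val_ll <- numeric(maxit)

for (i in seq_len(maxit)) {
  ## Residuals on the training data and least squares fit of each term.
  u <- y[train] - drop(x[train, ] %*% beta)
  b <- colSums(x[train, ] * u) / colSums(x[train, ]^2)
  ## Select the term that reduces the residual sum of squares the most.
  rss <- sapply(1:5, function(j) sum((u - nu * b[j] * x[train, j])^2))
  j <- which.min(rss)
  beta[j] <- beta[j] + nu * b[j]
  ## Log-likelihood on the held-out observations after this iteration.
  val_ll[i] <- sum(dnorm(y[valid], drop(x[valid, ] %*% beta), 1, log = TRUE))
}

mstop <- which.max(val_ll)      # iteration with the best validation fit
mstop
```

For a model fitted with boost(), the resulting iteration could then be passed as mstop to parameters() or predict(), as in the Examples.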

Examples

# NOT RUN {
## Simulate data.
set.seed(123)
d <- GAMart()

## Estimate model.
f <- num ~ x1 + x2 + x3 + lon + lat +
  s(x1) + s(x2) + s(x3) + s(lon) + s(lat) + te(lon,lat)

b <- bamlss(f, data = d, optimizer = boost,
  sampler = FALSE, scale.d = TRUE, nu = 0.01,
  maxit = 1000, plot = FALSE)

## Plot estimated effects.
plot(b)

## Print and plot the boosting summary.
boost.summary(b, plot = FALSE)
boost.plot(b, which = 1)
boost.plot(b, which = 2)
boost.plot(b, which = 3, name = "mu.s.te(lon,lat).")

## Extract estimated parameters for certain
## boosting iterations.
parameters(b, mstop = 1)
parameters(b, mstop = 100)

## Also works with predict().
head(do.call("cbind", predict(b, mstop = 1)))
head(do.call("cbind", predict(b, mstop = 100)))
# }