# mboost-package

##### mboost: Model-Based Boosting

Functional gradient descent algorithm (boosting) for optimizing general risk functions utilizing component-wise (penalized) least squares estimates or regression trees as base-learners for fitting generalized linear, additive and interaction models to potentially high-dimensional data.

- Keywords: nonparametric, package, smooth, models

##### Details

| Field | Value |
| --- | --- |
| Package | mboost |
| Version | 2.9-1 |
| Date | 2018-08-21 |
| License | GPL-2 |

This package is intended for modern regression modeling and sits between classical generalized linear and additive models, as implemented for example by `lm`, `glm`, or `gam`, and machine-learning approaches for complex interaction models, most prominently represented by `gbm` and `randomForest`.

All functionality in this package is based on the generic implementation of the optimization algorithm (function `mboost_fit`) that allows for fitting linear, additive, and interaction models (and mixtures of those) in low and high dimensions. The response may be numeric, binary, ordered, censored or count data.
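As a rough illustration of the linear and additive model classes mentioned above, the following sketch fits both to the `cars` data shipped with R. The data set, the number of boosting iterations and the object names are choices made for this example only, not recommendations from the package authors.

```r
library("mboost")

## Illustrative data set (an assumption of this sketch):
## stopping distance as a function of speed
data("cars")

## Linear model fitted by component-wise boosting
lin <- glmboost(dist ~ speed, data = cars,
                control = boost_control(mstop = 100))

## Additive model with a P-spline base-learner for speed
add <- gamboost(dist ~ bbs(speed), data = cars,
                control = boost_control(mstop = 100))

## Coefficients of the linear fit, with the offset added to the intercept
coef(lin, off2int = TRUE)

## Fitted values: one entry per observation
head(fitted(lin))
head(fitted(add))
```

Both calls go through the same boosting algorithm; only the base-learners differ. The value `mstop = 100` is arbitrary here and would normally be tuned.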

Both theory and applications are discussed by Buehlmann and Hothorn (2007). UseRs without a basic knowledge of boosting methods are asked to read this introduction before analyzing data using this package. The examples presented in this paper are available as package vignette `mboost_illustrations`.

Note that the model fitting procedures in this package DO NOT automatically determine an appropriate model complexity. This task is the responsibility of the data analyst.
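One common way to take on this responsibility is to choose the number of boosting iterations by resampling. The sketch below uses `cvrisk` with 5-fold cross-validation on the `cars` data; the data set, the seed, the initial `mstop = 200` and the number of folds are assumptions made purely for illustration.

```r
library("mboost")

set.seed(1)   # arbitrary seed, only to make the folds reproducible
data("cars")

## Fit with a deliberately generous number of iterations
model <- glmboost(dist ~ speed, data = cars,
                  control = boost_control(mstop = 200))

## 5-fold cross-validation of the empirical risk;
## papply = lapply evaluates the folds sequentially
folds <- cv(model.weights(model), type = "kfold", B = 5)
cvm <- cvrisk(model, folds = folds, papply = lapply)

## Iteration with the smallest cross-validated risk
mstop(cvm)

## Set the model to that number of iterations
mstop(model) <- mstop(cvm)
```

The selected iteration can never exceed the initial `mstop`, so if the estimate sits at the upper limit it is worth refitting with more iterations and repeating the procedure.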

A description of novel features that were introduced in version 2.0 is given in Hothorn et al. (2010).

Hofner et al. (2014) present a comprehensive hands-on tutorial for using the package `mboost`, which is also available as `vignette(package = "mboost", "mboost_tutorial")`.

Ben Taieb and Hyndman (2014) used this package for fitting their model in the Kaggle Global Energy Forecasting Competition 2012. The corresponding research paper is a good starting point when you plan to analyze your data using `mboost`.

##### NEWS in 2.9-series

Series 2.9 provides a new family (`RCG`), uses `partykit::ctree` instead of `party::ctree` to be more flexible, and allows for multivariate negative gradients and leave-one-out cross-validation. Further minor changes were introduced and a number of bugs were fixed.

**For more details and other changes see** `news(Version >= "2.9-0", package = "mboost")`

##### NEWS in 2.8-series

Series 2.8 allows fitting models with zero boosting steps (i.e., models containing only the offset). Furthermore, cross-validation can now also select a model without base-learners. In a `Binomial` family one can now specify links via `make.link`. With `Binomial(type = "glm")` an alternative implementation of `Binomial` models now exists; it defines the model along the lines of the `glm` implementation. Additionally, it works not only with a two-level factor but also with a two-column matrix containing the number of successes and number of failures. Finally, a new base-learner `bkernel` for kernel boosting was added. The references were updated and a lot of bugs were fixed.

**For more details and other changes see** `news(Version >= "2.8-0", package = "mboost")`

##### NEWS in 2.7-series

Series 2.7 provides a new family (`Cindex`), variable importance measures (`varimp`) and improved plotting facilities. The manual was updated in various places, vignettes were improved and a lot of bugs were fixed.

**For more details and other changes see** `news(Version >= "2.7-0", package = "mboost")`

##### NEWS in 2.6-series

Series 2.6 includes a lot of bug fixes and improvements. Most notably, the development of the package is now hosted entirely on GitHub in the project boost-R/mboost. Furthermore, the package is now maintained by Benjamin Hofner.

**For more details and other changes see** `news(Version >= "2.6-0", package = "mboost")`

##### NEWS in 2.5-series

Cross-validation no longer stops on errors in single folds and was sped up by setting `mc.preschedule = FALSE` if parallel computations via `mclapply` are used. The `plot.mboost` function is now documented. Values outside the boundary knots are now handled better (forbidden during fitting, while linear extrapolation is used for prediction). Further performance improvements and a lot of bug fixes have been added.

**For more details and other changes see** `news(Version >= "2.5-0", package = "mboost")`

##### NEWS in 2.4-series

Bootstrap confidence intervals have been implemented in the novel `confint` function. The stability selection procedure has now been moved to a stand-alone package called stabs, which now also implements an interface to use stability selection with other fitting functions. A generic function for `"mboost"` models is implemented in mboost.

**For more details and other changes see** `news(Version >= "2.4-0", package = "mboost")`

##### NEWS in 2.3-series

The stability selection procedure has been completely rewritten and improved. The code base is now extensively tested. New options allow for a less conservative error control.

Constrained effects can now be fitted with quadratic programming methods via the option `type = "quad.prog"` (default), which is considerably faster. Additionally, new constraints have been added.

Other important changes include:

- A new replacement function `mstop(mod) <- i` as an alternative to `mod[i]` was added (as suggested by Achim Zeileis).
- We added new families `Hurdle` and `Multinomial`.
- We added a new argument `stopintern` for internal stopping (based on out-of-bag data) during fitting to `boost_control`.

**For more details and other changes see** `news(Version >= "2.3-0", package = "mboost")`
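The two ways of changing the number of boosting iterations mentioned in the list above can be sketched as follows. The `cars` data, the model and the iteration counts are illustrative assumptions; the point is only that both forms modify the fitted object in place.

```r
library("mboost")

data("cars")
mod <- glmboost(dist ~ speed, data = cars,
                control = boost_control(mstop = 100))

## Replacement function: reduce the model to 50 iterations
mstop(mod) <- 50
mstop(mod)

## Subset operator: increase the same model to 80 iterations;
## return = FALSE suppresses returning (and printing) the model
mod[80, return = FALSE]
mstop(mod)
```

Because the object itself is changed, keep a separately fitted copy if the original number of iterations is still needed.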

##### NEWS in 2.2-series

Starting from version 2.2, the default for the degrees of freedom has changed. Now the degrees of freedom are (per default) defined as
$$\mathrm{df}(\lambda) = \mathrm{trace}(2S - S^{\top}S),$$
with smoother matrix \(S = X(X^{\top}X + \lambda K)^{-1} X^{\top}\) (see Hofner et al., 2011). Earlier versions used the trace of the smoother matrix, \(\mathrm{df}(\lambda) = \mathrm{trace}(S)\), as degrees of freedom. The old definition can be restored using `options(mboost_dftraceS = TRUE)` (see also Hofner et al., 2011, and `bols`).

Other important changes include:

- We switched from packages `multicore` and `snow` to `parallel`.
- We changed the behavior of `bols(x, intercept = FALSE)` when `x` is a factor: now the intercept is simply dropped from the design matrix and the coding can be specified as usual for factors. Additionally, a new contrast is introduced: `"contr.dummy"` (see `bols` for details).
- We changed the computation of the B-spline basis at the boundaries; B-splines now also use equidistant knots in the boundaries (per default).

**For more details and other changes see** `news(Version >= "2.2-0" & Version < "2.3-0", package = "mboost")`
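To see how the two degrees-of-freedom definitions from this section differ numerically, the following base-R sketch evaluates both for a simulated design matrix. The simulated data, the identity penalty matrix `K` and the helper names are assumptions made for illustration and are not taken from the package internals.

```r
set.seed(1)   # arbitrary seed for the simulated design

n <- 50
p <- 6
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
K <- diag(p)  # assumption: identity (ridge-type) penalty

## Smoother matrix S = X (X'X + lambda K)^{-1} X'
smoother <- function(lambda) {
  X %*% solve(crossprod(X) + lambda * K, t(X))
}

## Definition used since version 2.2: trace(2S - S'S)
df_new <- function(lambda) {
  S <- smoother(lambda)
  sum(diag(2 * S - crossprod(S)))
}

## Definition used in earlier versions: trace(S)
df_old <- function(lambda) sum(diag(smoother(lambda)))

lambdas <- c(0, 1, 10, 100)
cbind(lambda = lambdas,
      df_new = sapply(lambdas, df_new),
      df_old = sapply(lambdas, df_old))
```

Without a penalty (`lambda = 0`) the smoother is a projection and both definitions equal the number of columns of `X`. For `lambda > 0` the eigenvalues of `S` lie strictly between 0 and 1, so `trace(2S - S'S)` is larger than `trace(S)` for the same `lambda`.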

##### NEWS in 2.1-series

In the 2.1 series, we added multiple new base-learners including `bmono` (monotonic effects), `brad` (radial basis functions) and `bmrf` (Markov random fields), and extended `bbs` to incorporate cyclic splines (via argument `cyclic = TRUE`). We also changed the default `df` for `bspatial` to `6`.

Starting from this version, we now also automatically center the variables in `glmboost` (argument `center = TRUE`).

**For more details and other changes see** `news(Version >= "2.1-0" & Version < "2.2-0", package = "mboost")`

##### NEWS in 2.0-series

Version 2.0 comes with new features and is faster and more accurate in some aspects. In addition, some changes to the user interface were necessary: subsetting `mboost` objects changes the object. At any time, a model is associated with a number of boosting iterations, which can be changed (increased or decreased) using the subset operator.

The `center` argument in `bols` was renamed to `intercept`. Argument `z` was renamed to `by`.

The base-learners `bns` and `bss` are deprecated and replaced by `bbs` (which results in qualitatively the same models but is computationally much more attractive).

New features include new families (for example for ordinal regression) and the `which` argument to the `coef` and `predict` methods for selecting interesting base-learners. Predict methods are much faster now.

Memory consumption was reduced considerably, thanks to sparse matrix technology in package `Matrix`. Resampling procedures run automatically in parallel on OSes where parallelization via package `parallel` is available.

The most important advancement is a generic implementation of the optimizer in function `mboost_fit`.

**For more details and other changes see** `news(Version >= "2.0-0" & Version < "2.1-0", package = "mboost")`

##### References

Peter Buehlmann and Torsten Hothorn (2007),
Boosting algorithms: regularization, prediction and model fitting.
*Statistical Science*, **22**(4), 477–505.

Torsten Hothorn, Peter Buehlmann, Thomas Kneib, Matthias Schmid and
Benjamin Hofner (2010), Model-based Boosting 2.0. *Journal of
Machine Learning Research*, **11**, 2109–2113.

Benjamin Hofner, Torsten Hothorn, Thomas Kneib, and Matthias Schmid (2011),
A framework for unbiased model selection based on boosting.
*Journal of Computational and Graphical Statistics*, **20**, 956–971.

Benjamin Hofner, Andreas Mayr, Nikolay Robinzonov and Matthias Schmid
(2014), Model-based Boosting in R: A Hands-on Tutorial Using the R
Package mboost. *Computational Statistics*, **29**, 3–35.
http://dx.doi.org/10.1007/s00180-012-0382-5

Available as vignette via: `vignette(package = "mboost", "mboost_tutorial")`

Souhaib Ben Taieb and Rob J. Hyndman (2014),
A gradient boosting approach to the Kaggle load forecasting competition.
*International Journal of Forecasting*, **30**, 382–394.
http://dx.doi.org/10.1016/j.ijforecast.2013.07.005

##### See Also

The main fitting functions include:

- `gamboost` for boosted (generalized) additive models,
- `glmboost` for boosted linear models and
- `blackboost` for boosted trees.

Model tuning is done via cross-validation as implemented in `cvrisk`. See there for more details and further links.

##### Examples

```
# NOT RUN {
############################################################
## Do not run this example automatically as it takes
## some time (~ 5-10 seconds depending on the system)
library("mboost")
data("bodyfat", package = "TH.data")
set.seed(290875)

### model conditional expectation of DEXfat given
model <- mboost(DEXfat ~
    bols(age) +                 ### a linear function of age
    btree(hipcirc, waistcirc) + ### a non-linear interaction of
                                ### hip and waist circumference
    bbs(kneebreadth),           ### a smooth function of kneebreadth
    data = bodyfat, control = boost_control(mstop = 100))

### bootstrap for assessing `optimal' number of boosting iterations
cvm <- cvrisk(model, papply = lapply)

### restrict model to mstop(cvm)
model[mstop(cvm), return = FALSE]
mstop(model)

### plot age and kneebreadth
layout(matrix(1:2, ncol = 2))
plot(model, which = c("age", "kneebreadth"))

### plot interaction of hip and waist circumference
### on a 100 x 100 grid spanning the observed ranges
h <- seq(from = min(bodyfat$hipcirc), to = max(bodyfat$hipcirc),
         length.out = 100)
w <- seq(from = min(bodyfat$waistcirc), to = max(bodyfat$waistcirc),
         length.out = 100)
nd <- expand.grid(hipcirc = h, waistcirc = w)
plot(model, which = 2, newdata = nd)

### customized plot
layout(1)
pr <- predict(model, which = "hip", newdata = nd)
persp(x = h, y = w, z = matrix(pr, nrow = 100, ncol = 100))
# }
```

*Documentation reproduced from package mboost, version 2.9-1, License: GPL-2*