# baselearners

##### Base-learners for Gradient Boosting

Base-learners for fitting base-models in the generic implementation of
component-wise gradient boosting in function `mboost`.

##### Usage

```
bols(..., by = NULL, index = NULL, intercept = TRUE, df = NULL,
     lambda = 0, contrasts.arg = "contr.treatment")
bbs(..., by = NULL, index = NULL, knots = 20, degree = 3,
    differences = 2, df = 4, lambda = NULL, center = FALSE)
bspatial(...)
brandom(..., df = 4)
btree(..., tree_controls = ctree_control(stump = TRUE,
                                         mincriterion = 0))
bl1 %+% bl2
bl1 %X% bl2
```

##### Arguments

- `...`: one or more predictor variables or one data frame of predictor variables.
- `by`: an optional variable defining varying coefficients, either a binary or numeric variable.
- `index`: a vector of integers for expanding the variables in `...`. For example, `bols(x, index = index)` is equal to `bols(x[index])`, where `index` is an integer vector of length greater than or equal to `length(x)`.
- `df`: trace of the hat matrix for the base-learner, defining the base-learner complexity. Low values of `df` correspond to a large amount of smoothing and thus to "weaker" base-learners. Certain restrictions apply to the specification of `df` (see Details).
- `lambda`: smoothing penalty, computed from `df` when `df` is specified.
- `knots`: either the number of (equidistant) interior knots to be used for the regression spline fit or a vector including the positions of the interior knots. For multiple predictor variables, `knots` may be a named list (see Details).
- `degree`: degree of the regression spline.
- `differences`: 1, 2, or 3. If `differences` = *k*, *k*-th-order differences are used as a penalty.
- `intercept`: if `intercept = TRUE`, an intercept is added to the design matrix of a linear base-learner.
- `center`: if `center = TRUE`, the corresponding effect is re-parameterized such that the unpenalized part of the fit is subtracted and only the deviation effect is fitted. The unpenalized, parametric part then has to be included in separate base-learners, e.g., `bols(x)` (see Details).
- `contrasts.arg`: a character string suitable for input to the `contrasts` replacement function; see `model.matrix`.
- `tree_controls`: an object of class `"TreeControl"`, which can be obtained using `ctree_control`. Defines hyper-parameters for the trees used as base-learners.
- `bl1`: a linear base-learner or a list of linear base-learners.
- `bl2`: a linear base-learner or a list of linear base-learners.

##### Details

`bols` refers to linear base-learners (potentially estimated with a ridge
penalty), while `bbs` provides penalized regression splines. `bspatial` fits
bivariate surfaces and `brandom` defines random-effects base-learners.
In combination with the option `by`, these base-learners can be turned into
varying coefficient terms. The linear base-learners are fitted using ridge
regression, where the penalty parameter `lambda` is either computed from `df`
(the default for `bbs`, `bspatial`, and `brandom`) or specified directly
(`lambda = 0`, the default for `bols`, means no penalization).

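The link between `df` and `lambda` can be illustrated with a small base-R sketch, assuming a generic ridge-penalized base-learner with design matrix `X` and penalty matrix `K`: the degrees of freedom are taken as the trace of the hat matrix `X (X'X + lambda K)^{-1} X'`, and a root search finds the `lambda` that matches a requested `df`. The helpers `ridge_df()` and `df2lambda_sketch()` are hypothetical and not part of mboost, whose internal computation may differ.

```r
# Hypothetical helpers (not mboost functions): df as the trace of the hat
# matrix of a ridge-penalised least-squares fit, and the lambda for a given df.
ridge_df <- function(X, K, lambda) {
  # hat matrix S = X (X'X + lambda K)^{-1} X'
  S <- X %*% solve(crossprod(X) + lambda * K, t(X))
  sum(diag(S))
}

df2lambda_sketch <- function(X, K, df) {
  # trace(S) decreases monotonically in lambda, so search on the log scale
  f <- function(loglambda) ridge_df(X, K, exp(loglambda)) - df
  exp(uniroot(f, interval = c(-20, 20))$root)
}

set.seed(1)
X <- cbind(1, matrix(rnorm(100 * 4), ncol = 4))   # 100 x 5 design matrix
K <- diag(ncol(X))                                # plain ridge penalty
lambda <- df2lambda_sketch(X, K, df = 2)
ridge_df(X, K, lambda)   # approximately 2
ridge_df(X, K, 0)        # lambda = 0 (no penalty): equals ncol(X) = 5
```

Larger `lambda` means a smaller trace, which is why low `df` values correspond to "weaker" base-learners.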
In `bols(x)`, `x` may be a numeric vector or a factor. Alternatively, `x` can
be a data frame containing numeric or factor variables. In this case, or when
multiple predictor variables are specified, e.g., using `bols(x1, x2)`, the
model is equivalent to `lm(y ~ ., data = x)` or `lm(y ~ x1 + x2)`,
respectively. By default, an intercept term is added to the corresponding
design matrix (which can be omitted using `intercept = FALSE`). When `df` is
given, a ridge estimator with `df` degrees of freedom (trace of the hat matrix)
is used as base-learner. Note that all variables are treated as a group,
i.e., they enter the model together if the corresponding base-learner is selected.

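To make the equivalence with `lm()` concrete, the following base-R sketch fits an unpenalized grouped linear effect (`lambda = 0`) as a weighted least-squares problem on the design matrix from `model.matrix()` with the default treatment contrasts, and compares the result with `lm()`. It illustrates the statement above rather than mboost's own fitting code; the simulated data are arbitrary.

```r
# Arbitrary simulated data with one numeric and one factor predictor
set.seed(2)
n <- 50
x1 <- rnorm(n)
x2 <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
y <- 1 + 2 * x1 + as.integer(x2) + rnorm(n)
w <- rep(1, n)                        # case weights

# design matrix: intercept, x1, and treatment-coded dummies for x2
X <- model.matrix(~ x1 + x2)
# weighted least squares (X'WX)^{-1} X'Wy, i.e. no penalty (lambda = 0)
beta <- drop(solve(crossprod(X, w * X), crossprod(X, w * y)))
all.equal(beta, coef(lm(y ~ x1 + x2, weights = w)),
          check.attributes = FALSE)   # TRUE
```

All columns of `X` form one group: the base-learner updates all of these coefficients together whenever it is selected.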
With `bbs`, the P-spline approach of Eilers and Marx (1996) is used. P-splines
use a squared *k*-th-order difference penalty, which can be interpreted as an
approximation of the integrated squared *k*-th derivative of the spline.

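A minimal sketch of the two ingredients of a P-spline base-learner, assuming equidistant knots over the range of the data: a cubic B-spline basis from `splines::splineDesign()` and the squared *k*-th-order difference penalty `K = D'D`. The helpers `pspline_design()` and `difference_penalty()` are hypothetical, and the exact knot placement used by `bbs()` may differ. Coefficients lying on a straight line incur no penalty under second-order differences, which is the unpenalized part discussed under `center`.

```r
library(splines)

# Hypothetical helper: cubic B-spline basis on equidistant knots
pspline_design <- function(x, n_interior = 20, degree = 3) {
  rng <- range(x)
  inner <- seq(rng[1], rng[2], length.out = n_interior + 2)  # incl. boundary
  step <- inner[2] - inner[1]
  # add 'degree' extra equidistant knots beyond each boundary
  knots <- c(rng[1] - step * (degree:1), inner, rng[2] + step * (1:degree))
  splineDesign(knots, x, ord = degree + 1)
}

# Hypothetical helper: squared k-th order difference penalty K = D'D
difference_penalty <- function(n_coef, differences = 2) {
  D <- diff(diag(n_coef), differences = differences)
  crossprod(D)
}

x <- seq(0, 1, length.out = 200)
B <- pspline_design(x)
K <- difference_penalty(ncol(B), differences = 2)
ncol(B)                                # 24 = 20 interior knots + degree + 1
# coefficients on a straight line have zero second differences
b_lin <- seq_len(ncol(B))
drop(t(b_lin) %*% K %*% b_lin)         # 0
```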
`bspatial` implements bivariate tensor product P-splines for the estimation of
either spatial effects or interaction surfaces. Note that `bspatial(x, y)` is
equivalent to `bbs(x, y)`; for possible arguments and defaults, see there.
The penalty term is constructed based on bivariate extensions of the
univariate penalties in the `x` and `y` directions; see Kneib, Hothorn and
Tutz (2009) for details. Note that the dimensions of the penalty matrix
increase quickly with the number of knots, with a strong impact on
computational time. Thus, the number of knots should not be chosen too large
in either direction. Different knots for `x` and `y` can be specified by a
named list.

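The growth of the penalty matrix can be seen in a base-R sketch that builds the bivariate penalty as a Kronecker sum of the two univariate difference penalties, `K1 %x% I + I %x% K2`. This is the usual tensor product P-spline construction and an assumption here, not a copy of `bspatial()` internals; `tensor_penalty()` is a hypothetical helper, and the per-direction coefficient count `knots + degree + 1` assumes cubic splines. The last lines check that a bilinear coefficient surface is unpenalized, which matches the terms removed by `center = TRUE` for second-order differences.

```r
# Hypothetical helpers illustrating a tensor product P-spline penalty
difference_penalty <- function(n_coef, differences = 2) {
  crossprod(diff(diag(n_coef), differences = differences))   # K = D'D
}

tensor_penalty <- function(n1, n2, differences = 2) {
  K1 <- difference_penalty(n1, differences)
  K2 <- difference_penalty(n2, differences)
  # penalise differences along x (K1) and along y (K2)
  kronecker(K1, diag(n2)) + kronecker(diag(n1), K2)
}

# one coefficient per pair of basis functions: (n1 * n2) x (n1 * n2) penalty
for (knots in c(8, 12, 24)) {
  n_coef <- knots + 3 + 1          # interior knots + degree + 1 (cubic)
  K <- tensor_penalty(n_coef, n_coef)
  cat(knots, "knots per direction ->", nrow(K), "x", ncol(K), "penalty\n")
}

# a bilinear coefficient surface b[i, j] = i * j is not penalised
K <- tensor_penalty(10, 10)
b <- as.vector(outer(1:10, 1:10))
drop(t(b) %*% K %*% b)             # 0
# 96 = 100 - 4 unpenalised directions (1, x, y, x * y)
sum(eigen(K, symmetric = TRUE, only.values = TRUE)$values > 1e-8)
```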
`brandom(x)` specifies a random-effects base-learner based on a factor
variable `x` that defines the grouping structure of the data set. For each
level of `x`, a separate random intercept is fitted, where the random-effects
variance is governed by the specification of the degrees of freedom `df`.

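A random-intercept base-learner can be viewed as a ridge-penalized fit on the indicator coding of the grouping factor, which is the view taken in the following base-R sketch (an illustrative assumption, not `brandom()`'s code). Each level's estimate is its weighted group sum shrunk by `group size + lambda`, so a level without positively weighted observations gets an estimate of exactly zero, in line with the treatment of non-observed categories described below.

```r
# Random intercept as ridge regression on one indicator column per level
set.seed(1907)
id <- factor(rep(1:10, each = 5))
y <- rnorm(50, mean = rep(rnorm(10), each = 5), sd = 0.1)
w <- c(rep(1, 45), rep(0, 5))         # level 10 gets zero weight

Z <- model.matrix(~ id - 1)           # 50 x 10 indicator matrix
lambda <- 2                           # larger lambda = fewer df, more shrinkage
b <- drop(solve(crossprod(Z, w * Z) + lambda * diag(ncol(Z)),
                crossprod(Z, w * y)))

# closed form: weighted group sums shrunk towards zero by (n_group + lambda)
shrunk <- tapply(w * y, id, sum) / (tapply(w, id, sum) + lambda)
all.equal(unname(b), as.vector(shrunk))   # TRUE
b[10]                                     # 0 for the unobserved level
```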
For all linear base-learners, the amount of smoothing is determined by the
trace of the hat matrix, as indicated by `df`. If `df` is specified in `bols`,
a ridge penalty with the corresponding degrees of freedom is used. For ordinal
variables, a ridge penalty for the differences of adjacent categories
(Gertheiss and Tutz 2009) is applied.

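The difference penalty for ordinal factors can be written down explicitly. The base-R sketch below assumes dummy coding against the first level (whose coefficient is fixed at zero) and penalizes the squared differences between adjacent categories, in the spirit of Gertheiss and Tutz (2009); `ordinal_penalty()` is a hypothetical helper, and the exact coding used by `bols()` may differ.

```r
# Hypothetical helper: penalty matrix for an ordinal factor with n_levels
# categories, coded with dummies for levels 2..n_levels (level 1 is the
# reference with coefficient 0). Penalty: sum over l of (b_l - b_{l-1})^2.
ordinal_penalty <- function(n_levels) {
  D <- diff(diag(n_levels))          # first differences of (b_1, ..., b_L)
  D <- D[, -1, drop = FALSE]         # b_1 = 0, so drop its column
  crossprod(D)                       # K = D'D
}

K <- ordinal_penalty(4)
b <- c(1, 3, 6)                      # coefficients of levels 2, 3, 4
drop(t(b) %*% K %*% b)               # (1 - 0)^2 + (3 - 1)^2 + (6 - 3)^2 = 14
```

Unlike a plain ridge penalty `sum(b^2)`, this penalty pulls neighbouring categories towards each other, so the estimated effects vary smoothly over the ordered levels.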
If `by` is specified as an additional argument, a varying coefficient term is
estimated, where `by` is the interaction variable and the effect modifier is
given by either `x` or `x` and `y` (specified via `...`).
If `bbs` is used, this corresponds to the classical situation of varying
coefficients, where the effect of `by` varies with the value of `x`. In the
case of `bspatial` as base-learner, the effect of `by` varies with respect to
both `x` and `y`, i.e., an interaction surface between `x` and `y` is
specified as effect modifier. For `brandom`, specifying `by` leads to the
estimation of random slopes for the covariate `by`, with the grouping
structure defined by the factor `x`, instead of a simple random intercept.

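The varying-coefficient idea can be sketched in base R by scaling each row of a spline design matrix for the effect modifier `x` by the corresponding value of the `by` variable `z`, so that the fitted term is `f(x) * z`. The basis from `splines::bs()`, the unpenalized least-squares fit, and the simulated `f` are assumptions made for brevity; mboost uses its own penalized bases.

```r
library(splines)

set.seed(3)
n <- 500
x <- runif(n)                            # effect modifier
z <- runif(n, 1, 2)                      # `by` variable
f <- function(x) sin(2 * pi * x)         # true varying coefficient
y <- f(x) * z + rnorm(n, sd = 0.1)

B <- bs(x, df = 8, intercept = TRUE)     # spline basis for x
B_by <- unclass(B) * z                   # row i of the design scaled by z[i]
beta <- qr.coef(qr(B_by), y)             # unpenalised least-squares fit

# evaluate the estimated coefficient function f(x) on a grid
grid <- seq(0.05, 0.95, length.out = 50)
f_hat <- drop(predict(B, grid) %*% beta)
max(abs(f_hat - f(grid)))                # small
```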
For `bbs` and `bspatial`, the option `center` requests that the fitted effect
is centered around its parametric, unpenalized part. For example, with a
second-order difference penalty, a linear effect of `x` remains unpenalized by
`bbs`, and therefore the degrees of freedom for the base-learner have to be
larger than two. To avoid this restriction, `center = TRUE` subtracts the
unpenalized linear effect from the fit, allowing any positive number to be
specified as `df`. Note that in this case the linear effect of `x` should
generally be specified as an additional base-learner `bols(x)`. For `bspatial`
and, for example, second-order differences, a linear effect of `x`
(`bols(x)`), a linear effect of `y` (`bols(y)`), and their interaction
(`bols(x*y)`) are subtracted from the effect and have to be added separately
to the model equation. More details on centering can be found in Kneib,
Hothorn and Tutz (2009) and Fahrmeir, Kneib and Lang (2004).

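The reparameterization behind `center = TRUE` can be sketched in base R, assuming equidistant cubic B-splines and a second-order difference penalty: coefficients on a straight line produce a linear function of `x` that the penalty cannot shrink, and keeping only the eigenvectors of `K` with positive eigenvalues leaves a design for the penalized deviation alone, whose penalty becomes the identity. The spectral decomposition is one common way to do this; whether `bbs()` uses exactly this construction is an assumption.

```r
library(splines)

# Equidistant cubic B-spline basis with 10 interior knots on [0, 1]
x <- seq(0.005, 0.995, length.out = 100)
inner <- seq(0, 1, length.out = 12)                 # boundary + 10 interior
step <- inner[2] - inner[1]
knots <- c(-step * (3:1), inner, 1 + step * (1:3))
B <- splineDesign(knots, x, ord = 4)                # 100 x 14
K <- crossprod(diff(diag(ncol(B)), differences = 2))

# 1) straight-line coefficients: zero penalty and a linear function of x
b_lin <- seq_len(ncol(B))
drop(t(b_lin) %*% K %*% b_lin)                      # 0
max(abs(residuals(lm(drop(B %*% b_lin) ~ x))))      # ~ 0

# 2) keep only the penalised directions (positive eigenvalues of K)
e <- eigen(K, symmetric = TRUE)
pos <- e$values > 1e-8
P <- e$vectors[, pos] %*% diag(1 / sqrt(e$values[pos]))
Z <- B %*% P                                        # design of the deviation
ncol(Z)                                             # 12 = 14 - 2
max(abs(t(P) %*% K %*% P - diag(ncol(P))))          # ~ 0: identity penalty
```

Because the remaining penalty is positive definite, any positive `df` is attainable for `Z`, while the linear part has to be modelled separately, e.g., with `bols(x)`.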
For a categorical covariate with non-observed categories, `bols(x)` and
`brandom(x)` both assign a zero effect to these categories. However, the
non-observed categories must be listed in `levels(x)`. Thus, predictions are
possible for new observations that fall into such a category.

By default, all linear base-learners include an intercept term (which can be
removed using `intercept = FALSE` for `bols` or `center = TRUE` for `bbs`).
In this case, an explicit global intercept term should be added to `gamboost`
via `bols` (see the examples below).

Three global options, set via `options()`, affect the base-learners.
`mboost_useMatrix`, defaulting to `TRUE`, indicates that the base-learner may
use sparse matrix techniques for its computations. This reduces memory
consumption but might (for smaller sample sizes) require more computing time.
`mboost_indexmin` is an integer giving the sample size from which model
fitting is optimized by taking ties into account. `mboost_dftraceS`, which is
also `TRUE` by default, indicates that the trace of the smoother matrix is used
as degrees of freedom. If `FALSE`, an alternative is used (see Hofner et al.,
2009).

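The speed-up from ties (governed by `mboost_indexmin`, and available manually through the `index` argument) rests on a simple identity, sketched here in base R: if the design for all observations is `X[index, ]` for a small matrix `X` of unique rows, the weighted cross-products can be computed from `X` alone after summing the weights (and weighted responses) within each unique row. This illustrates the principle only and is not mboost's implementation.

```r
set.seed(5)
nunique <- 20
X <- cbind(1, runif(nunique))                   # one row per unique value
index <- sample(seq_len(nunique), 10000, replace = TRUE)
w <- runif(length(index))
y <- rnorm(length(index))

# brute force: expand to all 10000 rows
XtWX_full <- crossprod(X[index, ], w * X[index, ])
XtWy_full <- crossprod(X[index, ], w * y)

# ties: aggregate weights and weighted responses per unique row first
w_agg <- vapply(seq_len(nunique),
                function(j) sum(w[index == j]), numeric(1))
wy_agg <- vapply(seq_len(nunique),
                 function(j) sum((w * y)[index == j]), numeric(1))
XtWX_ties <- crossprod(X, w_agg * X)            # only 20 rows involved
XtWy_ties <- crossprod(X, wy_agg)

all.equal(XtWX_full, XtWX_ties)   # TRUE
all.equal(XtWy_full, XtWy_ties)   # TRUE
```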
Two or more linear base-learners can be joined using `%+%`. A tensor product
of two or more linear base-learners is returned by `%X%`. These two features
are experimental and for expert use only.

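For orientation only, a row-wise tensor (Kronecker) product of two design matrices, the usual way to form a tensor product of two linear effects, is sketched below in base R. Whether `%X%` builds its design exactly like this is an assumption; `row_tensor()` is a hypothetical helper.

```r
# Hypothetical helper: row-wise Kronecker product of two design matrices
row_tensor <- function(X1, X2) {
  stopifnot(nrow(X1) == nrow(X2))
  # column (i, j) of the result is X1[, i] * X2[, j]
  X1[, rep(seq_len(ncol(X1)), each = ncol(X2)), drop = FALSE] *
    X2[, rep(seq_len(ncol(X2)), times = ncol(X1)), drop = FALSE]
}

f <- factor(c("a", "b", "a", "b"))
x <- c(0.5, 1, 1.5, 2)
X1 <- model.matrix(~ f)              # intercept + dummy for level "b"
X2 <- cbind(1, x)                    # intercept + linear effect of x
Xt <- row_tensor(X1, X2)
dim(Xt)                              # 4 rows, 2 * 2 = 4 columns
# each row equals the Kronecker product of the corresponding rows
all.equal(unname(Xt[4, ]), as.vector(kronecker(X1[4, ], X2[4, ])))
```

Here the four columns span the same space as `model.matrix(~ f * x)`: an intercept, a slope in `x`, and level-specific deviations of both.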
`btree` fits a stump to one or more variables. Note that `blackboost` is more
efficient for boosting stumps.
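To give an idea of the weak learner involved, the base-R sketch below fits a least-squares stump, i.e., a single split with one constant on each side, to a working response `u` (in boosting, the negative gradient). `btree()` is configured through `ctree_control()` and selects its split with a different criterion; the exhaustive search over midpoints and the helper `fit_stump()` are hypothetical simplifications.

```r
# Hypothetical helper: least-squares stump on one numeric variable x for a
# working response u with case weights w (exhaustive search over midpoints)
fit_stump <- function(x, u, w = rep(1, length(x))) {
  xs <- sort(unique(x))
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2        # candidate split points
  sse <- vapply(cuts, function(cp) {
    left <- x <= cp
    m_left <- weighted.mean(u[left], w[left])
    m_right <- weighted.mean(u[!left], w[!left])
    sum(w * (u - ifelse(left, m_left, m_right))^2)  # weighted residual SS
  }, numeric(1))
  cp <- cuts[which.min(sse)]
  left <- x <= cp
  list(split = cp,
       left = weighted.mean(u[left], w[left]),
       right = weighted.mean(u[!left], w[!left]))
}

set.seed(4)
x <- runif(200)
u <- ifelse(x > 0.6, 2, -1) + rnorm(200, sd = 0.2)
s <- fit_stump(x, u)
s$split                                             # close to 0.6
```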

##### Value

- An object of class `bl` (base-learner) with a `dpp` function. The call of `dpp` returns an object of class `bm` (base-model).

##### References

Paul H. C. Eilers and Brian D. Marx (1996), Flexible smoothing with B-splines
and penalties. *Statistical Science*, **11**(2), 89-121.

Ludwig Fahrmeir, Thomas Kneib and Stefan Lang (2004), Penalized structured
additive regression for space-time data: a Bayesian perspective.
*Statistica Sinica*, **14**, 731-761.

Jan Gertheiss and Gerhard Tutz (2009), Penalized regression with ordinal
predictors. *International Statistical Review*, **77**(3), 345-365.

Benjamin Hofner, Torsten Hothorn, Thomas Kneib, and Matthias Schmid (2009),
A framework for unbiased model selection based on boosting.
Technical Report Nr. 72, Institut fuer Statistik, LMU Muenchen.

Thomas Kneib, Torsten Hothorn and Gerhard Tutz (2009), Variable selection and
model choice in geoadditive regression models. *Biometrics*, **65**(2),
626-634.

##### Examples

```
library("mboost")
set.seed(290875)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n) + 0.25 * x1
x3 <- as.factor(sample(0:1, n, replace = TRUE))
x4 <- gl(4, 25)
y <- 3 * sin(x1) + x2^2 + rnorm(n)
weights <- drop(rmultinom(1, n, rep.int(1, n) / n))
### set up base-learners
spline1 <- bbs(x1, knots = 20, df = 4)
attributes(spline1)
knots.x2 <- quantile(x2, c(0.25, 0.5, 0.75))
spline2 <- bbs(x2, knots = knots.x2, df = 5)
attributes(spline2)
attributes(ols3 <- bols(x3))
attributes(ols4 <- bols(x4))
### compute base-models
drop(ols3$dpp(weights)$fit(y)$model) ## same as:
coef(lm(y ~ x3, weights = weights))
drop(ols4$dpp(weights)$fit(y)$model) ## same as:
coef(lm(y ~ x4, weights = weights))
### fit model, component-wise
mod1 <- mboost_fit(list(spline1, spline2, ols3, ols4), y, weights)
### more convenient formula interface
mod2 <- mboost(y ~ bbs(x1, knots = 20, df = 4) +
                   bbs(x2, knots = knots.x2, df = 5) +
                   bols(x3) + bols(x4), weights = weights)
all.equal(coef(mod1), coef(mod2))
### grouped linear effects
model <- gamboost(y ~ bols(x1, x2, intercept = FALSE) +
bols(x1, intercept = FALSE) +
bols(x2, intercept = FALSE),
control = boost_control(mstop = 400))
coef(model, which=1) # one base-learner for x1 and x2
coef(model, which=2:3) # two separate base-learners for x1 and x2
### example for bspatial
x1 <- runif(250,-pi,pi)
x2 <- runif(250,-pi,pi)
y <- sin(x1) * sin(x2) + rnorm(250, sd = 0.4)
spline3 <- bspatial(x1, x2, knots=12)
attributes(spline3)
## specify number of knots separately
form1 <- y ~ bspatial(x1, x2, knots = list(x1 = 12, x2 = 12))
## decompose spatial effect into parametric part and
## deviation with one df
form2 <- y ~ bols(x1) + bols(x2) + bols(x1*x2) +
bspatial(x1, x2, knots = 12, center = TRUE, df = 1)
### random intercept
id <- factor(rep(1:10, each = 5))
raneff <- brandom(id)
attributes(raneff)
## random intercept with non-observed category
set.seed(1907)
y <- rnorm(50, mean = rep(rnorm(10), each = 5), sd = 0.1)
plot(y ~ id)
# category 10 not observed
obs <- c(rep(1, 45), rep(0, 5))
model <- gamboost(y ~ brandom(id), weights = obs)
coef(model)
fitted(model)[46:50] # just the grand mean as usual for
# random effects models
### random slope
z <- runif(50)
raneff <- brandom(id, by=z)
attributes(raneff)
### remove intercept from base-learner
### and add explicit intercept to the model
tmpdata <- data.frame(x = 1:100, y = rnorm(100), int = rep(1, 100))
mod <- gamboost(y ~ bols(int, intercept = FALSE) +
bols(x, intercept = FALSE),
data = tmpdata,
control = boost_control(mstop = 2500))
cf <- unlist(coef(mod))
cf[1] <- cf[1] + mod$offset
cf
coef(lm(y ~ x, data = tmpdata))
### large data set with ties
nunique <- 100
xindex <- sample(1:nunique, 1000000, replace = TRUE)
x <- runif(nunique)
y <- rnorm(length(xindex))
w <- rep.int(1, length(xindex))
### brute force computations
op <- options()
options(mboost_indexmin = Inf, mboost_useMatrix = FALSE)
## data pre-processing
b1 <- bbs(x[xindex])$dpp(w)
## model fitting
c1 <- b1$fit(y)$model
options(op)
### automatic search for ties, faster
b2 <- bbs(x[xindex])$dpp(w)
c2 <- b2$fit(y)$model
### manual specification of ties, even faster
b3 <- bbs(x, index = xindex)$dpp(w)
c3 <- b3$fit(y)$model
all.equal(c1, c2)
all.equal(c1, c3)
```

*Documentation reproduced from package mboost, version 2.0-0, License: GPL-2*