# baselearners

##### Base-learners for Gradient Boosting

Base-learners for fitting base-models in the generic implementation of
component-wise gradient boosting in function `mboost`

.

- Keywords
- models

##### Usage

```
## linear base-learner
bols(..., by = NULL, index = NULL, intercept = TRUE, df = NULL,
lambda = 0, contrasts.arg = "contr.treatment")
```## smooth P-spline base-learner
bbs(..., by = NULL, index = NULL, knots = 20, boundary.knots = NULL,
degree = 3, differences = 2, df = 4, lambda = NULL, center = FALSE,
cyclic = FALSE, constraint = c("none", "increasing", "decreasing"),
deriv = 0)

## bivariate P-spline base-learner
bspatial(..., df = 6)

## radial basis functions base-learner
brad(..., by = NULL, index = NULL, knots = 100, df = 4, lambda = NULL,
covFun = fields::stationary.cov,
args = list(Covariance="Matern", smoothness = 1.5, theta=NULL))
## (genetic) pathway-based kernel base-learner
bkernel(..., df = 4, lambda = NULL, kernel = c("lin", "sia", "net"),
pathway = NULL, knots = NULL, args = list())

## random effects base-learner
brandom(..., by = NULL, index = NULL, df = 4, lambda = NULL,
contrasts.arg = "contr.dummy")

## tree based base-learner
btree(..., nmax = Inf, tree_controls = partykit::ctree_control(
teststat = "quad", testtype = "Teststatistic",
mincriterion = 0, minsplit = 10, minbucket = 4,
maxdepth = 1, saveinfo = FALSE))

## constrained effects base-learner
bmono(...,
constraint = c("increasing", "decreasing", "convex", "concave",
"none", "positive", "negative"),
type = c("quad.prog", "iterative"),
by = NULL, index = NULL, knots = 20, boundary.knots = NULL,
degree = 3, differences = 2, df = 4, lambda = NULL,
lambda2 = 1e6, niter=10, intercept = TRUE,
contrasts.arg = "contr.treatment",
boundary.constraints = FALSE,
cons.arg = list(lambda = 1e+06, n = NULL, diff_order = NULL))

## Markov random field base-learner
bmrf(..., by = NULL, index = NULL, bnd = NULL, df = 4, lambda = NULL,
center = FALSE)

## user-specified base-learner
buser(X, K = NULL, by = NULL, index = NULL, df = 4, lambda = NULL)

## combining single base-learners to form new,
## more complex base-learners
bl1 %+% bl2
bl1 %X% bl2
bl1 %O% bl2

##### Arguments

- ...
one or more predictor variables or one matrix or data frame of predictor variables. For smooth base-learners, the number of predictor variables and the number of columns in the data frame / matrix must be less than or equal to 2. If a matrix (with at least 2 columns) is given to

`bols`

or`brandom`

, it is directly used as the design matrix. Especially, no intercept term is added regardless of argument`intercept`

. If the argument has only one column, it is simplified to a vector and an intercept is added or not according to the argmuent`intercept`

.- by
an optional variable defining varying coefficients, either a factor or numeric variable. If

`by`

is a factor, the coding is determined by the global`options("contrasts")`

or as specified "locally" for the factor (see`contrasts`

). Per default treatment coding is used. Note that the main effect needs to be specified in a separate base-learner.- index
a vector of integers for expanding the variables in

`...`

. For example,`bols(x, index = index)`

is equal to`bols(x[index])`

, where`index`

is an integer of length greater or equal to`length(x)`

.- df
trace of the hat matrix for the base-learner defining the base-learner complexity. Low values of

`df`

correspond to a large amount of smoothing and thus to "weaker" base-learners. Certain restrictions have to be kept for the specification of`df`

since most of the base-learners rely on penalization approaches with a non-trivial null space. For example, for P-splines fitted with`bbs`

,`df`

has to be larger than the order of differences employed in the construction of the penalty term. However, when option`center != FALSE`

, the effect is centered around its unpenalized part and therefore any positive number is admissible for`df`

. For details on the computation of degrees of freedom see section ‘Global Options’.- lambda
smoothing penalty, computed from

`df`

when`df`

is specified. For details on the computation of degrees of freedom see section ‘Global Options’.- knots
either the number of knots or a vector of the positions of the interior knots (for more details see below). For multiple predictor variables,

`knots`

may be a named list where the names in the list are the variable names.- boundary.knots
boundary points at which to anchor the B-spline basis (default the range of the data). A vector (of length 2) for the lower and the upper boundary knot can be specified.This is only advised for

`bbs(..., cyclic = TRUE)`

, where the boundary knots specify the points at which the cyclic function should be joined. In analogy to`knots`

a names list can be specified.- degree
degree of the regression spline.

- differences
a non-negative integer, typically 1, 2 or 3. If

`differences`

=*k*,*k*-th-order differences are used as a penalty (*0*-th order differences specify a ridge penalty).- intercept
if

`intercept = TRUE`

an intercept is added to the design matrix of a linear base-learner. If`intercept = FALSE`

, continuous covariates should be (mean-) centered.- center
if

`center != FALSE`

the corresponding effect is re-parameterized such that the unpenalized part of the fit is subtracted and only the deviation effect is fitted. The unpenalized, parametric part has then to be included in separate base-learners using`bols`

(see the examples below). There are two possible ways to re-parameterization;`center = "differenceMatrix"`

is based on the difference matrix (the default for`bbs`

with one covariate only) and`center = "spectralDecomp"`

uses a spectral decomposition of the penalty matrix (see Fahrmeir et al., 2004, Section 2.3 for details). The latter option is the default (and currently only option) for`bbs`

with multiple covariates or`bmrf`

.- cyclic
if

`cyclic = TRUE`

the fitted values coincide at the boundaries (useful for cyclic covariates such as day time etc.). For details see Hofner et al. (2016).- covFun
the covariance function (i.e. radial basis) needed to compute the basis functions. Per default

`stationary.cov`

function (from package`fields`

) is used.- args
a named list of arguments to be passed to

`cov.function`

. Thus strongly dependent on the specified`cov.function`

.- kernel
one of

`"lin"`

(linear kernel),`"sia"`

(size adjusted kernel), or`"net"`

(network kernel). For details see`calc_kernel`

- pathway
name of pathway; Pathway needs to be contained in the GWAS data set.

- contrasts.arg
a named list of characters suitable for input to the

`contrasts`

replacement function, or the contrast matrix itself, see`model.matrix`

, or a single character string (or contrast matrix) which is then used as contrasts for all factors in this base-learner (with the exception of factors in`by`

). See also example below for setting contrasts. Note that a special`contrasts.arg`

exists in package`mboost`

, namely "contr.dummy". This contrast is used per default in`brandom`

and can also be used in`bols`

. It leads to a dummy coding as returned by`model.matrix(~ x - 1)`

were the intercept is implicitly included but each factor level gets a seperate effect estimate (see example below).- nmax
integer, maximal number of bins in the predictor variables. Use

`Inf`

to switch-off binning.- tree_controls
an object of class

`"TreeControl"`

, which can be obtained using`ctree_control`

. Defines hyper-parameters for the trees which are used as base-learners, stumps are fitted by default. By default, stumps and thus additive models are fitted.- constraint
type of constraint to be used. For

`bmono`

, the constraint can be either monotonic`"increasing"`

(default),`"decreasing"`

, or`"convex"`

or`"concave"`

. Additionally,`"none"`

can be used to specify unconstrained P-splines. This is especially of interest in conjunction with`boundary.constraints = TRUE`

. For`bbs`

, the constraint can be`"none"`

, monotonic`"increasing"`

, or`"decreasing"`

. In general it is advisable to use`bmono`

to fit monotonic splines.- type
determines how the constrained least squares problem should be solved. If

`type = "quad.prog"`

, a numeric quadratic programming method (Goldfarb and Idnani, 1982, 1983) is used (see`solve.QP`

in package quadprog). If`type = "iterative"`

, the iterative procedure described in Hofner et al. (2011b) is used. The quadratic programming approach is usually much faster than the iterative approach. For details see Hofner et al. (2016).- lambda2
penalty parameter for the (monotonicity) constraint.

- niter
maximum number of iterations used to compute constraint estimates. Increase this number if a warning is displayed.

- boundary.constraints
a logical indicating whether additional constraints on the boundaries of the spline should be applied (default: FALSE). This is still experimental.

- cons.arg
a named list with additional arguments for boundary constraints. The element

`lambda`

specifies the penalty parameter that is used for the additional boundary constraint. The element`n`

specifies the number of knots to be subject to the constraint and can be either a scalar (use same number of constrained knots on each side) or a vector. Per default 10% of the knots on each side are used. The element`diff_order`

can be used to specify the order of the boundary penalty: 1 (constant; default for monotonically constrained effects) or 2 (linear; default for all other effects).- bnd
Object of class

`bnd`

, in which the boundaries of a map are defined and from which neighborhood relations can be constructed. See`read.bnd`

. If a boundary object is not available, the neighborhood matrix can also be given directly.- X
design matrix as it should be used in the penalized least squares estimation. Effect modifiers do not need to be included here (

`by`

can be used for convenience).- K
penalty matrix as it should be used in the penalized least squares estimation. If

`NULL`

(default), unpenalized estimation is used.- deriv
an integer; the derivative of the spline of the given order at the data is computed, defaults to zero. Note that this argument is only used to set up the design matrix and cannot be used in the fitting process.

- bl1
a linear base-learner or a list of linear base-learners.

- bl2
a linear base-learner or a list of linear base-learners.

##### Details

`bols`

refers to linear base-learners (potentially estimated with
a ridge penalty), while `bbs`

provide penalized regression
splines. `bspatial`

fits bivariate surfaces and `brandom`

defines random effects base-learners. In combination with option
`by`

, these base-learners can be turned into varying coefficient
terms. The linear base-learners are fitted using Ridge Regression
where the penalty parameter `lambda`

is either computed from
`df`

(default for `bbs`

, `bspatial`

, and
`brandom`

) or specified directly (`lambda = 0`

means no
penalization as default for `bols`

).

In `bols(x)`

, `x`

may be a numeric vector or factor.
Alternatively, `x`

can be a data frame containing numeric or
factor variables. In this case, or when multiple predictor variables
are specified, e.g., using `bols(x1, x2)`

, the model is
equivalent to `lm(y ~ ., data = x)`

or `lm(y ~ x1 + x2)`

,
respectively. By default, an intercept term is added to the
corresponding design matrix (which can be omitted using
`intercept = FALSE`

). It is *strongly* advised to (mean-)
center continuous covariates, if no intercept is used in `bols`

(see Hofner et al., 2011a). If `x`

is a matrix, it is directly used
as the design matrix and no further preprocessing (such as addition of
an intercept) is conducted. When `df`

(or `lambda`

) is
given, a ridge estimator with `df`

degrees of freedom (see
section ‘Global Options’) is used as base-learner. Note that
all variables are treated as a group, i.e., they enter the model
together if the corresponding base-learner is selected. For ordinal
variables, a ridge penalty for the differences of the adjacent
categories (Gertheiss and Tutz 2009, Hofner et al. 2011a) is applied.

With `bbs`

, the P-spline approach of Eilers and Marx (1996) is
used. P-splines use a squared *k*-th-order difference penalty
which can be interpreted as an approximation of the integrated squared
*k*-th derivative of the spline. In `bbs`

the argument
`knots`

specifies either the number of (equidistant) *interior*
knots to be used for the regression spline fit or a vector including
the positions of the *interior* knots. Additionally,
`boundary.knots`

can be specified. However, this is only advised
if one uses cyclic constraints, where the `boundary.knots`

specify the points where the function is joined (e.g.,
`boundary.knots = c(0, 2 * pi)`

for angles as in a sine function
or `boundary.knots = c(0, 24)`

for hours during the day). For
details on cylcic splines in the context of boosting see Hofner et
al. (2016).

`bspatial`

implements bivariate tensor product P-splines for the
estimation of either spatial effects or interaction surfaces. Note
that `bspatial(x, y)`

is equivalent to `bbs(x, y, df = 6)`

.
For possible arguments and defaults see there. The penalty term is
constructed based on bivariate extensions of the univariate penalties
in `x`

and `y`

directions, see Kneib, Hothorn and Tutz
(2009) for details. Note that the dimensions of the penalty matrix
increase (quickly) with the number of knots with strong impact on
computational time. Thus, both should not be chosen to large.
Different knots for `x`

and `y`

can be specified by a named
list.

`brandom(x)`

specifies a random effects base-learner based on a
factor variable `x`

that defines the grouping structure of the
data set. For each level of `x`

, a separate random intercept is
fitted, where the random effects variance is governed by the
specification of the degrees of freedom `df`

or `lambda`

(see section ‘Global Options’). Note that `brandom(...)`

is essentially a wrapper to ```
bols(..., df = 4, contrasts.arg =
"contr.dummy")
```

, i.e., a wrapper that utilizes ridge-penalized
categorical effects. For possible arguments and defaults see `bols`

.

For all linear base-learners the amount of smoothing is determined by
the trace of the hat matrix, as indicated by `df`

.

If `by`

is specified as an additional argument, a varying
coefficients term is estimated, where `by`

is the interaction
variable and the effect modifier is given by either `x`

or
`x`

and `y`

(specified via `...`

). If `bbs`

is
used, this corresponds to the classical situation of varying
coefficients, where the effect of `by`

varies over the co-domain
of `x`

. In case of `bspatial`

as base-learner, the effect of
`by`

varies with respect to both `x`

and `y`

, i.e. an
interaction surface between `x`

and `y`

is specified as
effect modifier. For `brandom`

specification of `by`

leads
to the estimation of random slopes for covariate `by`

with
grouping structure defined by factor `x`

instead of a simple
random intercept. In `bbs`

, `bspatial`

and `brandom`

the computation of the smoothing parameter `lambda`

for given
`df`

, or vice versa, might become (numerically) instable if the
values of the interaction variable `by`

become too large. In this
case, we recommend to rescale the interaction covariate e.g. by
dividing by `max(abs(by))`

. If `bbs`

or `bspatial`

is
specified with an factor variable `by`

with more than two
factors, the degrees of freedom are shared for the complete
base-learner (i.e., spread over all factor levels). Note that the null
space (see next paragraph) increases, as a separate null space for
each factor level is present. Thus, the minimum degrees of freedom
increase with increasing number of levels of `by`

(if
`center = FALSE`

).

For `bbs`

and `bspatial`

, option `center != FALSE`

requests that
the fitted effect is centered around its parametric, unpenalized part
(the so called null space). For example, with second order difference
penalty, a linear effect of `x`

remains unpenalized by `bbs`

and therefore the degrees of freedom for the base-learner have to be
larger than two. To avoid this restriction, option ```
center =
TRUE
```

subtracts the unpenalized linear effect from the fit, allowing
to specify any positive number as `df`

. Note that in this case
the linear effect `x`

should generally be specified as an
additional base-learner `bols(x)`

. For `bspatial`

and, for
example, second order differences, a linear effect of `x`

(`bols(x)`

), a linear effect of `y`

(`bols(y)`

), and
their interaction (`bols(x*y)`

) are subtracted from the effect
and have to be added separately to the model equation. More details on
centering can be found in Kneib, Hothorn and Tutz (2009) and Fahrmeir,
Kneib and Lang (2004). We strongly recommend to consult the latter reference
before using this option.

`brad(x)`

specifies penalized radial basis functions as used in
Kriging. If `knots`

is used to specify the number of knots, the
function `cover.design`

is used to specify the
location of the knots such that they minimize a geometric
space-filling criterion. Furthermore, knots can be specified directly
via a matrix. The `cov.function`

allows to specify the
radial basis functions. Per default, the flexible Matern correlation
function is used. This is specified using ```
cov.function =
stationary.cov
```

with `Covariance = "Matern"`

specified via
`args`

. If an effective range `theta`

is applicable for the
correlation function (e.g., the Matern family) the user can specify
this value. Per default (if `theta = NULL`

) the effective range is
chosen as \(\theta = max(||x_i - x_j||)/c\) such that the correlation function
$$\rho(c; \theta = 1) = \varepsilon,$$
where \(\varepsilon = 0.001\).

`bmrf`

builds a base of a Markov random field consisting of
several regions with a neighborhood structure. The input variable is
the observed region. The penalty matrix is either construed from a
boundary object or must be given directly via the option `bnd`

.
In that case the `dimnames`

of the matrix have to be the region
names, on the diagonal the number of neighbors have to be given for
each region, and for each neighborhood relation the value in the
matrix has to be -1, else 0. With a boundary object at hand, the
fitted or predicted values can be directly plotted into the map using
`drawmap`

.

`bkernel`

can be used to fit linear (`kernel = "lin"`

),
size-adjusted (`kernel = "sia"`

) or network (`kernel = "net"`

)
kernels based on genetic pathways for genome-wide assosiation studies.
For details see Friedrichs et al. (2017) and check the associated package
kangar00.

`buser(X, K)`

specifies a base-learner with user-specified design
matrix `X`

and penalty matrix `K`

, where `X`

and
`K`

are used to minimize a (penalized) least squares
criterion with quadratic penalty. This can be used to easily specify
base-learners that are not implemented (yet). See examples
below for details how `buser`

can be used to mimic existing
base-learners. Note that for predictions you need to set up the
design matrix for the new data manually.

For a categorical covariate with non-observed categories
`bols(x)`

and `brandom(x)`

both assign a zero effect
to these categories. However, the non-observed categories must be
listed in `levels(x)`

. Thus, predictions are possible
for new observations if they correspond to this category.

By default, all linear base-learners include an intercept term (which can
be removed using `intercept = FALSE`

for `bols`

). In this case,
the respective covariate should be mean centered (if continuous) and an
explicit global intercept term should be added to `gamboost`

via `bols`

(see example below). With `bols(x, intercept = FALSE)`

with categorical covariate `x`

a separate effect for each group
(mean effect) is estimated (see examples for resulting design matrices).

Smooth estimates with constraints can be computed using the
base-learner `bmono()`

which specifies P-spline base-learners
with an additional asymmetric penalty enforcing monotonicity or
convexity/concavity (see and Eilers, 2005). For more details in the
boosting context and monotonic effects of ordinal factors see Hofner,
Mueller and Hothorn (2011b). The quadratic-programming based algorithm
is described in Hofner et al. (2016). Alternative monotonicity
constraints are implemented via T-splines in `bbs()`

(Beliakov,
2000). In general it is advisable to use `bmono`

to fit monotonic splines
as T-splines show undesirable behaviour if the observed data deviates
from monotonicty.

Two or more linear base-learners can be joined using `%+%`

. A
tensor product of two or more linear base-learners is returned by
`%X%`

. When the design matrix can be written as the Kronecker
product of two matrices `X = kronecker(X2, X1)`

, then ```
bl1
%O% bl2
```

with design matrices X1 and X2, respectively, can be used
to efficiently compute Ridge-estimates following Currie, Durban,
Eilers (2006). In all cases the overall degrees of freedom of the
combined base-learner increase (additive or multiplicative,
respectively). These three features are experimental and for expert
use only.

`btree`

fits a stump to one or more variables. Note that
`blackboost`

is more efficient for boosting stumps. For
further references see Hothorn, Hornik, Zeileis (2006) and Hothorn et
al. (2010).

Note that the base-learners `bns`

and `bss`

are deprecated
(and no longer available). Please use `bbs`

instead, which
results in qualitatively the same models but is computationally much
more attractive.

##### Value

An object of class `blg`

(base-learner generator) with a
`dpp`

function.

The call of `dpp`

returns an object of class
`bl`

(base-learner) with a `fit`

function. The call to
`fit`

finally returns an object of class `bm`

(base-model).

##### Global Options

Three global options affect the base-learners:

`options("mboost_useMatrix")`

defaulting to

`TRUE`

indicates that the base-learner may use sparse matrix techniques for its computations. This reduces the memory consumption but might (for smaller sample sizes) require more computing time.`options("mboost_indexmin")`

is an integer that specifies the minimum sample size needed to optimize model fitting by automatically taking ties into account (default = 10000).

`options("mboost_dftraceS")`

`FALSE`

by default, indicating how the degrees of freedom should be computed. Per default $$\mathrm{df}(\lambda) = \mathrm{trace}(2S - S^{\top}S),$$ with smoother matrix \(S = X(X^{\top}X + \lambda K)^{-1} X\) is used (see Hofner et al., 2011a). If`TRUE`

, the trace of the smoother matrix \(\mathrm{df}(\lambda) = \mathrm{trace}(S)\) is used as degrees of freedom.Note that these formulae specify the relation of

`df`

and`lambda`

as the smoother matrix \(S\) depends only on \(\lambda\) (and the (fixed) design matrix \(X\), the (fixed) penalty matrix \(K\)).

##### References

Iain D. Currie, Maria Durban, and Paul H. C. Eilers (2006),
Generalized linear array models with applications to
multidimensional smoothing. *Journal of the Royal
Statistical Society, Series B--Statistical Methodology*,
**68**(2), 259--280.

Paul H. C. Eilers (2005), Unimodal smoothing. *Journal of
Chemometrics*, **19**, 317--328.

Paul H. C. Eilers and Brian D. Marx (1996), Flexible smoothing with B-splines
and penalties. *Statistical Science*, **11**(2), 89-121.

Ludwig Fahrmeir, Thomas Kneib and Stefan Lang (2004), Penalized structured
additive regression for space-time data: a Bayesian perspective.
*Statistica Sinica*, **14**, 731-761.

Jan Gertheiss and Gerhard Tutz (2009), Penalized regression with ordinal
predictors, *International Statistical Review*, **77**(3), 345--365.

D. Goldfarb and A. Idnani (1982), Dual and Primal-Dual Methods for Solving Strictly Convex Quadratic Programs. In J. P. Hennart (ed.), Numerical Analysis, Springer-Verlag, Berlin, pp. 226-239.

D. Goldfarb and A. Idnani (1983), A numerically stable dual
method for solving strictly convex quadratic programs.
*Mathematical Programming*, **27**, 1--33.

S. Friedrichs, J. Manitz, P. Burger, C.I. Amos, A. Risch, J.C. Chang-Claude,
H.E. Wichmann, T. Kneib, H. Bickeboeller, and B. Hofner (2017),
Pathway-Based Kernel Boosting for the Analysis of Genome-Wide Association Studies.
*Computational and Mathematical Methods in Medicine*. 2017(6742763), 1-17.
10.1155/2017/6742763.

Benjamin Hofner, Torsten Hothorn, Thomas Kneib, and Matthias Schmid (2011a),
A framework for unbiased model selection based on boosting.
*Journal of Computational and Graphical Statistics*, **20**, 956--971.

Benjamin Hofner, Joerg Mueller, and Torsten Hothorn (2011b),
Monotonicity-Constrained Species Distribution Models.
*Ecology*, **92**, 1895--1901.

Benjamin Hofner, Thomas Kneib and Torsten Hothorn (2016),
A Unified Framework of Constrained Regression.
*Statistics & Computing*, **26**, 1--14.

Thomas Kneib, Torsten Hothorn and Gerhard Tutz (2009), Variable
selection and model choice in geoadditive regression models,
*Biometrics*, **65**(2), 626--634.

Torsten Hothorn, Kurt Hornik, Achim Zeileis (2006), Unbiased recursive
partitioning: A conditional inference framework. *Journal of
Computational and Graphical Statistics*, **15**, 651--674.

Torsten Hothorn, Peter Buehlmann, Thomas Kneib, Matthias Schmid and
Benjamin Hofner (2010), Model-based Boosting 2.0, *Journal of
Machine Learning Research*, **11**, 2109--2113.

G. M. Beliakov (2000), Shape Preserving Approximation using Least Squares
Splines, *Approximation Theory and its Applications*,
**16**(4), 80--98.

##### See Also

##### Examples

```
# NOT RUN {
set.seed(290875)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n) + 0.25 * x1
x3 <- as.factor(sample(0:1, 100, replace = TRUE))
x4 <- gl(4, 25)
y <- 3 * sin(x1) + x2^2 + rnorm(n)
weights <- drop(rmultinom(1, n, rep.int(1, n) / n))
### set up base-learners
spline1 <- bbs(x1, knots = 20, df = 4)
extract(spline1, "design")[1:10, 1:10]
extract(spline1, "penalty")
knots.x2 <- quantile(x2, c(0.25, 0.5, 0.75))
spline2 <- bbs(x2, knots = knots.x2, df = 5)
ols3 <- bols(x3)
extract(ols3)
ols4 <- bols(x4)
### compute base-models
drop(ols3$dpp(weights)$fit(y)$model) ## same as:
coef(lm(y ~ x3, weights = weights))
drop(ols4$dpp(weights)$fit(y)$model) ## same as:
coef(lm(y ~ x4, weights = weights))
### fit model, component-wise
mod1 <- mboost_fit(list(spline1, spline2, ols3, ols4), y, weights)
### more convenient formula interface
mod2 <- mboost(y ~ bbs(x1, knots = 20, df = 4) +
bbs(x2, knots = knots.x2, df = 5) +
bols(x3) + bols(x4), weights = weights)
all.equal(coef(mod1), coef(mod2))
### grouped linear effects
# center x1 and x2 first
x1 <- scale(x1, center = TRUE, scale = FALSE)
x2 <- scale(x2, center = TRUE, scale = FALSE)
model <- gamboost(y ~ bols(x1, x2, intercept = FALSE) +
bols(x1, intercept = FALSE) +
bols(x2, intercept = FALSE),
control = boost_control(mstop = 50))
coef(model, which = 1) # one base-learner for x1 and x2
coef(model, which = 2:3) # two separate base-learners for x1 and x2
# zero because they were (not yet) selected.
### example for bspatial
x1 <- runif(250,-pi,pi)
x2 <- runif(250,-pi,pi)
y <- sin(x1) * sin(x2) + rnorm(250, sd = 0.4)
spline3 <- bspatial(x1, x2, knots = 12)
Xmat <- extract(spline3, "design")
## 12 inner knots + 4 boundary knots = 16 knots per direction
## THUS: 16 * 16 = 256 columns
dim(Xmat)
extract(spline3, "penalty")[1:10, 1:10]
## specify number of knots separately
form1 <- y ~ bspatial(x1, x2, knots = list(x1 = 12, x2 = 14))
## decompose spatial effect into parametric part and
## deviation with one df
form2 <- y ~ bols(x1) + bols(x2) + bols(x1, by = x2, intercept = FALSE) +
bspatial(x1, x2, knots = 12, center = TRUE, df = 1)
# }
# NOT RUN {
############################################################
## Do not run and check these examples automatically as
## they take some time
mod1 <- gamboost(form1)
plot(mod1)
mod2 <- gamboost(form2)
## automated plot function:
plot(mod2)
## plot sum of linear and smooth effects:
library(lattice)
df <- expand.grid(x1 = unique(x1), x2 = unique(x2))
df$pred <- predict(mod2, newdata = df)
levelplot(pred ~ x1 * x2, data = df)
## End(Not run and test)
# }
# NOT RUN {
## specify radial basis function base-learner for spatial effect
## and use data-adaptive effective range (theta = NULL, see 'args')
form3 <- y ~ brad(x1, x2)
## Now use different settings, e.g. 50 knots and theta fixed to 0.4
## (not really a good setting)
form4 <- y ~ brad(x1, x2, knots = 50, args = list(theta = 0.4))
# }
# NOT RUN {
############################################################
## Do not run and check these examples automatically as
## they take some time
mod3 <- gamboost(form3)
plot(mod3)
dim(extract(mod3, what = "design", which = "brad")[[1]])
knots <- attr(extract(mod3, what = "design", which = "brad")[[1]], "knots")
mod4 <- gamboost(form4)
dim(extract(mod4, what = "design", which = "brad")[[1]])
plot(mod4)
## End(Not run and test)
# }
# NOT RUN {
### random intercept
id <- factor(rep(1:10, each = 5))
raneff <- brandom(id)
extract(raneff, "design")
extract(raneff, "penalty")
## random intercept with non-observed category
set.seed(1907)
y <- rnorm(50, mean = rep(rnorm(10), each = 5), sd = 0.1)
plot(y ~ id)
# category 10 not observed
obs <- c(rep(1, 45), rep(0, 5))
model <- gamboost(y ~ brandom(id), weights = obs)
coef(model)
fitted(model)[46:50] # just the grand mean as usual for
# random effects models
### random slope
z <- runif(50)
raneff <- brandom(id, by = z)
extract(raneff, "design")
extract(raneff, "penalty")
### specify simple interaction model (with main effect)
n <- 210
x <- rnorm(n)
X <- model.matrix(~ x)
z <- gl(3, n/3)
Z <- model.matrix(~z)
beta <- list(c(0,1), c(-3,4), c(2, -4))
y <- rnorm(length(x), mean = (X * Z[,1]) %*% beta[[1]] +
(X * Z[,2]) %*% beta[[2]] +
(X * Z[,3]) %*% beta[[3]])
plot(y ~ x, col = z)
## specify main effect and interaction
mod_glm <- gamboost(y ~ bols(x) + bols(x, by = z),
control = boost_control(mstop = 100))
nd <- data.frame(x, z)
nd <- nd[order(x),]
nd$pred_glm <- predict(mod_glm, newdata = nd)
for (i in seq(along = levels(z)))
with(nd[nd$z == i,], lines(x, pred_glm, col = z))
mod_gam <- gamboost(y ~ bbs(x) + bbs(x, by = z, df = 8),
control = boost_control(mstop = 100))
nd$pred_gam <- predict(mod_gam, newdata = nd)
for (i in seq(along = levels(z)))
with(nd[nd$z == i,], lines(x, pred_gam, col = z, lty = "dashed"))
### convenience function for plotting
par(mfrow = c(1,3))
plot(mod_gam)
### remove intercept from base-learner
### and add explicit intercept to the model
tmpdata <- data.frame(x = 1:100, y = rnorm(1:100), int = rep(1, 100))
mod <- gamboost(y ~ bols(int, intercept = FALSE) +
bols(x, intercept = FALSE),
data = tmpdata,
control = boost_control(mstop = 1000))
cf <- unlist(coef(mod))
## add offset
cf[1] <- cf[1] + mod$offset
signif(cf, 3)
signif(coef(lm(y ~ x, data = tmpdata)), 3)
### much quicker and better with (mean-) centering
tmpdata$x_center <- tmpdata$x - mean(tmpdata$x)
mod_center <- gamboost(y ~ bols(int, intercept = FALSE) +
bols(x_center, intercept = FALSE),
data = tmpdata,
control = boost_control(mstop = 100))
cf_center <- unlist(coef(mod_center, which=1:2))
## due to the shift in x direction we need to subtract
## beta_1 * mean(x) to get the correct intercept
cf_center[1] <- cf_center[1] + mod_center$offset -
cf_center[2] * mean(tmpdata$x)
signif(cf_center, 3)
signif(coef(lm(y ~ x, data = tmpdata)), 3)
# }
# NOT RUN {
############################################################
## Do not run and check these examples automatically as
## they take some time
### large data set with ties
nunique <- 100
xindex <- sample(1:nunique, 1000000, replace = TRUE)
x <- runif(nunique)
y <- rnorm(length(xindex))
w <- rep.int(1, length(xindex))
### brute force computations
op <- options()
options(mboost_indexmin = Inf, mboost_useMatrix = FALSE)
## data pre-processing
b1 <- bbs(x[xindex])$dpp(w)
## model fitting
c1 <- b1$fit(y)$model
options(op)
### automatic search for ties, faster
b2 <- bbs(x[xindex])$dpp(w)
c2 <- b2$fit(y)$model
### manual specification of ties, even faster
b3 <- bbs(x, index = xindex)$dpp(w)
c3 <- b3$fit(y)$model
all.equal(c1, c2)
all.equal(c1, c3)
## End(Not run and test)
# }
# NOT RUN {
### cyclic P-splines
set.seed(781)
x <- runif(200, 0,(2*pi))
y <- rnorm(200, mean=sin(x), sd=0.2)
newX <- seq(0,2*pi, length=100)
### model without cyclic constraints
mod <- gamboost(y ~ bbs(x, knots = 20))
### model with cyclic constraints
mod_cyclic <- gamboost(y ~ bbs(x, cyclic=TRUE, knots = 20,
boundary.knots=c(0, 2*pi)))
par(mfrow = c(1,2))
plot(x,y, main="bbs (non-cyclic)", cex=0.5)
lines(newX, sin(newX), lty="dotted")
lines(newX + 2 * pi, sin(newX), lty="dashed")
lines(newX, predict(mod, data.frame(x = newX)),
col="red", lwd = 1.5)
lines(newX + 2 * pi, predict(mod, data.frame(x = newX)),
col="blue", lwd=1.5)
plot(x,y, main="bbs (cyclic)", cex=0.5)
lines(newX, sin(newX), lty="dotted")
lines(newX + 2 * pi, sin(newX), lty="dashed")
lines(newX, predict(mod_cyclic, data.frame(x = newX)),
col="red", lwd = 1.5)
lines(newX + 2 * pi, predict(mod_cyclic, data.frame(x = newX)),
col="blue", lwd = 1.5)
### use buser() to mimic p-spline base-learner:
set.seed(1907)
x <- rnorm(100)
y <- rnorm(100, mean = x^2, sd = 0.1)
mod1 <- gamboost(y ~ bbs(x))
## now extract design and penalty matrix
X <- extract(bbs(x), "design")
K <- extract(bbs(x), "penalty")
## use X and K in buser()
mod2 <- gamboost(y ~ buser(X, K))
max(abs(predict(mod1) - predict(mod2))) # same results
### use buser() to mimic penalized ordinal base-learner:
z <- as.ordered(sample(1:3, 100, replace=TRUE))
y <- rnorm(100, mean = as.numeric(z), sd = 0.1)
X <- extract(bols(z))
K <- extract(bols(z), "penalty")
index <- extract(bols(z), "index")
mod1 <- gamboost(y ~ buser(X, K, df = 1, index = index))
mod2 <- gamboost(y ~ bols(z, df = 1))
max(abs(predict(mod1) - predict(mod2))) # same results
### kronecker product for matrix-valued responses
data("volcano", package = "datasets")
layout(matrix(1:2, ncol = 2))
## estimate mean of image treating image as matrix
image(volcano, main = "data")
x1 <- 1:nrow(volcano)
x2 <- 1:ncol(volcano)
vol <- as.vector(volcano)
mod <- mboost(vol ~ bbs(x1, df = 3, knots = 10)%O%
bbs(x2, df = 3, knots = 10),
control = boost_control(nu = 0.25))
mod[250]
volf <- matrix(fitted(mod), nrow = nrow(volcano))
image(volf, main = "fitted")
# }
# NOT RUN {
############################################################
## Do not run and check these examples automatically as
## they take some time
## the old-fashioned way, a waste of space and time
x <- expand.grid(x1, x2)
modx <- mboost(vol ~ bbs(Var2, df = 3, knots = 10) %X%
bbs(Var1, df = 3, knots = 10), data = x,
control = boost_control(nu = 0.25))
modx[250]
max(abs(fitted(mod) - fitted(modx)))
## End(Not run and test)
# }
# NOT RUN {
### setting contrasts via contrasts.arg
x <- as.factor(sample(1:4, 100, replace = TRUE))
## compute base-learners with different reference categories
BL1 <- bols(x, contrasts.arg = contr.treatment(4, base = 1)) # default
BL2 <- bols(x, contrasts.arg = contr.treatment(4, base = 2))
## compute 'sum to zero contrasts' using character string
BL3 <- bols(x, contrasts.arg = "contr.sum")
## extract model matrices to check if it works
extract(BL1)
extract(BL2)
extract(BL3)
### setting contrasts using named lists in contrasts.arg
x2 <- as.factor(sample(1:4, 100, replace = TRUE))
BL4 <- bols(x, x2,
contrasts.arg = list(x = contr.treatment(4, base = 2),
x2 = "contr.helmert"))
extract(BL4)
### using special contrast: "contr.dummy":
BL5 <- bols(x, contrasts.arg = "contr.dummy")
extract(BL5)
# }
```

*Documentation reproduced from package mboost, version 2.9-1, License: GPL-2*