Base-learners for Gradient Boosting
Base-learners for fitting base-models in the generic implementation of
component-wise gradient boosting in function mboost_fit.
bols(..., by = NULL, index = NULL, intercept = TRUE, df = NULL,
     lambda = 0, contrasts.arg = "contr.treatment")
bbs(..., by = NULL, index = NULL, knots = 20, degree = 3,
    differences = 2, df = 4, lambda = NULL, center = FALSE)
bspatial(...)
brandom(..., df = 4)
btree(..., tree_controls = ctree_control(stump = TRUE, mincriterion = 0))
bl1 %+% bl2
bl1 %X% bl2
- ...: one or more predictor variables or one data frame of predictor variables.
- by: an optional variable defining varying coefficients, either a binary or numeric variable.
- index: a vector of integers for expanding the variables in .... For example, bols(x, index = index) is equal to bols(x[index]), where index is an integer of length greater or equal to length(x).
- df: trace of the hat matrix for the base-learner defining the base-learner complexity. Low values of df correspond to a large amount of smoothing and thus to "weaker" base-learners. Certain restrictions have to be kept for specific base-learners (see 'Details').
- lambda: smoothing penalty, computed from df when df is specified.
- knots: either the number of (equidistant) interior knots to be used for the regression spline fit or a vector including the positions of the interior knots. For multiple predictor variables, knots may be a named list of knots for each variable.
- degree: degree of the regression spline.
- differences: 1, 2, or 3. If differences = k, k-th-order differences are used as a penalty.
- intercept: if intercept = TRUE, an intercept is added to the design matrix of a linear base-learner.
- center: if center = TRUE, the corresponding effect is re-parameterized such that the unpenalized part of the fit is subtracted and only the deviation effect is fitted. The unpenalized, parametric part then has to be included in separate base-learners using bols (see 'Details').
- contrasts.arg: a character suitable for input to the contrasts replacement function.
- tree_controls: an object of class "TreeControl", which can be obtained using ctree_control. Defines hyper-parameters for the trees which are used as base-learners.
- bl1: a linear base-learner or a list of linear base-learners.
- bl2: a linear base-learner or a list of linear base-learners.
bols refers to linear base-learners (potentially estimated with a ridge
penalty), while bbs provides penalized regression splines. bspatial fits
bivariate surfaces and brandom defines random effects base-learners.
In combination with option by, these base-learners can be turned into varying
coefficient terms. The linear base-learners are fitted using Ridge Regression,
where the penalty parameter lambda is either computed from df (default for
bbs, bspatial and brandom) or specified directly
(lambda = 0 means no penalization and is the default for bols).
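
For illustration, a minimal sketch of the two ways to control the penalty
(the variable name x1 is purely illustrative):

## penalty chosen indirectly via degrees of freedom (lambda computed from df)
bbs(x1, df = 4)
## penalty specified directly; lambda = 0 means an unpenalized linear fit
bols(x1, lambda = 0)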
For bols(x), x may be a numeric vector or a factor. Alternatively,
x can be a data frame containing numeric or factor variables.
In this case, or when multiple predictor variables are specified, e.g.,
bols(x1, x2), the model is equivalent to
lm(y ~ ., data = x) or
lm(y ~ x1 + x2), respectively.
By default, an intercept term is added to the corresponding design matrix
(which can be omitted using intercept = FALSE). When df (or lambda) is
given, a ridge estimator with df degrees of freedom (trace of the hat matrix)
is used as base-learner. Note that all variables are treated as a group,
i.e., they enter the model together if the corresponding base-learner is selected.
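
A minimal sketch of such a grouped linear effect (variable names are
illustrative; see also the examples below):

## x1 and x2 form one base-learner and are selected (or not) together
bols(x1, x2)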
With bbs, the P-spline approach of Eilers and Marx (1996) is
used. P-splines use a squared k-th-order difference penalty,
which can be interpreted as an approximation of the integrated squared
k-th derivative of the spline.
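
For example, the defaults of bbs spelled out explicitly (a sketch with an
illustrative variable name):

## cubic regression spline with 20 interior knots and a
## second-order difference penalty, 4 degrees of freedom
bbs(x1, knots = 20, degree = 3, differences = 2, df = 4)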
bspatial implements bivariate tensor product P-splines for the
estimation of either spatial effects or interaction surfaces. Note that
bspatial(x, y) is equivalent to bbs(x, y); for
possible arguments and defaults see there.
The penalty term is constructed based on bivariate extensions of the
univariate penalties in x and y directions, see Kneib,
Hothorn and Tutz (2009) for details. Note that the dimensions of the
penalty matrix increase (quickly) with the number of knots, with a strong
impact on computational time. Thus, both should not be chosen too
large. Different knots for x and y can be specified
by a named list.
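
A sketch of direction-specific knots (variable names are illustrative):

## 12 interior knots for x1, 6 for x2
bspatial(x1, x2, knots = list(x1 = 12, x2 = 6))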
brandom(x) specifies a random effects base-learner based on a
factor variable x that defines the grouping structure of the
data set. For each level of x, a separate random intercept is
fitted, where the random effects variance is governed by the
specification of the degrees of freedom df.
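
A minimal sketch (the grouping factor id is illustrative; see also the
examples below):

## one random intercept per level of id, variance controlled via df
brandom(id, df = 4)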
For all linear base-learners the amount of smoothing is determined by the
trace of the hat matrix, as indicated by df. If df is specified in
bols, a ridge penalty with the according degrees of
freedom is used. For ordinal variables, a ridge penalty for the
differences of the adjacent categories (Gertheiss and Tutz 2009) is applied.
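
A minimal sketch of the ordinal case (the ordered factor dose is purely
illustrative):

## ordered factor: with df given, bols penalizes differences of
## adjacent category effects (Gertheiss and Tutz 2009)
dose <- ordered(sample(c("low", "mid", "high"), 100, replace = TRUE),
                levels = c("low", "mid", "high"))
bols(dose, df = 1)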
If by is specified as an additional argument, a
varying coefficients term is estimated, where by is the
interaction variable and the effect modifier is given by either
x or by x and y (specified via ...). If bbs is used, this corresponds to the
classical situation of varying coefficients, where the effect of
by varies over the co-domain of x. In case of bspatial as
base-learner, the effect of by varies with respect to both x and
y, i.e. an interaction surface between x and
y is specified as effect modifier. For
brandom, specification of by
leads to the estimation of random slopes for covariate
by with grouping structure
defined by factor x instead of a simple random intercept.
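
A few sketches of varying coefficient terms (all variable names are
illustrative):

## effect of z varies smoothly over the range of x1
bbs(x1, by = z)
## effect of z varies over the interaction surface of x1 and x2
bspatial(x1, x2, by = z)
## random slopes for z within the groups defined by factor id
brandom(id, by = z)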
For bbs and bspatial, option center requests that the
fitted effect is centered around its parametric, unpenalized part. For
example, with a second-order difference penalty, a linear effect of x
remains unpenalized by bbs and therefore the degrees of freedom for the base-learner
have to be larger than two. To avoid this restriction, option center = TRUE
subtracts the unpenalized linear effect from the fit, allowing to specify any
positive number as df. Note that in this case the linear effect of
x should generally be specified as an additional base-learner
bols(x). For bspatial and, for example, second-order
differences, a linear effect of x (bols(x)), a linear effect of y
(bols(y)), and their interaction (bols(x*y)) are
subtracted from the effect and have to be added separately to the model
equation (see example below). More details on centering can be found in Kneib, Hothorn and Tutz
(2009) and Fahrmeir, Kneib and Lang (2004).
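
A minimal univariate sketch of this decomposition (names illustrative; a
bivariate version appears in the examples below):

## unpenalized linear part as bols(x1), penalized deviation as a
## centered spline with one degree of freedom
y ~ bols(x1) + bbs(x1, center = TRUE, df = 1)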
For a categorical covariate with non-observed categories, bols(x) and
brandom(x) both assign a zero effect to
these categories. However, the non-observed categories must be
contained in levels(x). Thus, predictions are possible
for new observations if they correspond to such a category.
By default, all linear base-learners include an intercept term (which can
be removed using intercept = FALSE for bols, or
center = TRUE for bbs). In this case, an explicit global
intercept term should be added to the model via bols (see example below).
Three global options affect the base-learners:
option("mboost_useMatrix"), defaulting to TRUE, indicates that the base-learner may use
sparse matrix techniques for its computations. This reduces the memory
consumption but might (for smaller sample sizes) require more computing
time. option("mboost_indexmin") is an integer giving the sample
size required before model fitting is optimized by taking ties into account.
option("mboost_dftraceS"), which is also TRUE by default,
indicates that the trace of the smoother matrix is used as degrees
of freedom. If FALSE, an alternative definition is used (see
Hofner et al., 2009).
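
As a sketch, these options can be set as follows (the values are purely
illustrative, not recommendations):

options(mboost_useMatrix = TRUE,   # allow sparse matrix computations
        mboost_indexmin = 10000,   # exploit ties only for n >= 10000
        mboost_dftraceS = TRUE)    # df = trace of the smoother matrix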
Two or more linear base-learners can be joined using %+%. A tensor product
of two or more linear base-learners is returned by %X%.
These two features are experimental and for expert use only.
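
Minimal sketches (variable names illustrative):

## x1 and x2 joined into one base-learner, selected together
bols(x1) %+% bols(x2)
## tensor product of two univariate spline base-learners
bbs(x1) %X% bbs(x2)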
btree fits a stump to one or more variables. Note that
blackboost is more efficient for boosting stumps.
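
A minimal sketch (assuming the party package, which provides ctree_control,
is available; the variable name is illustrative):

library("party")
## single-split tree (stump) on x1 as base-learner
btree(x1, tree_controls = ctree_control(stump = TRUE, mincriterion = 0))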
An object of class bl (base-learner) with a dpp function is returned.
The call of dpp returns an object of class bm (base-model).
Paul H. C. Eilers and Brian D. Marx (1996), Flexible smoothing with B-splines
and penalties. Statistical Science, 11(2), 89-121.
Ludwig Fahrmeir, Thomas Kneib and Stefan Lang (2004), Penalized structured
additive regression for space-time data: a Bayesian perspective.
Statistica Sinica, 14, 731-761.
Jan Gertheiss and Gerhard Tutz (2009), Penalized regression with ordinal
predictors. International Statistical Review, 77(3), 345-365.
Benjamin Hofner, Torsten Hothorn, Thomas Kneib, and Matthias Schmid (2009),
A framework for unbiased model selection based on boosting.
Technical Report Nr. 72, Institut fuer Statistik, LMU Muenchen.
Thomas Kneib, Torsten Hothorn and Gerhard Tutz (2009), Variable selection
and model choice in geoadditive regression models.
Biometrics, 65(2), 626-634.
set.seed(290875)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n) + 0.25 * x1
x3 <- as.factor(sample(0:1, 100, replace = TRUE))
x4 <- gl(4, 25)
y <- 3 * sin(x1) + x2^2 + rnorm(n)
weights <- drop(rmultinom(1, n, rep.int(1, n) / n))

### set up base-learners
spline1 <- bbs(x1, knots = 20, df = 4)
attributes(spline1)

knots.x2 <- quantile(x2, c(0.25, 0.5, 0.75))
spline2 <- bbs(x2, knots = knots.x2, df = 5)
attributes(spline2)

attributes(ols3 <- bols(x3))
attributes(ols4 <- bols(x4))

### compute base-models
drop(ols3$dpp(weights)$fit(y)$model)  ## same as:
coef(lm(y ~ x3, weights = weights))

drop(ols4$dpp(weights)$fit(y)$model)  ## same as:
coef(lm(y ~ x4, weights = weights))

### fit model, component-wise
mod1 <- mboost_fit(list(spline1, spline2, ols3, ols4), y, weights)

### more convenient formula interface
mod2 <- mboost(y ~ bbs(x1, knots = 20, df = 4) +
                   bbs(x2, knots = knots.x2, df = 5) +
                   bols(x3) + bols(x4))
all.equal(coef(mod1), coef(mod2))

### grouped linear effects
model <- gamboost(y ~ bols(x1, x2, intercept = FALSE) +
                      bols(x1, intercept = FALSE) +
                      bols(x2, intercept = FALSE),
                  control = boost_control(mstop = 400))
coef(model, which = 1)    # one base-learner for x1 and x2
coef(model, which = 2:3)  # two separate base-learners for x1 and x2

### example for bspatial
x1 <- runif(250, -pi, pi)
x2 <- runif(250, -pi, pi)
y <- sin(x1) * sin(x2) + rnorm(250, sd = 0.4)

spline3 <- bspatial(x1, x2, knots = 12)
attributes(spline3)

## specify number of knots separately
form2 <- y ~ bspatial(x1, x2, knots = list(x1 = 12, x2 = 12))

## decompose spatial effect into parametric part and
## deviation with one df
form2 <- y ~ bols(x1) + bols(x2) + bols(x1*x2) +
             bspatial(x1, x2, knots = 12, center = TRUE, df = 1)

### random intercept
id <- factor(rep(1:10, each = 5))
raneff <- brandom(id)
attributes(raneff)

## random intercept with non-observed category
set.seed(1907)
y <- rnorm(50, mean = rep(rnorm(10), each = 5), sd = 0.1)
plot(y ~ id)  # category 10 not observed
obs <- c(rep(1, 45), rep(0, 5))
model <- gamboost(y ~ brandom(id), weights = obs)
coef(model)
fitted(model)[46:50]  # just the grand mean as usual for
                      # random effects models

### random slope
z <- runif(50)
raneff <- brandom(id, by = z)
attributes(raneff)

### remove intercept from base-learner
### and add explicit intercept to the model
tmpdata <- data.frame(x = 1:100, y = rnorm(1:100), int = rep(1, 100))
mod <- gamboost(y ~ bols(int, intercept = FALSE) +
                    bols(x, intercept = FALSE),
                data = tmpdata,
                control = boost_control(mstop = 2500))
cf <- unlist(coef(mod))
cf <- cf + mod$offset
cf
coef(lm(y ~ x, data = tmpdata))

### large data set with ties
nunique <- 100
xindex <- sample(1:nunique, 1000000, replace = TRUE)
x <- runif(nunique)
y <- rnorm(length(xindex))
w <- rep.int(1, length(xindex))

### brute force computations
op <- options()
options(mboost_indexmin = Inf, mboost_useMatrix = FALSE)
## data pre-processing
b1 <- bbs(x[xindex])$dpp(w)
## model fitting
c1 <- b1$fit(y)$model
options(op)

### automatic search for ties, faster
b2 <- bbs(x[xindex])$dpp(w)
c2 <- b2$fit(y)$model

### manual specification of ties, even faster
b3 <- bbs(x, index = xindex)$dpp(w)
c3 <- b3$fit(y)$model

all.equal(c1, c2)
all.equal(c1, c3)