
vcrpart (version 0.2-1)

fvcm: Bagging and random forests based on tvcm

Description

Bagging (Breiman, 1996) and random forest (Breiman, 2001) ensemble algorithms for tvcm.

Usage

fvcm(..., control = fvcm_control())

fvcolmm(..., family = cumulative(), control = fvcm_control())

fvcglm(..., family, control = fvcm_control())

fvcm_control(maxstep = 10, folds = folds_control("subsampling", 5), ptry = 1, ntry = 1, vtry = 5, alpha = 1.0, maxoverstep = Inf, ...)

Arguments

...
for fvcm, fvcolmm and fvcglm arguments to be passed to
control
a list of control parameters as produced by fvcm_control.
family
the model family, e.g., binomial or cumulative.
maxstep
integer. The maximum number of steps when growing individual trees.
folds
a list of parameters to control the extraction of subsets, as created by folds_control.
ptry
positive numeric scalar. The number of vc terms to be randomly sampled as candidates in each iteration. If 0 < ptry < 1, ptry is interpreted as the relative number of
ntry
positive numeric, either a scalar or a vector of length equal to the number of vc terms. The number(s) of nodes of each term to be randomly sampled as candidates in each iteration. If 0 < nt
vtry
positive numeric, either a scalar or a vector of length equal to the number of vc terms. The number(s) of input variables of each term to be randomly sampled as candidates in each iteration. If
maxoverstep, alpha
These two parameters are merely specified to disable the default stopping rules for tvcm. See also tvcm_control for details.

Value

  • An object of class fvcm.

Details

Implements the bagging (Breiman, 1996) and random forests (Breiman, 2001) ensemble algorithms for tvcm. The method consists of growing multiple trees with tvcm and aggregating the fitted coefficient functions. To enable bagging, use ptry = Inf, ntry = Inf and vtry = Inf in fvcm_control.
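
The aggregation idea can be illustrated with a minimal base R sketch. This is not the vcrpart implementation: single-split stumps stand in for the trees grown by tvcm, and the simulated data as well as the helpers node_fit and fit_stump are hypothetical names introduced only for this illustration.

```r
## Conceptual sketch of bagging a varying coefficient (base R only).
## Assumption: each "tree" is a one-split stump, not a tree grown by tvcm.
set.seed(1)
n <- 400
z <- runif(n, -pi / 2, pi / 2)          # partitioning variable
x <- rnorm(n)                           # predictor with a varying coefficient
y <- sin(z) * x + rnorm(n, sd = 0.2)    # true coefficient function: sin(z)
dat <- data.frame(y = y, x = x, z = z)

## Slope of x and residual sum of squares within one node.
node_fit <- function(d) {
  fit <- lm(y ~ 0 + x, data = d)
  list(slope = coef(fit)[["x"]], rss = deviance(fit))
}

## Grow one stump on the rows 'idx': choose the cut on z that minimizes the
## total RSS, then return the fitted coefficient of x for every row of 'dat'.
fit_stump <- function(dat, idx) {
  sub <- dat[idx, ]
  cuts <- unname(quantile(sub$z, seq(0.1, 0.9, by = 0.1)))
  fits <- lapply(cuts, function(ct) {
    l <- node_fit(sub[sub$z <= ct, ])
    r <- node_fit(sub[sub$z >  ct, ])
    list(cut = ct, left = l$slope, right = r$slope, rss = l$rss + r$rss)
  })
  best <- fits[[which.min(sapply(fits, `[[`, "rss"))]]
  ifelse(dat$z <= best$cut, best$left, best$right)
}

## Grow K stumps on random half-samples and average the coefficient functions.
K <- 50
coefs <- sapply(seq_len(K), function(k) fit_stump(dat, sample(n, size = n / 2)))
beta_hat <- rowMeans(coefs)
```

Here beta_hat holds the aggregated coefficient of x for each observation. With vcrpart itself, the corresponding bagging setup is the one named above, i.e. a control object created with ptry = Inf, ntry = Inf and vtry = Inf.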

fvcolmm and fvcglm are convenience functions for fitting an olmm or a glm model, respectively.

fvcm_control is a wrapper for tvcm_control. The arguments listed above specify modified defaults and the parameters that randomize split selection. Note that, relative to tvcm_control, the cv and prune arguments are also disabled internally. The default values of alpha and maxoverstep essentially disable the stopping rules of tvcm, so that maxstep (the number of iterations, i.e. the maximum number of splits) fully controls stopping.

The three parameters ptry, ntry and vtry control the randomization when selecting the vc term, the node and the variable for splitting. The default of vtry = 5 is arbitrary. It should be adjusted in applications, e.g., to the number of partitioning variables (for each term) divided by 3, see Hastie et al. (2001).
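
The rule of thumb for vtry can be written down in a few lines. The sketch below is only an illustration: the counts of partitioning variables are made up, and rounding down with a lower bound of 1 is an assumption rather than something the package prescribes.

```r
## Hypothetical numbers of partitioning variables for two 'vc' terms.
n_partvars <- c(12, 5)

## Number of partitioning variables per term divided by 3, rounded down
## (the rounding is an assumption) and never smaller than 1.
vtry <- pmax(floor(n_partvars / 3), 1)

## Pass the values to fvcm_control() if vcrpart is installed.
if (requireNamespace("vcrpart", quietly = TRUE)) {
  control <- vcrpart::fvcm_control(maxstep = 10, vtry = vtry)
}
```

A vector is used because vtry accepts one value per vc term; a single scalar applies the same number to all terms.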

References

Leo Breiman (1996). Bagging Predictors. Machine Learning, 24(2), 123--140.

Leo Breiman (2001). Random Forests. Machine Learning, 45(1), 5--32.

T. Hastie, R. Tibshirani, J. Friedman (2001), The elements of statistical learning, Springer.

See Also

fvcm-methods, tvcm, glm, olmm

Examples

## ------------------------------------------------------------------- #
## Dummy example 1:
##
## Bagging 'tvcm' on the artificially generated data 'vcrpart_3'. The 
## true coefficient function is a sine curve between -pi/2 and pi/2. 
## The parameters 'maxstep = 3' and 'K = 5' are chosen to restrict the 
## computations.
## ------------------------------------------------------------------- #

## simulated data
data(vcrpart_3)

## setting parameters
control <-
  fvcm_control(maxstep = 3, minsize = 10,
               folds = folds_control("subsampling", K = 5, prob = 0.5, seed = 3))

## fitting the forest
model <- fvcm(y ~ vc(z1, by = x1), data = vcrpart_3, 
              family = gaussian(), control = control)

## plot the first two trees
plot(model, "coef", 1:2)

## plotting the partial dependency of the coefficient for 'x1'
plot(model, "partdep")
