fvcm: Bagging and Random Forests based on `tvcm`

Description

Bagging (Breiman, 1996) and Random Forest (Breiman, 2001) ensemble algorithms for tvcm.

Usage

fvcm(..., control = fvcm_control())
fvcm_control(maxstep = 10, folds = folds_control("subsampling", K = 100), mtry = 5, alpha = 1.0, mindev = 0.0, verbose = TRUE, ...)
fvcolmm(..., family = cumulative(), control = fvcolmm_control())
fvcolmm_control(maxstep = 10, folds = folds_control("subsampling", K = 100), mtry = 5, alpha = 1.0, minsize = 50, nimpute = 1, verbose = TRUE, ...)
fvcglm(..., family, control = fvcglm_control())
fvcglm_control(maxstep = 10, folds = folds_control("subsampling", K = 100), mtry = 5, mindev = 0, verbose = TRUE, ...)

Arguments

...

for fvcm, fvcolmm and fvcglm, arguments to be passed to tvcm. These include at least the arguments formula, data and family; see the examples below. For fvcm_control, further control arguments to be passed to tvcm_control.

control

a list of control parameters as produced by fvcm_control.

family

the model family, e.g., binomial or cumulative.

maxstep

integer. The maximum number of steps when growing individual trees.

folds

a list of parameters to control the extraction of subsets, as created by folds_control.

mtry

positive integer scalar. The number of combinations of partitions, nodes and variables to be randomly sampled as candidates in each iteration.

mindev, alpha

these parameters are merely specified to disable the default stopping rules for tvcm. See also tvcm_control for details.

minsize, nimpute

special parameter settings for fvcolmm. The minimum node size is set to the default of tvcolmm. The default nimpute deactivates the imputation procedure in cases of unbalanced data.

verbose

logical. Should information about the fitting process be printed to the screen?

Value

An object of class fvcm.

Details

Implements the Bagging (Breiman, 1996) and Random Forests (Breiman, 2001) ensemble algorithms for tvcm. The method consists of growing multiple trees with tvcm and aggregating the fitted coefficient functions on the scale of the predictor function. To enable bagging, use mtry = Inf in fvcm_control.
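The grow-and-aggregate idea described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the vcrpart implementation: the simulated data, the single-split "stumps" and the helper names fit_stump and coef_stump are all hypothetical, and the model is Gaussian with identity link, so the coefficient scale and the predictor scale coincide.

```r
## Illustrative sketch only: NOT the vcrpart implementation.
set.seed(1)
n <- 400
z <- runif(n, -pi / 2, pi / 2)        # moderator variable
x <- rnorm(n)                          # predictor with a varying coefficient
y <- sin(z) * x + rnorm(n, sd = 0.3)   # true coefficient function: sin(z)
dat <- data.frame(y = y, x = x, z = z)

## Hypothetical helper: a "tree" with a single split. The cut point in z is
## the candidate that minimizes the summed residual sum of squares of
## y ~ x fitted separately in the two nodes.
fit_stump <- function(d, cuts = quantile(d$z, seq(0.1, 0.9, by = 0.1))) {
  rss <- sapply(cuts, function(cp) {
    left <- d$z <= cp
    sum(resid(lm(y ~ x, data = d[left, ]))^2) +
      sum(resid(lm(y ~ x, data = d[!left, ]))^2)
  })
  cp <- unname(cuts[which.min(rss)])
  list(cut = cp,
       left = coef(lm(y ~ x, data = d[d$z <= cp, ]))[["x"]],
       right = coef(lm(y ~ x, data = d[d$z > cp, ]))[["x"]])
}

## Hypothetical helper: coefficient function of one stump at new z values.
coef_stump <- function(fit, znew) ifelse(znew <= fit$cut, fit$left, fit$right)

## Subsampling: K subsets, each a random half of the data, one stump each.
K <- 25
stumps <- lapply(seq_len(K), function(k) {
  fit_stump(dat[sample(n, size = n / 2), ])
})

## Aggregation: pointwise average of the K fitted coefficient functions.
zgrid <- seq(-pi / 2, pi / 2, length.out = 50)
coef_mat <- sapply(stumps, coef_stump, znew = zgrid)  # 50 x K matrix
coef_bag <- rowMeans(coef_mat)
```

Each stump is a step function in z, while the average over subsamples is smoother because the selected cut points differ between subsets.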

fvcolmm and fvcglm are the extensions for tvcolmm and tvcglm.

fvcm_control is a wrapper of tvcm_control; the arguments listed specify modified defaults and the parameters for randomizing split selection. Note that, relative to tvcm_control, the cv and prune arguments are also disabled internally. The default arguments for alpha and maxoverstep essentially disable the stopping rules of tvcm, so that the argument maxstep (the number of iterations, i.e. the maximum number of splits) fully controls the stopping. The parameter mtry controls the randomization for selecting combinations of partitions, nodes and variables for splitting. The default of mtry = 5 is arbitrary.
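The role of mtry can be illustrated with a small base R sketch. The helper sample_candidates and the toy candidate table are hypothetical and only mimic the idea of drawing a random subset of partition, node and variable combinations in each iteration; how vcrpart draws them internally is not shown here.

```r
## Hypothetical helper, not part of vcrpart: keep at most 'mtry' randomly
## chosen rows from a table of split candidates.
sample_candidates <- function(candidates, mtry) {
  n <- nrow(candidates)
  if (is.finite(mtry) && mtry < n) {
    candidates[sort(sample.int(n, size = mtry)), , drop = FALSE]
  } else {
    candidates  # mtry = Inf (or mtry >= n): every candidate is evaluated
  }
}

## Toy set of candidates: one partition, three nodes, three variables.
candidates <- expand.grid(partition = 1, node = 1:3,
                          variable = c("z1", "z2", "z3"),
                          stringsAsFactors = FALSE)

set.seed(42)
rf_step <- sample_candidates(candidates, mtry = 5)     # random forest: 5 of 9
bag_step <- sample_candidates(candidates, mtry = Inf)  # bagging: all 9
```

With a finite mtry each iteration searches only a random subset of the candidates, which decorrelates the trees; with mtry = Inf the search is exhaustive and only the subsampling of the data differs between trees.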

References

Breiman, L. (1996). Bagging Predictors. Machine Learning, 24(2), 123--140.

Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5--32.

Hastie, T., R. Tibshirani and J. Friedman (2001). The Elements of Statistical Learning (2 ed.). New York, USA: Springer-Verlag.

Examples


## ------------------------------------------------------------------- #
## Dummy example 1:
##
## Bagging 'tvcm' on the artificially generated data 'vcrpart_3'. The 
## true coefficient function is a sine curve between -pi/2 and pi/2. 
## The parameters 'maxstep = 3' and 'K = 5' are chosen to restrict the 
## computations.
## ------------------------------------------------------------------- #

## simulated data
data(vcrpart_3)

## setting parameters
control <-
  fvcm_control(maxstep = 3, minsize = 10,
               folds = folds_control("subsampling", K = 5, prob = 0.5, seed = 3))

## fitting the forest
model <- fvcm(y ~ vc(z1, by = x1), data = vcrpart_3, 
              family = gaussian(), control = control)

## plot the first two trees
plot(model, "coef", 1:2)

## plotting the partial dependency of the coefficient for 'x1'
plot(model, "partdep")
