pre (version 0.7.2)

maxdepth_sampler: Sampling function generator for specifying varying maximum tree depth in a prediction rule ensemble (pre)

Description

maxdepth_sampler generates a random sampling function, governed by a pre-specified average tree depth.

Usage

maxdepth_sampler(av.no.term.nodes = 4L, av.tree.depth = NULL)

Arguments

av.no.term.nodes

integer of length one. Specifies the average number of terminal nodes in trees used for rule induction.

av.tree.depth

integer of length one. Specifies the average maximum tree depth in trees used for rule induction.

Value

Returns a random sampling function with single argument 'ntrees', which can be supplied to the maxdepth argument of function pre to specify varying tree depths.

Details

The original RuleFit implementation varied tree sizes for rule induction. Furthermore, it defined tree size in terms of the number of terminal nodes. In contrast, function pre defines the maximum tree size in terms of a (constant) tree depth. Function maxdepth_sampler allows for mimicking the behavior of the original RuleFit implementation. In effect, the maximum tree depth is sampled from an exponential distribution with rate \(1/(\bar{L}-2)\), where \(\bar{L} \ge 2\) represents the average number of terminal nodes for trees in the ensemble. See Friedman & Popescu (2008, section 3.3).
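The sampling scheme described above can be illustrated with a minimal sketch. It is not the package source: the name sketch_maxdepth_sampler is hypothetical, and the sketch assumes that the number of terminal nodes is drawn as 2 plus the floor of an exponential variate with mean \(\bar{L}-2\), and that a tree of depth d holds at most 2^d terminal nodes.

```r
# Hypothetical re-implementation for illustration only; not the code used by pre.
sketch_maxdepth_sampler <- function(av.no.term.nodes = 4L, av.tree.depth = NULL) {
  # Assumption: an average depth d corresponds to 2^d terminal nodes on average.
  if (!is.null(av.tree.depth)) {
    av.no.term.nodes <- 2^av.tree.depth
  }
  stopifnot(av.no.term.nodes >= 2)

  function(ntrees) {
    # Assumption: number of terminal nodes = 2 + floor(gamma), where gamma is
    # exponential with rate 1 / (av.no.term.nodes - 2).
    extra <- if (av.no.term.nodes > 2) {
      floor(rexp(ntrees, rate = 1 / (av.no.term.nodes - 2)))
    } else {
      rep(0, ntrees)  # exactly 2 terminal nodes: nothing to sample
    }
    term.nodes <- 2 + extra
    # Smallest depth whose full binary tree can hold that many terminal nodes.
    ceiling(log2(term.nodes))
  }
}

set.seed(42)
sketch_func <- sketch_maxdepth_sampler(av.no.term.nodes = 4L)
sketch_func(10)
```

Under these assumptions, every sampled depth is a whole number of at least 1, and setting av.no.term.nodes = 2L always yields a depth of 1.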

References

Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3), 916-954.

See Also

pre

Examples

# NOT RUN {
## RuleFit default is max. 4 terminal nodes, on average:
func1 <- maxdepth_sampler()
set.seed(42)
func1(10)
mean(func1(1000))

## Max. 16 terminal nodes, on average (equals average maxdepth of 4):
func2 <- maxdepth_sampler(av.no.term.nodes = 16L)
set.seed(42)
func2(10)
mean(func2(1000))

## Max. tree depth of 3, on average:
func3 <- maxdepth_sampler(av.tree.depth = 3)
set.seed(42)
func3(10)
mean(func3(1000))

## Max. 2 terminal nodes, on average (always yields maxdepth of 1):
func4 <- maxdepth_sampler(av.no.term.nodes = 2L)
set.seed(42)
func4(10)
mean(func4(1000))

# }
# NOT RUN {
## Create rule ensemble with varying maxdepth:
set.seed(42)
airq.ens <- pre(Ozone ~ ., data = airquality[complete.cases(airquality),],
                maxdepth = func1)
airq.ens
# }