evtree: Evolutionary Learning of Globally Optimal Trees

Description

Learning of globally optimal classification and regression trees by using evolutionary algorithms.

Usage

evtree(formula, data, subset, na.action, weights, control = evtree.control(...), ...)

Arguments

formula

a symbolic description of the model to be fit. No interactions should be used.

data, subset, na.action

arguments controlling formula processing via model.frame.

weights

optional integer vector of case weights.

control

a list of control arguments specified via evtree.control.

...

arguments passed to evtree.control.
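
As a brief sketch of how control settings might be supplied, the following fits a deliberately small regression tree. The parameter names minbucket, minsplit, maxdepth, niterations and ntrees are taken from evtree.control; the specific values are arbitrary choices for illustration, not recommendations.

```r
library("evtree")

## Same data preparation as in the Examples section.
airq <- subset(airquality, !is.na(Ozone) & complete.cases(airquality))

## Build the control list explicitly (values are illustrative assumptions).
ctrl <- evtree.control(minbucket = 10L, minsplit = 20L, maxdepth = 3L,
                       niterations = 5000L, ntrees = 50L)

set.seed(1090)
ev_small <- evtree(Ozone ~ ., data = airq, control = ctrl)

## Alternatively, pass a control argument through '...'.
set.seed(1090)
ev_dots <- evtree(Ozone ~ ., data = airq, maxdepth = 3L)

ev_small
```

Passing a prepared control list can be convenient when the same settings are reused across several fits, while the '...' shortcut is handy for changing a single setting.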

Value

An object of class party.

Details

Globally optimal classification and regression trees are learned using an evolutionary algorithm. Roughly, the algorithm works as follows. First, a set of trees is initialized with random split rules in the root nodes. Second, mutation and crossover operators are applied to modify the trees' structure and the tests that are applied in the internal nodes. After each modification step, a survivor selection mechanism selects the best candidate models for the next iteration. In this evolutionary process the mean quality of the population increases over time. The algorithm terminates when the quality of the best trees does not improve further, or at the latest after the maximum number of iterations specified by niterations in evtree.control.
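
The general shape of such an evolutionary loop can be illustrated with a toy sketch in base R. This is not the evtree implementation: it evolves only a single split point on one variable, uses mutation without crossover, and keeps the best of parents and offspring as a simplified survivor selection. All names and tuning values (npop, maxiter, patience, the mutation standard deviation) are hypothetical.

```r
## Toy illustration only: evolve one split point on Temp to predict Ozone.
set.seed(1)
airq <- subset(airquality, !is.na(Ozone))
x <- airq$Temp
y <- airq$Ozone

## Quality of a candidate split: residual sum of squares of the resulting
## two-leaf tree (lower is better). Splits with an empty side are rejected.
rss <- function(cut) {
  left <- x <= cut
  if (all(left) || !any(left)) return(Inf)
  sum((y[left] - mean(y[left]))^2) + sum((y[!left] - mean(y[!left]))^2)
}

npop <- 20L      # population size (hypothetical)
maxiter <- 200L  # hard iteration limit, similar in spirit to niterations
patience <- 25L  # stop after this many iterations without improvement

## 1. Initialize the population with random split points.
pop <- runif(npop, min(x), max(x))
best <- min(sapply(pop, rss))
stall <- 0L

for (iter in seq_len(maxiter)) {
  ## 2. Variation: mutate every candidate by a small random shift.
  offspring <- pop + rnorm(npop, sd = 2)
  ## 3. Survivor selection: keep the npop best of parents and offspring.
  cand <- c(pop, offspring)
  fit <- sapply(cand, rss)
  pop <- cand[order(fit)[seq_len(npop)]]
  ## 4. Termination: stop once the best candidate no longer improves.
  newbest <- min(fit)
  if (newbest < best) {
    best <- newbest
    stall <- 0L
  } else {
    stall <- stall + 1L
  }
  if (stall >= patience) break
}

best_cut <- pop[1]
c(cut = best_cut, rss = best)
```

In evtree itself the individuals are whole trees and the variation operators act on tree structure and split rules, so this sketch conveys only the initialize, vary, select, terminate cycle.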

More details on the algorithm are provided in Grubinger et al. (2014), which is also available as vignette("evtree", package = "evtree").

The resulting trees can be summarized and visualized by the print.constparty and plot.constparty methods provided by the partykit package. Moreover, the predict.party method can be used to compute fitted responses, probabilities (for classification trees), and nodes.
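
The three kinds of predictions mentioned above might be requested as in the following sketch, which relies on the type argument of the predict method in partykit. The object new_obs is a hypothetical set of new observations, taken here from iris itself for convenience.

```r
library("evtree")

set.seed(1090)
ev_iris <- evtree(Species ~ ., data = iris)

## Three rows of iris standing in for new data (illustrative assumption).
new_obs <- iris[c(1, 51, 101), ]

## Predicted class labels.
pred_class <- predict(ev_iris, newdata = new_obs, type = "response")

## Class probabilities, one column per level of Species.
pred_prob <- predict(ev_iris, newdata = new_obs, type = "prob")

## Identifiers of the terminal nodes the observations fall into.
pred_node <- predict(ev_iris, newdata = new_obs, type = "node")

pred_class
pred_prob
pred_node
```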

References

Grubinger T, Zeileis A, Pfeiffer KP (2014). evtree: Evolutionary Learning of Globally Optimal Classification and Regression Trees in R. Journal of Statistical Software, 61(1), 1-29. http://www.jstatsoft.org/v61/i01/

Examples

library("evtree")

## regression
set.seed(1090)
airq <- subset(airquality, !is.na(Ozone) & complete.cases(airquality))
ev_air <- evtree(Ozone ~ ., data = airq)
ev_air
plot(ev_air)
mean((airq$Ozone - predict(ev_air))^2)

## classification
## (note that different equivalent "perfect" splits for the setosa species
## in the iris data may be found on different architectures/systems)
ev_iris <- evtree(Species ~ ., data = iris)
## IGNORE_RDIFF_BEGIN
ev_iris
## IGNORE_RDIFF_END
plot(ev_iris)
table(predict(ev_iris), iris$Species)
1 - mean(predict(ev_iris) == iris$Species)