R interfaces to Weka regression and classification tree learners.

```
J48(formula, data, subset, na.action,
    control = Weka_control(), options = NULL)
LMT(formula, data, subset, na.action,
    control = Weka_control(), options = NULL)
M5P(formula, data, subset, na.action,
    control = Weka_control(), options = NULL)
DecisionStump(formula, data, subset, na.action,
              control = Weka_control(), options = NULL)
```

formula

a symbolic description of the model to be fit.

data

an optional data frame containing the variables in the model.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain `NA`s. See `model.frame` for details.

control

an object of class `Weka_control` giving options to be passed to the Weka learner. Available options can be obtained interactively using the Weka Option Wizard `WOW`, or from the Weka documentation.

options

a named list of further options, or `NULL` (default). See **Details**.
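As a minimal sketch of how `control` is typically supplied: `Weka_control()` takes Weka option names as argument names, and `WOW()` lists what a given learner accepts. The choice of J48's `M` option (minimum number of instances per leaf) and the value 5 are illustrative assumptions, not recommendations. The block is guarded so that it only runs when RWeka (and hence a working Java setup) is available.

```r
## Sketch: passing Weka options through `control`.
if (require("RWeka", quietly = TRUE)) {
    ## Show the options the J48 learner understands.
    print(WOW("J48"))

    ## Illustrative choice: require at least 5 instances per leaf (-M 5).
    m_ctrl <- J48(Species ~ ., data = iris,
                  control = Weka_control(M = 5))
    print(m_ctrl)
}
```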

A list inheriting from classes `Weka_tree` and `Weka_classifiers` with components including

classifier

a reference (of class `jobjRef`) to a Java object obtained by applying the Weka `buildClassifier` method to build the specified model using the given control options.

predictions

a numeric vector or factor with the model predictions for the training instances (the results of calling the Weka `classifyInstance` method for the built classifier and each instance).

call

the matched call.

There is a `predict` method for predicting from the fitted models, and a `summary` method based on `evaluate_Weka_classifier`.

There is also a `plot` method for fitted binary `Weka_tree`s via the facilities provided by package partykit. This converts the `Weka_tree` to a `party` object and then simply calls the plot method of this class (see `plot.party`).

Provided the Weka classification tree learner implements the “Drawable” interface (i.e., provides a `graph` method), `write_to_dot` can be used to create a DOT representation of the tree for visualization via Graphviz or the Rgraphviz package.

`J48` generates unpruned or pruned C4.5 decision trees (Quinlan, 1993).

`LMT` implements “Logistic Model Trees” (Landwehr, 2003; Landwehr et al., 2005).

`M5P` (where the `P` stands for ‘prime’) generates M5 model trees using the M5' algorithm, which was introduced in Wang & Witten (1997) and enhances the original M5 algorithm by Quinlan (1992).

`DecisionStump` implements decision stumps (trees with a single split only), which are frequently used as base learners for meta learners such as Boosting.

The model formulae should only use the `+` and `-` operators
to indicate the variables to be included or not used, respectively.
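To see which predictors such a formula selects before handing it to a learner, base R's `terms()` can expand it against the data. This is only a sketch on the built-in `iris` data; the variable dropped here is an arbitrary illustration.

```r
## Sketch: which predictors a `+`/`-` formula selects (base R only).
f <- Species ~ . - Petal.Width          # all variables except Petal.Width
predictors <- attr(terms(f, data = iris), "term.labels")
print(predictors)

## The same formula would then be passed to a learner, e.g.
##   J48(Species ~ . - Petal.Width, data = iris)
```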

Argument `options` allows further customization. Currently, options `model` and `instances` (or partial matches for these) are used: if set to `TRUE`, the model frame or the corresponding Weka instances, respectively, are included in the fitted model object, possibly speeding up subsequent computations on the object. By default, neither is included.
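A minimal sketch of requesting both extras through `options`, assuming RWeka is installed. Rather than asserting how the extra components are named in the fitted object, the example compares component names against a default fit.

```r
## Sketch: keeping the model frame and Weka instances in the fitted object.
if (require("RWeka", quietly = TRUE)) {
    m_opt <- J48(Species ~ ., data = iris,
                 options = list(model = TRUE, instances = TRUE))
    m_def <- J48(Species ~ ., data = iris)

    ## Components present only when the options are switched on.
    print(setdiff(names(m_opt), names(m_def)))
}
```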

`parse_Weka_digraph` can parse the graph associated with a Weka tree classifier (and obtained by invoking its `graph()` method in Weka), returning a simple list with nodes and edges.
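One possible way to obtain and parse such a graph is sketched below. It rests on two assumptions that are not spelled out above: that the Java reference is stored in the fitted object's `classifier` component, and that the Weka `graph()` method is invoked through rJava's `.jcall()` with a string return type.

```r
## Sketch: fetch the DOT string from Weka and parse it into nodes/edges.
if (require("RWeka", quietly = TRUE)) {
    m_g <- J48(Species ~ ., data = iris)

    ## Assumption: the Java object lives in `m_g$classifier`, and its
    ## graph() method returns the tree as a DOT-format string.
    dot <- rJava::.jcall(m_g$classifier, "S", "graph")

    g <- parse_Weka_digraph(dot)
    str(g)  # inspect the parsed list of nodes and edges
}
```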

N. Landwehr (2003).
*Logistic Model Trees*.
Master's thesis, Institute for Computer Science, University of
Freiburg, Germany.
https://www.cs.uni-potsdam.de/ml/landwehr/diploma_thesis.pdf

N. Landwehr, M. Hall, and E. Frank (2005).
Logistic Model Trees.
*Machine Learning*, **59**, 161–205.
doi:10.1007/s10994-005-0466-3.

R. Quinlan (1993).
*C4.5: Programs for Machine Learning*.
Morgan Kaufmann Publishers, San Mateo, CA.

R. Quinlan (1992).
Learning with continuous classes.
*Proceedings of the Australian Joint Conference on Artificial Intelligence*, 343–348.
World Scientific, Singapore.

Y. Wang and I. H. Witten (1997).
Induction of model trees for predicting continuous classes.
*Proceedings of the European Conference on Machine Learning*.
University of Economics, Faculty of Informatics and Statistics, Prague.

I. H. Witten and E. Frank (2005).
*Data Mining: Practical Machine Learning Tools and Techniques*.
2nd Edition, Morgan Kaufmann, San Francisco.

```
library("RWeka")

m1 <- J48(Species ~ ., data = iris)

## print and summary
m1
summary(m1) # calls evaluate_Weka_classifier()
table(iris$Species, predict(m1)) # by hand

## visualization
## use partykit package
if(require("partykit", quietly = TRUE)) plot(m1)
## or Graphviz
write_to_dot(m1)
## or Rgraphviz
library("Rgraphviz")
ff <- tempfile()
write_to_dot(m1, ff)
plot(agread(ff))

## Using some Weka data sets ...

## J48
DF2 <- read.arff(system.file("arff", "contact-lenses.arff",
                             package = "RWeka"))
m2 <- J48(`contact-lenses` ~ ., data = DF2)
m2
table(DF2$`contact-lenses`, predict(m2))
if(require("partykit", quietly = TRUE)) plot(m2)

## M5P
DF3 <- read.arff(system.file("arff", "cpu.arff", package = "RWeka"))
m3 <- M5P(class ~ ., data = DF3)
m3
if(require("partykit", quietly = TRUE)) plot(m3)

## Logistic Model Tree.
DF4 <- read.arff(system.file("arff", "weather.arff", package = "RWeka"))
m4 <- LMT(play ~ ., data = DF4)
m4
table(DF4$play, predict(m4))

## Larger scale example.
if(require("mlbench", quietly = TRUE)
   && require("partykit", quietly = TRUE)) {
    ## Predict diabetes status for Pima Indian women
    data("PimaIndiansDiabetes", package = "mlbench")
    ## Fit J48 tree with reduced error pruning
    m5 <- J48(diabetes ~ ., data = PimaIndiansDiabetes,
              control = Weka_control(R = TRUE))
    plot(m5)
    ## (Make sure that the plotting device is big enough for the tree.)
}
```