C5.0 is a classification model that is an extension of the C4.5
model of Quinlan (1993). It has tree- and rule-based versions that also
include boosting capabilities. C5_rules() enables the version of the model
that uses a series of rules (see the examples below). To make a set of
rules, an initial C5.0 tree is created and flattened into rules. The rules
are pruned, simplified, and ordered. Rule sets are created within each
iteration of boosting.

The two main tuning parameters are the number of trees in the boosting
ensemble (trees) and the number of samples required to continue splitting
when creating a tree (min_n). There are no arguments to control the total
number of rules in the ensemble.
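
As a minimal sketch of how these two parameters are supplied, the specification below sets both. The values 5 and 10 are arbitrary illustrations rather than recommendations, and the sketch assumes the rules package is installed alongside parsnip.

```r
library(parsnip)
library(rules)  # assumed to supply the C5.0 engine for C5_rules()

# trees: number of boosting iterations.
# min_n: number of samples required to continue splitting.
# Both values are illustrative only.
rule_spec <-
  C5_rules(trees = 5, min_n = 10) |>
  set_engine("C5.0") |>
  set_mode("classification")

rule_spec
```

Neither argument caps the number of rules; the size of the rule set is determined by the fitting procedure itself.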

Note that C5_rules() does not require that categorical predictors be
converted to numeric indicator values. Because parsnip::fit() will always
create dummy variables, parsnip::fit_xy() is the better choice when the
categorical predictors should be kept in their original format. When using
the tune package, a recipe for pre-processing gives more control over how
such predictors are encoded, since recipes do not automatically create
dummy variables.
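
The fit_xy() interface mentioned above could be used roughly as follows. This is a sketch under stated assumptions: it uses the warpbreaks data set that ships with R purely for illustration (it is not part of the original documentation), with the factor column tension as a categorical predictor and wool as the outcome.

```r
library(parsnip)
library(rules)  # assumed to supply the C5.0 engine for C5_rules()

# warpbreaks ships with R: `wool` is a two-level factor (the outcome here),
# `breaks` is numeric, and `tension` is a three-level factor.
rule_spec <-
  C5_rules() |>
  set_engine("C5.0") |>
  set_mode("classification")

# fit_xy() takes the predictors and the outcome as separate objects,
# so no formula is involved in passing `tension` to the model.
rule_fit <- fit_xy(
  rule_spec,
  x = warpbreaks[, c("breaks", "tension")],
  y = warpbreaks$wool
)

rule_fit
```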

Note that C5.0 has a tool for early stopping during boosting where fewer
iterations of boosting are performed than the number requested. C5_rules()
turns this feature off (although it can be re-enabled using
C50::C5.0Control()).
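
To illustrate what the control object looks like, the sketch below builds one with early stopping switched on and passes it to the underlying C50::C5.0() function directly. It deliberately bypasses C5_rules(); how the control object is forwarded through the parsnip engine is not covered here. The warpbreaks data and the choice of 10 trials are assumptions made for the example.

```r
library(C50)

# earlyStopping is an argument of C5.0Control(); TRUE allows boosting to
# stop before the requested number of trials.
ctrl <- C5.0Control(earlyStopping = TRUE)

# Direct call to the underlying C50 function with rules = TRUE.
# This is not the C5_rules() interface; it only shows the control object
# in use. The data set and trial count are illustrative.
rule_mod <- C5.0(
  x = warpbreaks[, c("breaks", "tension")],
  y = warpbreaks$wool,
  trials = 10,
  rules = TRUE,
  control = ctrl
)

rule_mod
```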