General Interface for C5.0 Rule-Based Classification Models

C5_rules() is a way to generate a specification of a model before fitting. The main arguments for the model are:

  • trees: The number of sequential models included in the ensemble (rules are derived from an initial set of boosted trees).

  • min_n: The minimum number of data points in a node that are required for the node to be split further.

These arguments are converted to their specific names at the time that the model is fit. Other options and arguments can be set using parsnip::set_engine(). If left to their defaults here (NULL), the values are taken from the underlying model functions. If parameters need to be modified, update() can be used in lieu of recreating the object from scratch.

C5_rules(mode = "classification", trees = NULL, min_n = NULL)

# S3 method for C5_rules
update(object, parameters = NULL, trees = NULL, min_n = NULL, fresh = FALSE, ...)


Arguments

mode
A single character string for the type of model. The only possible value for this model is "classification".

trees
A non-negative integer (no greater than 100) for the number of members of the ensemble.

min_n
An integer greater than one for the minimum number of data points in a node that are required for the node to be split further.

object
A C5_rules model specification.

parameters
A 1-row tibble or named list with main parameters to update. If the individual arguments are used, these will supersede the values in parameters. Also, using engine arguments in this object will result in an error.

fresh
A logical for whether the arguments should be modified in-place or replaced wholesale.

...
Not used for update().


C5.0 is a classification model that is an extension of the C4.5 model of Quinlan (1993). It has tree- and rule-based versions that also include boosting capabilities. C5_rules() enables the version of the model that uses a series of rules (see the examples below). To make a set of rules, an initial C5.0 tree is created and flattened into rules. The rules are pruned, simplified, and ordered. Rule sets are created within each iteration of boosting.

The two main tuning parameters are the number of trees in the boosting ensemble (trees) and the number of samples required to continue splitting when creating a tree (min_n). There are no arguments to control the total number of rules in the ensemble.
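Both tuning parameters can be marked for optimization with the tune package. A minimal sketch, assuming the tidymodels tuning workflow (tune(), tune_grid(), and vfold_cv() are from tune/rsample; the ad_data split is illustrative, not from this page):

```r
# Hedged sketch: flag trees and min_n as tunable, then search a grid.
library(rules)
library(parsnip)
library(tune)

tunable_spec <- C5_rules(trees = tune(), min_n = tune())

# A grid search over resamples might then look like (not run here):
# library(rsample)
# data(ad_data, package = "modeldata")
# tune_grid(tunable_spec, Class ~ ., resamples = vfold_cv(ad_data), grid = 10)
```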

Note that C5_rules() does not require categorical predictors to be converted to numeric indicator (dummy) variables. However, the formula method used by parsnip::fit() will always create dummy variables, so if there is interest in keeping the categorical predictors in their original format, parsnip::fit_xy() is a better choice. When using the tune package, a recipe for pre-processing gives more control over how such predictors are encoded, since recipes do not automatically create dummy variables.
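The fit_xy() route described above can be sketched as follows; the column selection is an assumption about ad_data's layout (an outcome column named Class plus predictor columns):

```r
# Hedged sketch: fit_xy() passes predictors as-is, so factor columns are
# not expanded into dummy variables the way the formula interface would.
library(rules)
library(parsnip)

data(ad_data, package = "modeldata")

# Separate predictors (x) from the outcome (y); names are assumptions.
preds <- ad_data[, setdiff(names(ad_data), "Class")]

xy_fit <-
  C5_rules(trees = 1, min_n = 10) %>%
  fit_xy(x = preds, y = ad_data$Class)
```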

Note that C5.0 has a tool for early stopping during boosting, where fewer boosting iterations are performed than the number requested. C5_rules() turns this feature off (although it can be re-enabled using C50::C5.0Control()).


An updated parsnip model specification.


Quinlan R (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers.

See Also

parsnip::fit(), parsnip::fit_xy(), C50::C5.0(), C50::C5.0Control()

  • C5_rules
  • update.C5_rules
# Parameters can be represented by a placeholder:
C5_rules(trees = 7)

# ------------------------------------------------------------------------------

data(ad_data, package = "modeldata")

class_rules <-
  C5_rules(trees = 1, min_n = 10) %>%
  fit(Class ~ ., data = ad_data)


# ------------------------------------------------------------------------------

model <- C5_rules(trees = 10, min_n = 2)
update(model, trees = 1)
update(model, trees = 1, fresh = TRUE)
Documentation reproduced from package rules, version 0.0.1, License: MIT + file LICENSE
