# cubist_rules

##### General Interface for Cubist Rule-Based Regression Models

`cubist_rules()` is a way to generate a *specification* of a model
before fitting. The main arguments for the model are:

- `committees`: The number of sequential models included in the ensemble (similar to the number of trees in boosting).
- `neighbors`: The number of neighbors in the post-model instance-based adjustment.

These arguments are converted to their engine-specific names at the
time that the model is fit. Other options and arguments can be
set using `parsnip::set_engine()`. If left to their defaults
here (`NULL`), the values are taken from the underlying model
functions. If parameters need to be modified, `update()` can be used
in lieu of recreating the object from scratch.
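As a rough sketch of how main arguments, engine options, and `update()` fit together, the snippet below builds a specification without fitting it. It assumes the `rules` package is installed. `unbiased` is an argument of `Cubist::cubistControl()`; that the `"Cubist"` engine forwards it at fit time is an assumption here, not something stated on this page.

```r
library(rules)  # also attaches parsnip

# Main arguments are set on the specification itself
spec <- cubist_rules(committees = 10, neighbors = 2)

# Engine-level options go through set_engine(). `unbiased` is an argument of
# Cubist::cubistControl(); that the engine forwards it is an assumption.
spec_eng <- set_engine(spec, "Cubist", unbiased = TRUE)

# update() changes one main argument and leaves the others as they were
spec_upd <- update(spec_eng, neighbors = 5)
spec_upd
```

Nothing is estimated at this point; the specification only records the arguments until `fit()` or `fit_xy()` is called.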

##### Usage

```
cubist_rules(
  mode = "regression",
  committees = NULL,
  neighbors = NULL,
  max_rules = NULL
)

# S3 method for cubist_rules
update(
  object,
  parameters = NULL,
  committees = NULL,
  neighbors = NULL,
  max_rules = NULL,
  fresh = FALSE,
  ...
)
```

##### Arguments

- mode
A single character string for the type of model. The only possible value for this model is "regression".

- committees
A non-negative integer (no greater than 100) for the number of members of the ensemble.

- neighbors
An integer between zero and nine for the number of training set instances that are used to adjust the model-based prediction.

- max_rules
The largest number of rules.

- object
A Cubist model specification.

- parameters
A 1-row tibble or named list with *main* parameters to update. If the individual arguments are used, these will supersede the values in `parameters`. Also, using engine arguments in this object will result in an error.

- fresh
A logical for whether the arguments should be modified in-place or replaced wholesale.

- ...
Not used for `update()`.

##### Details

Cubist is a rule-based ensemble regression model. A basic model tree
(Quinlan, 1992) is created that has a separate linear regression model
for each terminal node. The paths along the model tree are flattened
into rules, and these rules are then simplified and pruned. The parameter
`min_n` is the primary method for controlling the size of each tree, while
`max_rules` controls the number of rules.

Cubist ensembles are created using *committees*, which are similar to
boosting. After the first model in the committee is created, the second
model uses a modified version of the outcome data based on whether the
previous model under- or over-predicted the outcome. For iteration *m*, the
new outcome `y*` is computed using

$$y^*_{(m)} = y - \left(\hat{y}_{(m-1)} - y\right)$$

If a sample is under-predicted on the previous iteration, the outcome is adjusted so that the next time it is more likely to be over-predicted to compensate. This adjustment continues for each ensemble iteration. See Kuhn and Johnson (2013) for details.
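The committee adjustment can be illustrated with a few lines of base R. This is a minimal sketch, assuming the adjusted outcome takes the form `y* = y - (y_hat_prev - y)`, which matches the behavior described above; the numbers are made up for illustration.

```r
# Observed outcomes and the previous committee member's predictions
# (illustrative values, not taken from any real model)
y          <- c(10, 20, 30)
y_hat_prev <- c( 8, 23, 30)

# Assumed form of the adjustment: subtract the previous error from the outcome
y_star <- y - (y_hat_prev - y)
y_star
```

The first sample was under-predicted (8 versus 10), so its adjusted outcome moves above 10; the second was over-predicted, so its adjusted outcome moves below 20; the third was predicted exactly and is unchanged.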

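After the model is created, there is also an option for a post-hoc
adjustment that uses the training set (Quinlan, 1993). When a new sample is
predicted by the model, it can be modified by its nearest neighbors in the
original training set. For *K* neighbors, the model-based predicted value is
adjusted by the neighbors using:

$$\hat{y} = \frac{1}{K}\sum_{\ell=1}^{K} w_\ell \left[t_\ell + \left(\hat{y} - \hat{t}_\ell\right)\right]$$

where $t_\ell$ is the observed outcome of the $\ell$-th training set neighbor,
$\hat{t}_\ell$ is the model prediction for that neighbor, and $w_\ell$ is a
weight that is inversely related to the distance to the neighbor.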
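The neighbor-based adjustment can be sketched in base R as follows. The function name `adjust_prediction()` is hypothetical, and the formula is assumed to be a weighted average of each neighbor's observed outcome shifted by the difference between the new prediction and the neighbor's own prediction. The weighting used here (inverse distance, rescaled to average one) is only an illustrative choice and is not claimed to be what Cubist does internally.

```r
# Hypothetical helper: adjust a model-based prediction using K neighbors.
#   y_hat : model prediction for the new sample
#   t_obs : observed outcomes of the K training set neighbors
#   t_hat : model predictions for those same neighbors
#   w     : one weight per neighbor
adjust_prediction <- function(y_hat, t_obs, t_hat, w) {
  K <- length(t_obs)
  sum(w * (t_obs + (y_hat - t_hat))) / K
}

# Illustrative inputs for K = 3 neighbors
y_hat <- 50
t_obs <- c(52, 47, 55)
t_hat <- c(50, 48, 51)
dists <- c(1, 2, 4)

# Assumed weighting: inverse distance, rescaled so the weights average to one
w <- 1 / dists
w <- w * length(w) / sum(w)

adjusted <- adjust_prediction(y_hat, t_obs, t_hat, w)
adjusted
```

The closest neighbor was under-predicted by the model (50 versus 52) and carries the largest weight, so the adjusted value ends up above the original prediction of 50.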

Note that `cubist_rules()` does not require that categorical predictors be
converted to numeric indicator values. However, using `parsnip::fit()` will
*always* create dummy variables so, if there is interest in keeping the
categorical predictors in their original format, `parsnip::fit_xy()` would
be a better choice. When using the `tune` package, using a recipe for
pre-processing enables more control over how such predictors are encoded
since recipes do not automatically create dummy variables.

The only available engine is `"Cubist"`.
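To make the note on categorical predictors concrete, here is a small sketch using `parsnip::fit_xy()`. It assumes the `rules` and `Cubist` packages are installed. The built-in `iris` data is a stand-in chosen only because it contains a factor column; it does not come from the original documentation.

```r
library(rules)  # also attaches parsnip

# iris has one factor predictor (Species); used here only for illustration
predictors <- iris[, c("Sepal.Width", "Petal.Length", "Petal.Width", "Species")]
outcome <- iris$Sepal.Length

spec <- set_engine(cubist_rules(committees = 1), "Cubist")

# fit_xy() takes the predictors as a data frame rather than through a
# formula, so Species is handed over as a factor
xy_fit <- fit_xy(spec, x = predictors, y = outcome)
xy_fit
```

With the formula interface, the same factor would be expanded into indicator columns before reaching the engine, as described above.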

##### Value

An updated `parsnip` model specification.

##### References

Quinlan R (1992). "Learning with Continuous Classes." Proceedings of the 5th Australian Joint Conference On Artificial Intelligence, pp. 343-348.

Quinlan R (1993). "Combining Instance-Based and Model-Based Learning." Proceedings of the Tenth International Conference on Machine Learning, pp. 236-243.

Kuhn M and Johnson K (2013). *Applied Predictive Modeling*. Springer.

##### See Also

`parsnip::fit()`, `parsnip::fit_xy()`, `Cubist::cubist()`,
`Cubist::cubistControl()`

##### Examples

```
library(rules)  # also attaches parsnip, which provides fit() and %>%

cubist_rules()

# Main arguments can be set when the specification is created:
cubist_rules(committees = 7)

# ------------------------------------------------------------------------------

data(car_prices, package = "modeldata")

car_rules <-
  cubist_rules(committees = 1) %>%
  fit(log10(Price) ~ ., data = car_prices)

car_rules

# Summarize the underlying Cubist fit
summary(car_rules$fit)

# ------------------------------------------------------------------------------

model <- cubist_rules(committees = 10, neighbors = 2)
model

# Change committees and keep the other arguments
update(model, committees = 1)

# Replace all arguments wholesale
update(model, committees = 1, fresh = TRUE)
```

*Documentation reproduced from package rules, version 0.0.1, License: MIT + file LICENSE*