rule_fit
General Interface for RuleFit Models
rule_fit()
is a way to generate a specification of a model
before fitting. The main arguments for the model are:
mtry: The number of predictors that will be randomly sampled at each split when creating the tree models.
trees: The number of trees contained in the ensemble.
min_n: The minimum number of data points in a node that are required for the node to be split further.
tree_depth: The maximum depth of the tree (i.e. number of splits).
learn_rate: The rate at which the boosting algorithm adapts from iteration-to-iteration.
loss_reduction: The reduction in the loss function required to split further.
sample_size: The amount of data exposed to the fitting routine.
These arguments are converted to their specific names at the
time that the model is fit. Other options and argument can be
set using parsnip::set_engine()
. If left to their defaults
here (NULL
), the values are taken from the underlying model
functions. If parameters need to be modified, update()
can be used
in lieu of recreating the object from scratch.
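As a minimal sketch of this workflow (assuming the parsnip and rules packages are installed; the argument values are illustrative only), main arguments are set in rule_fit() while engine-specific options go through parsnip::set_engine():

```r
library(parsnip)
library(rules)  # provides rule_fit() and the "xrf" engine registration

# Main arguments (trees, penalty) are set in rule_fit(); the values
# here are arbitrary examples, not recommendations.
spec <-
  rule_fit(trees = 50, penalty = 0.01) %>%
  set_mode("regression") %>%
  set_engine("xrf")

spec
```

Printing the specification shows the main arguments and the chosen engine; nothing is fit until fit() is called.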
Usage
rule_fit(
mode = "unknown",
mtry = NULL,
trees = NULL,
min_n = NULL,
tree_depth = NULL,
learn_rate = NULL,
loss_reduction = NULL,
sample_size = NULL,
penalty = NULL
)

# S3 method for rule_fit
update(
object,
parameters = NULL,
mtry = NULL,
trees = NULL,
min_n = NULL,
tree_depth = NULL,
learn_rate = NULL,
loss_reduction = NULL,
sample_size = NULL,
penalty = NULL,
fresh = FALSE,
...
)
Arguments
- mode
A single character string for the type of model. Possible values for this model are "unknown", "regression", or "classification".
- mtry
A number for the number (or proportion) of predictors that will be randomly sampled at each split when creating the tree models.
- trees
An integer for the number of trees contained in the ensemble.
- min_n
An integer for the minimum number of data points in a node that are required for the node to be split further.
- tree_depth
An integer for the maximum depth of the tree (i.e. number of splits).
- learn_rate
A number for the rate at which the boosting algorithm adapts from iteration-to-iteration.
- loss_reduction
A number for the reduction in the loss function required to split further.
- sample_size
A number for the number (or proportion) of data that is exposed to the fitting routine.
- penalty
L1 regularization parameter.
- object
A rule_fit model specification.
- parameters
A 1-row tibble or named list with main parameters to update. If the individual arguments are used, these will supersede the values in parameters. Also, using engine arguments in this object will result in an error.
- fresh
A logical for whether the arguments should be modified in-place or replaced wholesale.
- ...
Not used for
update()
.
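To illustrate the parameters argument (a sketch; the argument values below are arbitrary), a named list can update several main arguments at once:

```r
library(parsnip)
library(rules)

spec <- rule_fit(trees = 10, min_n = 2)

# Supply several main arguments at once via a named list;
# a 1-row tibble would work the same way.
spec2 <- update(spec, parameters = list(trees = 20, min_n = 4))
spec2
```

Per the documentation above, an individually supplied argument (e.g. min_n = 5) would supersede the corresponding value in parameters, and engine arguments in parameters raise an error.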
Details
The RuleFit model creates a regression model of rules in two stages. The first stage uses a tree-based model to generate a set of rules that can be filtered, modified, and simplified. These rules are then added as predictors to a regularized generalized linear model that can also conduct feature selection during model training.
For the xrf
engine, the xgboost
package is used to create the rule set
that is then added to a glmnet
model.
The only available engine is "xrf".
Value
An updated parsnip
model specification.
References
Friedman, J. H., and Popescu, B. E. (2008). "Predictive learning via rule ensembles." The Annals of Applied Statistics, 2(3), 916-954.
Examples
# NOT RUN {
rule_fit()
# Main arguments can be set at specification time:
rule_fit(trees = 7)
# ------------------------------------------------------------------------------
set.seed(6907)
rule_fit_rules <-
rule_fit(trees = 3, penalty = 0.1) %>%
set_mode("classification") %>%
fit(Species ~ ., data = iris)
# ------------------------------------------------------------------------------
model <- rule_fit(trees = 10, min_n = 2)
model
update(model, trees = 1)
update(model, trees = 1, fresh = TRUE)
# }