rand_forest is a way to generate a specification of a model before fitting. It allows the model to be created using different packages in R or via Spark. The main arguments for the model are:
mtry
: The number of predictors that will be
randomly sampled at each split when creating the tree models.
trees
: The number of trees contained in the ensemble.
min_n
: The minimum number of data points in a node
that are required for the node to be split further.
These arguments are converted to their engine-specific names at the time that the model is fit. Other options and arguments can be set using the engine_args argument. If the main arguments are left at their defaults here (NULL), the values are taken from the underlying model functions.
rand_forest(mode = "unknown", mtry = NULL, trees = NULL, min_n = NULL,
  engine_args = list(), ...)
mode
: A single character string for the type of model. Possible values for this model are "unknown", "regression", or "classification".
mtry
: An integer for the number of predictors that will be randomly sampled at each split when creating the tree models.
trees
: An integer for the number of trees contained in the ensemble.
min_n
: An integer for the minimum number of data points in a node that are required for the node to be split further.
engine_args
: A named list of arguments to be used by the underlying models (e.g., ranger::ranger, randomForest::randomForest, etc.). These are not evaluated until the model is fit and will be substituted into the model fit expression.
...
: Used for method consistency. Any arguments passed to the ellipsis will result in an error. Use engine_args instead.
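As a minimal sketch of how engine_args might be used with the signature shown above, the following specification forwards one engine-specific option through the named list. It assumes the "ranger" engine will be chosen at fit time: importance is an argument of ranger::ranger, not of rand_forest itself, and the particular values are illustrative only.

```r
library(parsnip)

# Main arguments use the unified names (trees, min_n); anything specific
# to the underlying function goes in engine_args.
# Assumption: the model is later fit with the "ranger" engine, because
# `importance` is an argument of ranger::ranger().
rf_spec <- rand_forest(
  mode = "classification",
  trees = 500,
  min_n = 5,
  engine_args = list(importance = "impurity")
)

# Printing the specification shows the stored arguments; nothing is
# evaluated or fit at this point.
rf_spec
```

Because the list is only substituted into the fit expression later, an option that the chosen engine does not accept would be expected to surface as an error at fit time rather than when the specification is created.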
The data given to the function are not saved and are only used to determine the mode of the model. For rand_forest, the possible modes are "regression" and "classification".
The model can be created using the fit() function with the following engines:
R: "ranger" or "randomForest"
Spark: "spark"
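The engine choice can be illustrated with a short, hedged sketch. The exact signature of fit() is not given in this section, so the call below assumes it accepts a model specification, a formula, a data argument, and an engine string. It also assumes the ranger package is installed, and uses the built-in iris data purely for illustration.

```r
library(parsnip)

# The specification itself is engine-agnostic.
rf_spec <- rand_forest(mode = "classification", trees = 200)

# Assumption: fit() takes a formula, a data frame, and an `engine` string.
# The "ranger" engine requires the ranger package to be installed.
rf_fit <- fit(rf_spec, Species ~ ., data = iris, engine = "ranger")

rf_fit
```

Under the same assumption, swapping engine = "ranger" for engine = "randomForest" would reuse the identical specification, with trees and the other main arguments translated to that package's argument names at fit time.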
# NOT RUN {
rand_forest(mode = "classification", trees = 2000)
# Parameters can be represented by a placeholder:
rand_forest(mode = "regression", mtry = varying())
# }