parsnip (version 0.0.0.9001)

rand_forest: General Interface for Random Forest Models

Description

rand_forest() generates a specification of a model before fitting, so that the model can be created using different packages in R or via Spark. The main arguments for the model are:

  • mtry: The number of predictors that will be randomly sampled at each split when creating the tree models.

  • trees: The number of trees contained in the ensemble.

  • min_n: The minimum number of data points in a node that are required for the node to be split further.

These arguments are converted to their engine-specific names at the time that the model is fit. Other options and arguments can be set using the others argument. If left to their defaults here (NULL), the values are taken from the underlying model functions.
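As an illustration of the name conversion, the sketch below maps the three main arguments to the argument names used by ranger::ranger and randomForest::randomForest. The lookup table arg_names and the helper translate_args() are hypothetical and written only for this example; they are not parsnip's internal code, which may handle the conversion differently.

```r
# Hypothetical lookup table (not part of parsnip): the main arguments and the
# names that each engine function uses for the same setting.
arg_names <- list(
  ranger       = c(mtry = "mtry", trees = "num.trees", min_n = "min.node.size"),
  randomForest = c(mtry = "mtry", trees = "ntree",     min_n = "nodesize")
)

# Hypothetical helper: keep only the arguments that were set (non-NULL) and
# rename them for the chosen engine. Arguments left as NULL are dropped, so
# the engine function would fall back to its own defaults.
translate_args <- function(args, engine) {
  set <- Filter(Negate(is.null), args)
  names(set) <- unname(arg_names[[engine]][names(set)])
  set
}

# min_n is NULL here, so only mtry and trees are passed on.
translate_args(list(mtry = 3, trees = 500, min_n = NULL), engine = "ranger")
```

Calling the same helper with engine = "randomForest" would rename trees to ntree instead, which is the kind of difference the specification is meant to hide.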

If parameters need to be modified, update() can be used in lieu of recreating the object from scratch.

Usage

rand_forest(mode = "unknown", mtry = NULL, trees = NULL, min_n = NULL,
  others = list(), ...)

# S3 method for rand_forest
update(object, mtry = NULL, trees = NULL, min_n = NULL,
  others = list(), fresh = FALSE, ...)

Arguments

mode

A single character string for the type of model. Possible values for this model are "unknown", "regression", or "classification".

mtry

An integer for the number of predictors that will be randomly sampled at each split when creating the tree models.

trees

An integer for the number of trees contained in the ensemble.

min_n

An integer for the minimum number of data points in a node that are required for the node to be split further.

others

A named list of arguments to be used by the underlying models (e.g., ranger::ranger, randomForest::randomForest, etc.).

...

Used for method consistency. Any arguments passed to the ellipsis will result in an error. Use others instead.

object

A random forest model specification.

fresh

A logical for whether the arguments should be modified in place or replaced wholesale.

Value

An updated model specification.

Details

The data given to the function are not saved and are only used to determine the mode of the model. For rand_forest, the possible modes are "regression" and "classification".

The model can be created with the fit() function using the following engines:

  • R: "ranger" or "randomForest"

  • Spark: "spark"

Main parameter arguments (and those in others) can avoid evaluation until the underlying function is executed by wrapping the argument in rlang::expr() (e.g. mtry = expr(floor(sqrt(p)))).
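To make the delayed evaluation concrete, the following sketch uses rlang directly, outside of any model specification. It only shows what expr() captures and how the captured expression can be evaluated later; the value p = 16 is an assumption chosen for illustration, and how parsnip itself supplies p at fit time is not shown here.

```r
library(rlang)

# Capture the expression without evaluating it. No object named `p` needs to
# exist at this point.
mtry_expr <- expr(floor(sqrt(p)))

# Later, once the number of predictors is known, evaluate the expression.
# `p = 16` is an illustrative value, not something defined on this page.
mtry_value <- eval_tidy(mtry_expr, data = list(p = 16))
mtry_value
```

Because the expression is stored rather than its result, the same specification can be reused with data sets that have different numbers of predictors.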

See Also

varying(), fit()

Examples

# NOT RUN {
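# Specify a classification forest with 2000 trees:
rand_forest(mode = "classification", trees = 2000)
# Parameters can be represented by a placeholder:
rand_forest(mode = "regression", mtry = varying())
# Create a specification and print it:
model <- rand_forest(mtry = 10, min_n = 3)
model
# Modify mtry in place, keeping the other arguments:
update(model, mtry = 1)
# Replace the arguments wholesale instead:
update(model, mtry = 1, fresh = TRUE)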
# }
