rf_fit: Tune hyperparameters used in a random forest model

Description

`rf_fit()` tunes the hyperparameters of a random forest model by grid search, scoring each candidate combination by its out-of-bag error.

Usage

rf_fit(
  df,
  colname_label,
  vctr_colname_feature = NULL,
  vctr_min_nodesize = c(5),
  vctr_m_try = NULL,
  vctr_subsample = c(0.1),
  frac_train = 0.75,
  n_tree = 500,
  ran_seed = 12345,
  label_err = -9999
)
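
A minimal sketch of a call, using only the arguments listed in the signature above. The toy data frame and its column names (`flux`, `temp`, `rad`, `vpd`) are made up for illustration, and the call is guarded so that it runs only when `rf_fit()` is available in the session (that is, when the package providing it has been loaded).

```r
# Toy input: one label column and three feature columns (all names are hypothetical).
set.seed(1)
n <- 200
df_toy <- data.frame(
  flux = rnorm(n),
  temp = rnorm(n),
  rad  = rnorm(n),
  vpd  = rnorm(n)
)

# Mark two label values as missing, using the default label_err code.
df_toy$flux[c(10, 50)] <- -9999

# Run the grid search only if rf_fit() exists in the current session.
if (exists("rf_fit")) {
  res <- rf_fit(
    df = df_toy,
    colname_label = "flux",
    vctr_min_nodesize = c(5, 10),
    vctr_m_try = c(2, 3),
    vctr_subsample = c(0.1, 0.5)
  )
  print(res)
}
```

Arguments left out of the call (`frac_train`, `n_tree`, `ran_seed`, `label_err`) keep the defaults shown in the signature.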

Value

A data frame with the following columns:

* The first column, `min_nodesize`, gives the candidate value from `vctr_min_nodesize` used in each model fitted during the out-of-bag evaluation.

* The second column, `m_try`, gives the candidate value from `vctr_m_try` used in each model fitted during the out-of-bag evaluation.

* The third column, `subsample`, gives the candidate value from `vctr_subsample` used in each model fitted during the out-of-bag evaluation.

* The fourth column, `MSE_OOB`, gives the mean squared error between the predicted and observed values in the out-of-bag data for each model fitted during the evaluation.
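
One way to use such a result is to pick the row with the smallest `MSE_OOB`. The sketch below builds a hypothetical data frame with the four columns described above; the numbers in it are invented purely for illustration and are not output from `rf_fit()`.

```r
# Hypothetical result with the documented column layout; MSE values are invented.
res <- data.frame(
  min_nodesize = c(5, 5, 10, 10),
  m_try        = c(2, 3, 2, 3),
  subsample    = c(0.1, 0.1, 0.1, 0.1),
  MSE_OOB      = c(0.42, 0.37, 0.45, 0.40)
)

# Keep the hyperparameter combination with the lowest out-of-bag MSE.
best <- res[which.min(res$MSE_OOB), ]
best
```

Note that `which.min()` returns only the first minimum, so ties are resolved in favor of the earlier row.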

Arguments

df: A data frame containing the label (explained variable) and feature (explanatory variable) time series used as model input. Columns may contain missing values.
colname_label: A character string giving the name of the column that holds the label time series.
vctr_colname_feature: A character vector giving the names of the feature time series columns used in constructing the random forest model. If `NULL` (default), all columns of the input data frame other than the label column specified by `colname_label` are used as feature columns.
vctr_min_nodesize: A vector of positive integers giving candidate values for the minimal node size hyperparameter of the random forest model (the minimum number of data points in each leaf node). Default is `c(5)`.
vctr_m_try: A vector of positive integers giving candidate values for the hyperparameter that sets the number of features considered when splitting each node. If `NULL` (default), every integer from 2 to the total number of feature variables is tested.
vctr_subsample: A vector of numeric values between 0 and 1 giving candidate values for the hyperparameter that sets the fraction of training data points sampled when constructing the random forest. Default is `c(0.1)`.
frac_train: A numeric value between 0 and 1 defining the fraction of data points assigned to the training data for random forest model construction. The remaining data points are treated as test data. Default is 0.75.
n_tree: An integer representing the number of trees in the random forest. Default is 500.
ran_seed: An integer representing the random seed for the random forest model construction. Default is 12345.
label_err: A numeric value used to represent missing values in the input data. Default is -9999.
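
The candidate vectors above define the search grid. As a rough sketch of how such a grid can be enumerated in base R, assuming the search covers every combination of the three candidate vectors (the internals of `rf_fit()` are not shown here, and `feature_names` is a hypothetical stand-in for the feature columns):

```r
# Hypothetical feature columns and candidate vectors.
feature_names     <- c("temp", "rad", "vpd")
vctr_min_nodesize <- c(5, 10)
vctr_m_try        <- NULL
vctr_subsample    <- c(0.1, 0.5)

# Mirror the documented default: integers from 2 to the number of features.
if (is.null(vctr_m_try)) {
  vctr_m_try <- seq(2, length(feature_names))
}

# One row per hyperparameter combination (assumed full Cartesian product).
grid <- expand.grid(
  min_nodesize = vctr_min_nodesize,
  m_try        = vctr_m_try,
  subsample    = vctr_subsample
)
nrow(grid)  # 2 * 2 * 2 = 8 combinations
```

Because the grid size is the product of the three vector lengths, adding candidates to any one vector multiplies the number of models to fit.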

Author

Yoshiaki Hata