modeltime.ensemble (version 0.1.0)

ensemble_model_spec: Creates a Stacked Ensemble Model from a Model Spec

Description

A 3-stage stacking regressor that proceeds as follows:

  1. Stage 1: Sub-Models are Trained & Predicted using resamples

  2. Stage 2: A Meta-learner (model_spec) is trained on Out-of-Sample Sub-Model Predictions

  3. Stage 3: The Best Meta-Learner Model is Selected (if tuning is used)

Usage

ensemble_model_spec(
  object,
  resamples,
  model_spec,
  kfolds = 5,
  param_info = NULL,
  grid = 6,
  control = control_grid()
)

Arguments

object

A Modeltime Table. Used for ensemble sub-models.

resamples

An rset resample object. Used to generate sub-model predictions for the meta-learner. See timetk::time_series_cv() or rsample::vfold_cv() for making resamples.

model_spec

A model_spec object defining the meta-learner stacking model specification to be used.

Can be either:

  1. A non-tunable model_spec: Parameters are specified and are not optimized via tuning.

  2. A tunable model_spec: Contains parameters identified for tuning with tune::tune()
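For illustration, the two cases might look like this (a sketch assuming the parsnip and tune packages; the glmnet engine and parameter values are just examples):

```r
library(parsnip)
library(tune)

# 1. Non-tunable: penalty and mixture are fixed, so no tuning occurs
spec_fixed <- linear_reg(penalty = 0.01, mixture = 0.5) %>%
    set_engine("glmnet")

# 2. Tunable: parameters marked with tune() are optimized via K-Fold CV
spec_tunable <- linear_reg(penalty = tune(), mixture = tune()) %>%
    set_engine("glmnet")
```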

kfolds

K-Fold Cross Validation for tuning the Meta-Learner. Controls the number of folds used in the meta-learner's cross-validation. Gets passed to rsample::vfold_cv().

param_info

A dials::parameters() object or NULL. If none is given, a parameters set is derived from other arguments. Passing this argument can be useful when parameter ranges need to be customized.

grid

Grid specification or grid size for tuning the Meta Learner. Gets passed to tune::tune_grid().

control

An object used to modify the tuning process. Uses tune::control_grid() by default. Use control_grid(verbose = TRUE) to follow the training process.

Details

Important Details:

Results will vary considerably if poor sub-model candidates are used, a poor sub-model resampling strategy is selected, a poor meta-learner is selected, or if the meta-learner is not tuned.

  • Use object (a Modeltime Table) to define your sub-models

  • Use resamples to define the sub-model resampling procedure. Results will vary considerably if a poor resampling strategy is selected.

  • Use model_spec to define the meta-learner. Use tune::tune() to define meta-learner parameters for tuning.

Ensemble Process

The Meta-Learner Ensembling Process uses the following basic steps:

  1. Make cross-validation predictions for each sub-model. The user provides the sub-models as a Modeltime Table (object) and the cross-validation set as resamples (using a function like timetk::time_series_cv() or rsample::vfold_cv()). Each model in the Modeltime Table is trained & predicted on the resamples. The out-of-sample sub-model predictions are used as the input to the meta-learner.

  2. Train a Stacked Regressor (Meta-Learner). The sub-model out-of-sample cross validation predictions are then modeled using a model_spec with options:

    • Tuning: If the model_spec does include tuning parameters via tune::tune(), then the meta-learner will be hyperparameter-tuned using K-Fold Cross Validation. The parameters and grid can be adjusted using kfolds, grid, and param_info.

    • No-Tuning: If the model_spec does not include tuning parameters via tune::tune(), then the meta-learner will not be hyperparameter-tuned; the model is simply fitted to the sub-model predictions.

  3. Final Model Selection

    • If tuned, the final model is selected based on RMSE, then retrained on the full set of out-of-sample predictions.

    • If not-tuned, the fitted model from Stage 2 is used.
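The steps above can be sketched in base R (an illustrative toy example, not modeltime internals; the two "sub-models" and the meta-learner here are plain lm() fits):

```r
set.seed(123)
n <- 100
x <- seq_len(n)
y <- 0.5 * x + rnorm(n, sd = 5)
df <- data.frame(x, y)

# Stage 1: out-of-sample sub-model predictions via simple 5-fold CV
folds <- sample(rep(1:5, length.out = n))
pred1 <- pred2 <- numeric(n)
for (k in 1:5) {
  train <- folds != k
  fit1  <- lm(y ~ x,          data = df[train, ])   # sub-model 1
  fit2  <- lm(y ~ poly(x, 2), data = df[train, ])   # sub-model 2
  pred1[!train] <- predict(fit1, newdata = df[!train, ])
  pred2[!train] <- predict(fit2, newdata = df[!train, ])
}

# Stage 2: meta-learner trained on the out-of-sample sub-model predictions
meta <- lm(y ~ pred1 + pred2)

# Stage 3: the fitted meta-learner combines the sub-models' predictions
head(fitted(meta))
```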

Progress

The best way to follow the training process and watch progress is to use control = control_grid(verbose = TRUE).

Parallelize

Portions of the process can be parallelized. To parallelize, set up a parallel backend supported by tune, such as doFuture. Then set control = control_grid(allow_par = TRUE).
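For example, a future-based backend can be registered like so (a sketch assuming the doFuture package is installed; the worker count is arbitrary):

```r
library(doFuture)   # also loads future and foreach

# Register the future framework as the foreach-based parallel backend
registerDoFuture()
plan(multisession, workers = 4)

# Then enable parallel processing during tuning:
# control = control_grid(allow_par = TRUE)
```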

Examples

# NOT RUN {
library(tidymodels)
library(modeltime)
library(modeltime.ensemble)
library(tidyverse)
library(timetk)

# }
# NOT RUN {
resamples_tscv <- training(m750_splits) %>%
    time_series_cv(
        assess  = "2 years",
        initial = "5 years",
        skip    = "2 years",
        slice_limit = 1
    )

# No Metalearner Tuning ----
ensemble_fit_lm <- m750_models %>%
    ensemble_model_spec(
        resamples  = resamples_tscv,
        model_spec = linear_reg() %>% set_engine("lm"),
        control    = control_grid(verbose = TRUE)
    )

ensemble_fit_lm

# With Metalearner Tuning ----
ensemble_fit_glmnet <- m750_models %>%
    ensemble_model_spec(
        resamples  = resamples_tscv,
        model_spec = linear_reg(
                penalty = tune(),
                mixture = tune()
            ) %>%
            set_engine("glmnet"),
        grid       = 2,
        control    = control_grid(verbose = TRUE)
    )

ensemble_fit_glmnet

# }
