The modeltime_forecast() function prepares a forecast for visualization with plot_modeltime_forecast(). The forecast is controlled by new_data or h, which can be combined with existing data (controlled by actual_data).
Confidence intervals are included if the incoming Modeltime Table has been calibrated using modeltime_calibrate(). Otherwise confidence intervals are not estimated.
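As a rough illustration of the calibrate-then-forecast flow described above, the following sketch fits a simple linear trend and forecasts the hold-out period. The dataset (m4_monthly from timetk), the 80/20 split, and the linear model are assumptions chosen for brevity; any fitted parsnip model or workflow could take their place.

```r
library(dplyr)
library(parsnip)
library(rsample)
library(timetk)
library(modeltime)

# Example data: a single monthly series (m4_monthly ships with timetk)
m750 <- m4_monthly %>% filter(id == "M750")

# Hold out the most recent 20% of observations for calibration
splits <- initial_time_split(m750, prop = 0.8)

# Illustrative model: a linear trend on the date
model_fit_lm <- linear_reg() %>%
    set_engine("lm") %>%
    fit(value ~ as.numeric(date), data = training(splits))

# Calibrate on the hold-out set so confidence intervals can be estimated
calibration_tbl <- modeltime_table(model_fit_lm) %>%
    modeltime_calibrate(new_data = testing(splits))

# Forecast the hold-out period and attach the actual values for comparison
forecast_tbl <- calibration_tbl %>%
    modeltime_forecast(
        new_data    = testing(splits),
        actual_data = m750
    )

# The result can then be passed to the plotting function:
# forecast_tbl %>% plot_modeltime_forecast(.interactive = FALSE)
```

Because the table was calibrated first, the result should carry .conf_lo and .conf_hi columns alongside .index and .value.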
New Data
When forecasting you can specify future data using new_data.
This is a future tibble containing a date column that extends beyond the trained dates, plus columns for exogenous regressors (xregs) if used.
Forecasting Evaluation Data: If new_data is not provided, the .calibration_data is used by default. This is the equivalent of using rsample::testing() for getting test data sets.
Forecasting Future Data: See timetk::future_frame() for creating future tibbles.
Xregs: Can be used with this method.
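A minimal sketch of building a future tibble with timetk::future_frame() is shown here. The training tibble and its column names are hypothetical; with a real model, any xreg columns would need to be added to the future tibble before passing it as new_data.

```r
library(tibble)
library(timetk)

# Hypothetical training data: 24 monthly observations
train_tbl <- tibble(
    date  = seq(as.Date("2020-01-01"), by = "month", length.out = 24),
    value = seq_len(24)
)

# Extend the date column 12 periods past the last training date
future_tbl <- future_frame(train_tbl, .date_var = date, .length_out = 12)

# future_tbl contains only future dates; xreg columns (if the model
# uses them) would be joined or mutated onto it at this point.
```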
H (Horizon)
When forecasting, you can specify h. This is a phrase like "1 year", which extends the .calibration_data (1st priority) or the actual_data (2nd priority) into the future.
Forecasting Future Data: All forecasts using h are extended after the calibration data or actual_data.
Extending .calibration_data - Calibration data is given 1st priority, which is desirable after refitting with modeltime_refit(). Internally, a call is made to timetk::future_frame() to expedite creating new data using the date feature.
Extending actual_data - If h is provided, and the modeltime table has not been calibrated, the "actual_data" will be extended into the future. This is useful in situations where you want to go directly from modeltime_table() to modeltime_forecast() without calibrating or refitting.
Xregs: Cannot be used because future data must include new xregs. If xregs are desired, build a future data frame and use new_data.
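The uncalibrated path can be sketched as follows: a model is fitted on the full series and forecast directly from modeltime_table() using h, so the actual_data is what gets extended. The dataset and the linear trend model are assumptions for illustration, and the model deliberately uses no xregs.

```r
library(dplyr)
library(parsnip)
library(timetk)
library(modeltime)

# Example data: a single monthly series (m4_monthly ships with timetk)
m750 <- m4_monthly %>% filter(id == "M750")

# Illustrative model with no exogenous regressors, fitted on all data
model_fit_lm <- linear_reg() %>%
    set_engine("lm") %>%
    fit(value ~ as.numeric(date), data = m750)

# No calibration step: h extends actual_data one year into the future
forecast_tbl <- modeltime_table(model_fit_lm) %>%
    modeltime_forecast(
        h           = "1 year",
        actual_data = m750
    )
```

Since the table was never calibrated, no confidence intervals should be expected from this call.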
Actual Data
This is reference data that contains the true values of the time-stamp data. It helps in visualizing the performance of the forecast vs the actual data.
When h is used and the Modeltime Table has not been calibrated, then the actual data is extended into the future periods that are defined by h.
Confidence Interval Estimation
Confidence intervals (.conf_lo, .conf_hi) are estimated based on a normal approximation of the testing errors (out of sample) from modeltime_calibrate(). The out-of-sample error estimates are then carried through and applied to any future forecasts.
The confidence interval can be adjusted with the conf_interval parameter. The algorithm used to produce confidence intervals can be changed with the conf_method parameter.
Conformal Default Method:
When conf_method = "conformal_default" (default), this method uses qnorm() to produce a 95% confidence interval by default. It estimates a normal (Gaussian) distribution based on the out-of-sample errors (residuals).
The confidence interval is mean-adjusted, meaning that if the mean of the residuals is non-zero, the interval is widened to capture the difference in means.
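The idea of a mean-adjusted normal interval can be sketched in base R. This is an illustration only, not the package's internal code: mirroring the residuals around zero is one assumed way of making a non-zero residual mean widen the interval, and the residual and prediction values are made up.

```r
# Hypothetical out-of-sample residuals (actual - predicted) with a positive bias
residuals     <- c(1.2, 0.8, 2.1, 1.5, 0.3, 1.9, 1.1, 0.7)
predictions   <- c(100, 102, 104)
conf_interval <- 0.95

# Plain standard deviation ignores the bias in the residuals
s_plain <- sd(residuals)

# Assumed adjustment: mirror the residuals around zero so that a
# non-zero mean inflates the estimated spread
s_adjusted <- sd(c(residuals, -residuals))

# Two-sided normal quantile for the requested coverage
z <- qnorm((1 + conf_interval) / 2)

# Symmetric interval around each point forecast
conf_lo <- predictions - z * s_adjusted
conf_hi <- predictions + z * s_adjusted
```

With biased residuals like these, s_adjusted exceeds s_plain, so the interval is wider than one built from the plain standard deviation.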
Conformal Split Method:
When conf_method = "conformal_split", this method uses the split conformal inference method described by Lei et al. (2018). This is also implemented in the probably R package's int_conformal_split() function.
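For intuition, the core of split conformal inference can be sketched in base R: the interval half-width is an adjusted empirical quantile of the absolute calibration residuals. This is a textbook-style sketch with simulated residuals, and it is not claimed to match the exact code in modeltime or probably.

```r
set.seed(42)

# Simulated out-of-sample residuals standing in for calibration errors
residuals     <- rnorm(100)
predictions   <- c(100, 102, 104)
conf_interval <- 0.95

# Conformity scores: absolute residuals, sorted ascending
n      <- length(residuals)
scores <- sort(abs(residuals))

# Finite-sample adjusted rank: ceiling((n + 1) * coverage)
k <- ceiling((n + 1) * conf_interval)

# If the rank exceeds n, the calibration set is too small for this coverage
q <- if (k > n) Inf else scores[k]

# The same half-width is applied to every point forecast
conf_lo <- predictions - q
conf_hi <- predictions + q
```

Unlike the normal-based approach, this makes no distributional assumption about the residuals; it relies only on their empirical ranks.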
What happens to the confidence interval after refitting models?
Refitting has no effect on the confidence interval since this is calculated independently of
the refitted model. New observations typically improve
future accuracy, which in most cases makes the out-of-sample confidence intervals conservative.
Keep Data
Include the new data (and actual data) as extra columns with the results of the model forecasts.
This can be helpful when the new data includes information useful to the forecasts.
An example is when forecasting Panel Data and the new data contains
ID features related to the time series group that the forecast belongs to.
Arrange Index
By default, modeltime_forecast() keeps the original order of the data.
If desired, the user can sort the output by .key, .model_id and .index.
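The keep_data and arrange_index options can be combined in a single call, as in the sketch below. The dataset, split, and linear model are assumptions for illustration; the point is only to show where the two arguments go.

```r
library(dplyr)
library(parsnip)
library(rsample)
library(timetk)
library(modeltime)

# Example data: a single monthly series (m4_monthly ships with timetk)
m750   <- m4_monthly %>% filter(id == "M750")
splits <- initial_time_split(m750, prop = 0.8)

# Illustrative model: a linear trend on the date
model_fit_lm <- linear_reg() %>%
    set_engine("lm") %>%
    fit(value ~ as.numeric(date), data = training(splits))

calibration_tbl <- modeltime_table(model_fit_lm) %>%
    modeltime_calibrate(new_data = testing(splits))

# keep_data = TRUE carries the input columns (here id, date, value) along;
# arrange_index = TRUE sorts the output by .key, .model_id and .index
forecast_tbl <- calibration_tbl %>%
    modeltime_forecast(
        new_data      = testing(splits),
        actual_data   = m750,
        keep_data     = TRUE,
        arrange_index = TRUE
    )
```

Keeping the id column in the output is what makes it possible to group or facet forecasts by series when working with panel data.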