
Out-of-sample time series forecasting

Out-of-sample time series forecasting is a common, important, and subtle task. The OOS package introduces a comprehensive and cohesive API for the out-of-sample forecasting workflow: data preparation; forecasting, including both traditional econometric time series models and modern machine learning techniques; forecast combination; model and error analysis; and forecast visualization.

See the OOS package website for examples and documentation.


Workflow and available tools

1. Prepare Data

Clean Outliers: Winsorize, Trim
Impute Missing Observations (via imputeTS): Linear Interpolation, Kalman Filter, Fill-Forward, Average, Moving Average, Seasonal Decomposition
Dimension Reduction: Principal Components
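The outlier-cleaning and linear-interpolation steps above can be sketched in base R. This is a minimal illustration, not the package's code: `winsorize_sketch()` and `impute_linear_sketch()` are hypothetical helpers, the 0.05/0.95 quantile bounds mirror the `outlier.bounds` value used in the basic workflow, and interpolation uses `stats::approx()` rather than imputeTS.

```r
# Hypothetical helpers for illustration only; not OOS::winsorize or OOS::data_impute.
# Winsorize: clamp values outside the empirical quantile bounds to those bounds.
# Trim: replace those values with NA instead.
winsorize_sketch <- function(x, bounds = c(0.05, 0.95), trim = FALSE) {
  q <- stats::quantile(x, probs = bounds, na.rm = TRUE, names = FALSE)
  if (trim) {
    x[!is.na(x) & (x < q[1] | x > q[2])] <- NA
  } else {
    x <- pmin(pmax(x, q[1]), q[2])  # existing NAs stay NA
  }
  x
}

# Linear interpolation of missing values with base R's approx();
# rule = 2 carries the nearest observed value out to the ends of the series.
impute_linear_sketch <- function(x) {
  idx <- seq_along(x)
  ok  <- !is.na(x)
  stats::approx(idx[ok], x[ok], xout = idx, rule = 2)$y
}

x <- c(1, 2, 3, 4, 100)
winsorize_sketch(x)               # extreme values pulled in to the quantile bounds
winsorize_sketch(x, trim = TRUE)  # extreme values set to NA
impute_linear_sketch(c(1, NA, 3, NA, 5))
```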

2. Forecast

Univariate Forecasts (via forecast): Random Walk, ARIMA, ETS, Spline, Theta Method, TBATS, STL, AR Perceptron
Multivariate Forecasts (via caret): Vector Autoregression, Linear Regression, LASSO Regression, Ridge Regression, Elastic Net, Principal Component Regression, Partial Least Squares Regression, Random Forest, Tree-Based Gradient Boosting Machine, Single Layered Neural Network
Forecast Combinations: Mean, Median, Trimmed (Winsorized) Mean, N-Best, Linear Regression, LASSO Regression, Ridge Regression, Partial Egalitarian LASSO, Principal Component Regression, Partial Least Squares Regression, Random Forest, Tree-Based Gradient Boosting Machine, Single Layered Neural Network
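The simplest combination schemes above can be made concrete with a small base-R sketch using made-up forecasts. It is not OOS::forecast_combine: `n_best_sketch()` is hypothetical, and ranking models by their past MSE is an assumption about how N-best selection works. Base R's `mean(trim = )` discards the extremes rather than winsorizing them, so it only approximates the trimmed (winsorized) mean listed above.

```r
# Toy forecasts (made-up numbers): rows are forecast dates, columns are models.
fc <- cbind(naive = c(4.0, 4.1, 4.3),
            arima = c(3.9, 4.0, 4.2),
            ets   = c(4.2, 4.4, 4.6),
            ols   = c(3.0, 3.1, 3.2),
            lasso = c(3.7, 4.2, 4.0))
observed <- c(3.9, 4.0)  # realized values for the first two dates only

current <- fc[3, ]  # model forecasts to combine for the third date

combo_mean    <- mean(current)              # equal weights
combo_median  <- median(current)
combo_trimmed <- mean(current, trim = 0.2)  # drops the lowest and highest of 5

# Hypothetical N-best: average the n models with the lowest MSE on past dates
# only, so the combination never uses the outcome of the date being forecast.
n_best_sketch <- function(past_fc, past_obs, current_fc, n = 2) {
  mse  <- colMeans((past_fc - past_obs)^2)
  best <- names(sort(mse))[seq_len(n)]
  mean(current_fc[best])
}
combo_nbest <- n_best_sketch(fc[1:2, , drop = FALSE], observed, current, n = 2)
```

Here the two best past performers are `arima` and `naive`, so the N-best combination averages only their forecasts for the third date.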

3. Analyze

Accuracy: Mean Square Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE)
Compare: Forecast Error Ratios, Diebold-Mariano Test (for non-nested models), Clark and West Test (for nested models)
Visualize: Forecasts, Errors
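The accuracy measures and the error ratio above follow standard definitions, sketched here in base R with the forecast error taken as observed minus forecast. `accuracy_sketch()` is a hypothetical helper, not OOS::loss_function or OOS::forecast_accuracy, and the three data points are made up.

```r
# Hypothetical helper; standard definitions with error e = observed - forecast.
accuracy_sketch <- function(forecast, observed) {
  e <- observed - forecast
  c(MSE  = mean(e^2),
    RMSE = sqrt(mean(e^2)),
    MAE  = mean(abs(e)),
    MAPE = 100 * mean(abs(e / observed)))  # undefined if any observed value is 0
}

observed <- c(4.0, 5.0, 2.0)
model    <- c(3.0, 5.0, 3.0)
baseline <- c(4.0, 3.0, 2.0)

acc_model    <- accuracy_sketch(model, observed)
acc_baseline <- accuracy_sketch(baseline, observed)

# Forecast error ratio: a value below 1 means the model beats the baseline
# under the chosen loss.
error_ratio <- acc_model["MSE"] / acc_baseline["MSE"]
```

In this toy case the model halves the baseline's MSE (ratio 0.5) yet has the higher MAPE (25 versus about 13.3), which is why the choice of loss matters when ranking forecasts.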

Model estimation flexibility and accessibility

Users may edit any model training routine by modifying a list of function arguments. For machine learning techniques, this means editing caret arguments, including the tuning grid, control grid, method, and accuracy metric. For univariate time series forecasting, it means passing arguments to forecast package model functions. For imputing missing observations, it means passing arguments to imputeTS package functions.
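The argument-list mechanism described above can be illustrated with base R alone. In this sketch `arima_args` is a hypothetical stand-in for one entry of a control panel, and `stats::arima()` substitutes for the forecast package's `Arima()`; the assumption is that OOS splices such a list into the model call in a similar way, which this page does not confirm.

```r
# Simulate an ARMA(1,1) series to fit (illustrative data only).
set.seed(1)
y <- stats::arima.sim(model = list(ar = 0.5, ma = 0.3), n = 200)

# Hypothetical argument list, analogous to a control-panel entry.
arima_args <- list()            # start with no extra arguments
arima_args$order <- c(1, 0, 1)  # request an ARMA(1,1)

# Splice the user-edited arguments into the model call.
fit <- do.call(stats::arima, c(list(x = y), arima_args))
fc  <- predict(fit, n.ahead = 1)$pred  # one-step-ahead forecast
```

Keeping arguments in a list means a user can change the model specification without touching the code that calls the model.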

A brief example using an ARIMA model to forecast a univariate time series:

library(OOS)

# 1. create the central list of univariate model training arguments
forecast_univariate.control_panel = instantiate.forecast_univariate.control_panel()

# 2. select an item to edit, for example the Arima order to create an ARMA(1,1)
# view default model arguments (there are none)
forecast_univariate.control_panel$arguments[['Arima']]
# add our own function arguments
forecast_univariate.control_panel$arguments[['Arima']]$order = c(1,0,1)

A brief example using a random forest to combine forecasts:

library(OOS)

# 1. create the central list of ML training arguments
forecast_combinations.control_panel = instantiate.forecast_combinations.control_panel()

# 2. select an item to edit, for example the random forest tuning grid
# view default tuning grid
forecast_combinations.control_panel$tuning.grids[['RF']]
# edit tuning grid
forecast_combinations.control_panel$tuning.grids[['RF']] = expand.grid(mtry = c(1:6))

Basic workflow

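#----------------------------------------
### Forecasting Example
#----------------------------------------
# load OOS (which also exports %>%); quantmod, zoo, lubridate, and dplyr
# must be installed because they are called with the :: prefix
library(OOS)

# pull and prepare data from FRED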
quantmod::getSymbols.FRED(
	c('UNRATE','INDPRO','GS10'), 
	env = globalenv())
Data = cbind(UNRATE, INDPRO, GS10)
Data = data.frame(Data, date = zoo::index(Data)) %>%
	dplyr::filter(lubridate::year(date) >= 1990)

# run univariate forecasts 
forecast.uni = 
	forecast_univariate(
		Data = dplyr::select(Data, date, UNRATE),
		forecast.dates = tail(Data$date,15), 
		method = c('naive','auto.arima', 'ets'),      
		horizon = 1,                         
		recursive = FALSE,

		# information set       
		rolling.window = NA,    
		freq = 'month',                   
		
		# outlier cleaning
		outlier.clean = FALSE,
		outlier.variables = NULL,
		outlier.bounds = c(0.05, 0.95),
		outlier.trim = FALSE,
		outlier.cross_section = FALSE,
		
		# impute missing
		impute.missing = FALSE,
		impute.method = 'kalman',
		impute.variables = NULL,
		impute.verbose = FALSE) 

# create multivariate forecasts
forecast.multi = 
	forecast_multivariate(
		Data = Data,           
		forecast.date = tail(Data$date,15),
		target = 'UNRATE',
		horizon = 1,
		method = c('ols','lasso','ridge','elastic','GBM'),

		# information set       
		rolling.window = NA,    
		freq = 'month',                   
		
		# outlier cleaning
		outlier.clean = FALSE,
		outlier.variables = NULL,
		outlier.bounds = c(0.05, 0.95),
		outlier.trim = FALSE,
		outlier.cross_section = FALSE,
		
		# impute missing
		impute.missing = FALSE,
		impute.method = 'kalman',
		impute.variables = NULL,
		impute.verbose = FALSE,
		
		# dimension reduction
		reduce.data = FALSE,
		reduce.variables = NULL,
		reduce.ncomp = NULL,
		reduce.standardize = TRUE) 

# combine forecasts and add in observed values
forecasts = 
	dplyr::bind_rows(
		forecast.uni,
		forecast.multi) %>%
	dplyr::left_join(
		dplyr::select(Data, date, observed = UNRATE),
		by = 'date')

# forecast combinations 
forecast.combo = 
	forecast_combine(
		forecasts, 
		method = c('uniform','median','trimmed.mean',
				   'n.best','lasso','peLasso','RF'), 
		burn.in = 5, 
		n.max = 2)

# merge forecast combinations back into forecasts
forecasts = 
	forecasts %>%
	dplyr::bind_rows(forecast.combo)

# calculate forecast errors
forecast.error = forecast_accuracy(forecasts)

# view forecast errors from least to greatest 
#   (best forecast to worst forecast method)
forecast.error %>%
	dplyr::mutate_at(dplyr::vars(-model), round, 3) %>%
	dplyr::arrange(MSE)

# compare forecasts to the baseline (a random walk)
forecast_comparison(
	forecasts,
	baseline.forecast = 'naive',  
	test = 'ER',
	loss = 'MSE') %>%
	dplyr::arrange(error.ratio)

# chart forecasts
chart = 
	chart_forecast(
		forecasts,              
		Title = 'US Unemployment Rate',
		Ylab = 'Percent',
		Freq = 'Monthly')

chart
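The workflow above compares forecasts with error ratios (`test = 'ER'`). The Diebold-Mariano test listed under Analyze instead asks whether the difference in loss is statistically significant. Below is a minimal sketch for one-step-ahead forecasts under squared-error loss; `dm_test_sketch()` is hypothetical, omits the autocorrelation (HAC) correction needed for horizons above 1 and any small-sample adjustment, and is meant for non-nested models only (the Clark and West test covers nested ones).

```r
# Hypothetical helper, not the code inside OOS::forecast_comparison.
# Diebold-Mariano test for one-step-ahead forecasts under squared-error loss.
dm_test_sketch <- function(e1, e2) {
  stopifnot(length(e1) == length(e2))
  d    <- e1^2 - e2^2                        # loss differential per period
  n    <- length(d)
  stat <- mean(d) / sqrt(stats::var(d) / n)  # h = 1, so no HAC correction
  p    <- 2 * stats::pnorm(-abs(stat))       # two-sided, asymptotic N(0, 1)
  c(statistic = stat, p.value = p)
}

# Usage with simulated errors; a negative statistic favors the first forecast.
set.seed(42)
e_model <- stats::rnorm(60, sd = 1.0)
e_naive <- stats::rnorm(60, sd = 1.2)
dm_test_sketch(e_model, e_naive)
```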

Contact

If you have questions or concerns, or wish to collaborate, please contact Tyler J. Pike.

Install

install.packages('OOS')

Version

1.0.0

License

GPL-3

Maintainer

Tyler J. Pike

Last Published

March 17th, 2021

Functions in OOS (1.0.0)

NBest: Select N-best forecasts
chart_forecast_error: Chart forecast errors
data_outliers: Clean outliers
data_reduction: Dimension reduction via principal components
data_subset: Create information set
data_impute: Impute missing values
chart_forecast: Chart forecasts
forecast_accuracy: Calculate forecast accuracy
forecast_combine: Forecast with forecast combinations
forecast_comparison: Compare forecast accuracy
winsorize: Winsorize or trim variables
instantiate.forecast_univariate.control_panel: Create interface to control forecast_univariate model estimation
instantiate.forecast_multivariate.var.control_panel: Create interface to control forecast_multivariate VAR estimation
instantiate.data_impute.control_panel: Create interface to control data_impute model estimation
forecast_univariate: Forecast with univariate models
instantiate.forecast_multivariate.ml.control_panel: Create interface to control forecast_multivariate ML estimation
instantiate.forecast_combinations.control_panel: Create interface to control forecast_combine model estimation
forecast_date: Set forecasted date
forecast_multivariate: Forecast with multivariate models
%>%: Pipe operator
standardize: Standardize variables (mean 0, variance 1)
loss_function: Calculate error via loss functions
n.lag: Create n lags
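The lag and standardization utilities listed above can be approximated in base R. `lag_sketch()`, `lags_sketch()`, and `standardize_sketch()` are hypothetical stand-ins that follow only the one-line descriptions; the actual signatures and output formats of `n.lag` and `standardize` are not shown on this page.

```r
# Hypothetical base-R stand-ins for n.lag and standardize.
lag_sketch <- function(x, n) c(rep(NA, n), head(x, -n))  # shift x back n periods

# One column per lag, 1 through n, padded with NA at the start.
lags_sketch <- function(x, n) {
  out <- sapply(seq_len(n), function(k) lag_sketch(x, k))
  colnames(out) <- paste0("lag", seq_len(n))
  out
}

# Rescale to mean 0 and standard deviation 1.
standardize_sketch <- function(x) {
  (x - mean(x, na.rm = TRUE)) / stats::sd(x, na.rm = TRUE)
}

x <- c(10, 20, 30, 40)
lags_sketch(x, 2)
standardize_sketch(x)
```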