setarforest: Fitting SETAR-Forest models

Description

Fits a SETAR-Forest model either using a list of time series or an embedded input matrix and labels.
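
To make the second input form more concrete, the following base R sketch builds an embedded matrix and a label vector from a single series using embed() from the stats package. It only illustrates the general idea of lag embedding. The toy series, the column names Lag1 to Lag3 and the column order are assumptions, not necessarily what setarforest produces internally.

```r
# Toy series (hypothetical values, for illustration only)
series <- c(5, 7, 6, 8, 9, 11, 10, 12)
lag <- 3

# embed() places the most recent value in column 1 and older values to its right
embedded <- embed(series, lag + 1)

label <- embedded[, 1]                            # value to be predicted
inputs <- embedded[, -1, drop = FALSE]            # the `lag` preceding values
colnames(inputs) <- paste0("Lag", seq_len(lag))   # assumed naming scheme

print(inputs)
print(label)
```

A pair such as inputs and label plays the role of the data and label arguments when the matrix input form is used.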

Usage

setarforest(
  data,
  label = NULL,
  lag = 10,
  bagging_fraction = 0.8,
  bagging_freq = 10,
  random_tree_significance = TRUE,
  random_tree_significance_divider = TRUE,
  random_tree_error_threshold = TRUE,
  depth = 1000,
  significance = 0.05,
  significance_divider = 2,
  error_threshold = 0.03,
  stopping_criteria = "both",
  mean_normalisation = FALSE,
  window_normalisation = FALSE,
  verbose = 2,
  num_cores = NULL,
  categorical_covariates = NULL
)

Value

An object of class setarforest which contains the following properties.

trees: A list of objects of class setartree which represents the trained SETAR-Tree models in the forest.
lag: The number of features used to train each SETAR-Tree in the forest.
feature_names: Names of the input features.
coefficients: Names of the coefficients of the leaf node regression models in each SETAR-Tree in the forest.
categorical_covariate_values: Information about the categorical covariates used during training (only if applicable).
mean_normalisation: Whether mean normalisation was applied to the training data.
window_normalisation: Whether window normalisation was applied to the training data.
input_type: Type of input data used to train the SETAR-Forest. This is list if data is a list of time series, and df if data is a dataframe/matrix containing model inputs.
execution_time: Execution time of SETAR-Forest.
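
As a rough sketch of how the returned object might be inspected, the snippet below fits a small forest and reads a few of the properties listed above. It assumes the function and the chaotic_logistic_series dataset are provided by the setartree package and that the properties are accessible with `$`. Both are assumptions here, not statements confirmed by this page.

```r
# Assumption: setarforest() and chaotic_logistic_series come from the setartree package
library(setartree)

# Small forest (2 trees, single core, minimal console output) to keep the run short
forest <- setarforest(chaotic_logistic_series,
                      bagging_freq = 2,
                      num_cores = 1,
                      verbose = 0)

class(forest)           # class of the fitted object
length(forest$trees)    # number of SETAR-Trees stored in the forest
forest$lag              # number of lags used as features
forest$input_type       # type of input the forest was trained on
forest$execution_time   # training time
```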

Arguments

data: A list of time series (each list element is a separate time series) or a dataframe/matrix containing model inputs (the columns can contain past time series lags and/or external numerical/categorical covariates).
label: A vector of true outputs. This parameter is only required when data is a dataframe/matrix containing the model inputs.
lag: The number of past time series lags that should be used when fitting each SETAR-Tree in the forest. This parameter is only required when data is a list of time series. Default value is 10.
bagging_fraction: The fraction of instances that should be used to train each SETAR-Tree in the forest. Default value is 0.8.
bagging_freq: The number of SETAR-Trees in the forest. Default value is 10.
random_tree_significance: Whether a random significance should be used for splitting in each tree. Each node split within a tree uses the same significance level. When this parameter is set to TRUE, the "significance" parameter is ignored. Default value is TRUE.
random_tree_significance_divider: Whether a random significance divider should be used for splitting in each tree. When this parameter is set to TRUE, the "significance_divider" parameter is ignored. Default value is TRUE.
random_tree_error_threshold: Whether a random error threshold should be used for splitting in each tree. Each node split within a tree uses the same error threshold. When this parameter is set to TRUE, the "error_threshold" parameter is ignored. Default value is TRUE.
depth: Maximum depth of each SETAR-Tree in the forest. Default value is 1000. Thus, unless a lower value is specified, the depth of a SETAR-Tree is in practice controlled by the stopping criterion.
significance: In each SETAR-Tree in the forest, the initial significance used by the linearity test (alpha_0). Default value is 0.05.
significance_divider: In each SETAR-Tree in the forest, the significance used at each tree level is divided by this value. Default value is 2.
error_threshold: In each SETAR-Tree in the forest, the minimum error reduction percentage between parent and child nodes to make a split. Default value is 0.03.
stopping_criteria: The required stopping criteria for each SETAR-Tree in the forest: linearity test (lin_test), error reduction percentage (error_imp) or linearity test and error reduction percentage (both). Default value is "both".
mean_normalisation: Whether each series should be normalised by subtracting its mean value before building the forest. This parameter only applies when data is a list of time series. Default value is FALSE.
window_normalisation: Whether window-wise normalisation should be applied before building the forest. This parameter only applies when data is a list of time series. When this is TRUE, each row of the embedded training matrix is normalised by subtracting its mean value before building the forest. Default value is FALSE.
verbose: Controls the level of the verbosity of SETAR-Forest: 0 (errors/warnings), 1 (limited amount of information including the depth of the currently processing tree), 2 (full training information including the depth of the currently processing tree and stopping criterion related details in each tree). Default value is 2.
num_cores: The number of cores to be used. num_cores > 1 means parallel processing. When not provided, the function detects the number of available cores and uses them to train the SETAR-Trees in the forest in parallel.
categorical_covariates: Names of the categorical covariates in the input data. This parameter is only required when data is a dataframe/matrix and it contains categorical variables.
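
One way to read significance and significance_divider together is as a per-level schedule for the linearity test. The base R sketch below assumes the significance is divided once for each level below the root. This is an interpretation of the descriptions above rather than a confirmed detail of the implementation, and tree_level and alpha_by_level are illustrative names only.

```r
significance <- 0.05          # default initial significance (alpha_0)
significance_divider <- 2     # default divider

tree_level <- 0:4             # 0 = root node (assumed indexing)

# Assumed schedule: divide the significance once per level below the root
alpha_by_level <- significance / significance_divider^tree_level

print(data.frame(tree_level, alpha_by_level))
```

Under this reading, deeper levels need stronger evidence of non-linearity before a split is made, which is one reason the default depth of 1000 is rarely reached.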

Examples

# \donttest{
# Training SETAR-Forest with a list of time series
setarforest(chaotic_logistic_series, bagging_freq = 2, num_cores = 1)

# Training SETAR-Forest with a dataframe containing model inputs where the model inputs may contain
# past time series lags and numerical/categorical covariates
setarforest(data = web_traffic_train[,-1],
            label = web_traffic_train[,1],
            bagging_freq = 2,
            num_cores = 1,
            categorical_covariates = "Project")
# }
