proteus_random_search: proteus_random_search

Description

proteus_random_search tunes proteus models by random search over the hyper-parameter space, using either the predefined search ranges or custom values.

Usage

proteus_random_search(
  n_samp,
  data,
  target,
  future,
  past = NULL,
  ci = 0.8,
  smoother = FALSE,
  t_embed = NULL,
  activ = NULL,
  nodes = NULL,
  distr = NULL,
  optim = NULL,
  loss_metric = "crps",
  epochs = 30,
  lr = NULL,
  patience = 10,
  latent_sample = 100,
  verbose = TRUE,
  stride = NULL,
  dates = NULL,
  rolling_blocks = FALSE,
  n_blocks = 4,
  block_minset = 10,
  error_scale = "naive",
  error_benchmark = "naive",
  batch_size = 30,
  min_default = 1,
  seed = 42,
  future_plan = "future::multisession",
  omit = FALSE,
  keep = FALSE
)
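
A minimal call could look like the sketch below. It assumes the proteus package and its torch backend are installed. The toy data frame and its column names x and y are invented for illustration, and the small n_samp and epochs values are chosen only to keep the run short.

```r
# Hypothetical toy data: two numeric series, no date column
set.seed(42)
toy <- data.frame(
  x = cumsum(rnorm(200)),
  y = cumsum(rnorm(200))
)

# Sketch only: runs the search when proteus is available
if (requireNamespace("proteus", quietly = TRUE)) {
  search <- proteus::proteus_random_search(
    n_samp = 3,            # number of randomly sampled models
    data = toy,
    target = c("x", "y"),  # features to analyze jointly
    future = 10,           # time-steps to predict
    epochs = 5,            # kept small to shorten the run
    future_plan = "future::sequential",
    keep = TRUE            # retain every explored model in all_models
  )

  print(search$random_search)  # sampled hyper-parameters and average errors
  print(search$time_log)       # computation time
}
```

Arguments left at NULL (past, t_embed, activ, nodes, distr, optim, lr, stride) are sampled from their search ranges; passing explicit values restricts the search.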

Value

This function returns a list including:

random_search: summary of the sampled hyper-parameters and average error metrics.
best: best model according to overall ranking on all average error metrics (for negative metrics, absolute value is considered).
all_models: list with all generated models (only if keep is set to TRUE).
time_log: computation time.
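
The selection of best can be pictured with the base-R sketch below. It is an illustration under assumptions, not the package's internal code: the metric names and values are invented, and summing the per-metric ranks is only one plausible way to build an overall ranking.

```r
# Hypothetical summary table: one row per sampled model, one column per
# average error metric (names and values are invented for illustration).
summary_tbl <- data.frame(
  model = c("m1", "m2", "m3"),
  avg_me = c(-0.40, 0.10, 0.25),   # signed metric: can be negative
  avg_mae = c(0.90, 0.50, 0.60),
  avg_mase = c(1.20, 0.80, 0.70)
)

metric_cols <- setdiff(names(summary_tbl), "model")

# Take absolute values so that signed metrics are judged by magnitude,
# then rank each metric column (rank 1 = smallest error).
ranks <- sapply(summary_tbl[metric_cols], function(m) rank(abs(m)))

# Assumed aggregation: sum of per-metric ranks; the lowest total wins.
summary_tbl$overall <- rowSums(ranks)
best_model <- summary_tbl$model[which.min(summary_tbl$overall)]
print(best_model)
```

Ranking on absolute values keeps a model with a large negative bias from being treated as better than one with a small positive bias.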

Arguments

n_samp: Positive integer. Number of models to be randomly generated sampling the hyper-parameter space.
data: A data frame with time features in columns and, optionally, a date column.
target: Vector of strings. Names of the time features to be jointly analyzed.
future: Positive integer. The future dimension with number of time-steps to be predicted.
past: Positive integer. Length of past sequences. Default: NULL (search range future:2*future).
ci: Positive numeric. Confidence level of the interval. Default: 0.8.
smoother: Logical. Perform optimal smoothing using standard loess for each time feature. Default: FALSE.
t_embed: Positive integer. Number of embeddings for the temporal dimension. Minimum value is 2. Default: NULL (search range 2:30).
activ: String. Activation function to be used by the forward network. Implemented functions are: "linear", "mish", "swish", "leaky_relu", "celu", "elu", "gelu", "selu", "bent", "softmax", "softmin", "softsign", "softplus", "sigmoid", "tanh". Default: NULL (full-option search).
nodes: Positive integer. Nodes for the forward neural net. Default: NULL (search range 2:1024).
distr: String. Distribution to be used by variational model. Implemented distributions are: "normal", "genbeta", "gev", "gpd", "genray", "cauchy", "exp", "logis", "chisq", "gumbel", "laplace", "lognorm", "skewed". Default: NULL (full-option search).
optim: String. Optimization method. Implemented methods are: "adadelta", "adagrad", "rmsprop", "rprop", "sgd", "asgd", "adam". Default: NULL (full-option search).
loss_metric: String. Loss function for the variational model. Three options: "elbo", "crps", "score". Default: "crps".
epochs: Positive integer. Number of training epochs. Default: 30.
lr: Positive numeric. Learning rate. Default: NULL (search range 0.001 to 0.1).
patience: Positive integer. Waiting time (in epochs) before evaluating the overfit performance. Default: 10.
latent_sample: Positive integer. Number of samples to draw from the latent variables. Default: 100.
verbose: Logical. Default: TRUE.
stride: Positive integer. Number of shifting positions for sequence generation. Default: NULL (search range 1:3).
dates: String. Label of feature where dates are located. Default: NULL (progressive numbering).
rolling_blocks: Logical. Option for incremental or rolling window. Default: FALSE.
n_blocks: Positive integer. Number of distinct blocks for back-testing. Default: 4.
block_minset: Positive integer. Minimum number of sequences to create a block. Default: 10.
error_scale: String. Scale for the scaled error metrics (for continuous variables). Two options: "naive" (average of naive one-step absolute error for the historical series) or "deviation" (standard error of the historical series). Default: "naive".
error_benchmark: String. Benchmark for the relative error metrics (for continuous variables). Two options: "naive" (sequential extension of last value) or "average" (mean value of true sequence). Default: "naive".
batch_size: Positive integer. Default: 30.
min_default: Positive numeric. Minimum differentiation iteration. Default: 1.
seed: Random seed. Default: 42.
future_plan: String. How to resolve the future parallelization. Options are: "future::sequential", "future::multisession", "future::multicore". For more information, see the documentation of the future package. Default: "future::multisession".
omit: Logical. Set to TRUE to remove missing values; otherwise all gaps, both in dates and values, are filled with a Kalman filter. Default: FALSE.
keep: Logical. Set to TRUE to keep all the explored models. Default: FALSE.
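
The error_scale and error_benchmark options can be illustrated with a small base-R sketch. All values are invented, the mean absolute error stands in for whichever metrics the package actually scales, and reading the "deviation" option as the sample standard deviation is an assumption.

```r
hist_series <- c(10, 12, 11, 13, 14)   # invented historical values
actual <- c(15, 16, 14)                # invented true future values
predicted <- c(14, 15, 15)             # invented forecast

# error_scale candidates
scale_naive <- mean(abs(diff(hist_series)))  # mean one-step naive absolute error
scale_deviation <- sd(hist_series)           # assumption: sample standard deviation

# error_benchmark candidates
bench_naive <- rep(tail(hist_series, 1), length(actual))  # last value carried forward
bench_average <- rep(mean(actual), length(actual))        # mean of the true sequence

mae <- mean(abs(actual - predicted))

# Scaled error: forecast error divided by the chosen scale
scaled_mae <- mae / scale_naive

# Relative error: forecast error divided by the benchmark's error
relative_mae <- mae / mean(abs(actual - bench_naive))
```

A scaled or relative value below 1 indicates that the forecast beats the corresponding naive reference.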

Author

Giancarlo Vercellino giancarlo.vercellino@gmail.com

References

https://rpubs.com/giancarlo_vercellino/proteus