snap: snap

Description

A simple wrapper for designing vanilla deep neural networks with a 'Tensorflow'/'Keras' backend for regression, classification and multi-label tasks, with some tweaks and tricks (skip shortcuts, embedding, feature selection and anomaly detection).

Usage

snap(
  data,
  target,
  task = NULL,
  positive = NULL,
  skip_shortcut = FALSE,
  embedding = "none",
  embedding_size = 10,
  folds = 3,
  reps = 1,
  holdout = 0.3,
  layers = 1,
  activations = "relu",
  regularization_L1 = 0,
  regularization_L2 = 0,
  nodes = 32,
  dropout = 0,
  span = 0.2,
  min_delta = 0,
  batch_size = 32,
  epochs = 50,
  imp_thresh = 0,
  anom_thresh = 1,
  output_activation = NULL,
  optimizer = "Adam",
  loss = NULL,
  metrics = NULL,
  winsor = FALSE,
  q_min = 0.01,
  q_max = 0.99,
  normalization = TRUE,
  seed = 42,
  verbose = 0
)

Arguments

data

A data frame including all the features and targets.

target

String. Single label for target feature when task is "regr" or "classif". String vector with multiple labels for target features when task is "multilabel".

task

String. Available options are: "regr", "classif", "multilabel". When NULL, the task is inferred from the data type of the target feature(s). Default: NULL.

positive

String. Positive class label (only for classification task). Default: NULL.

skip_shortcut

Logical. Option to add a skip shortcut to improve network performance in case of many layers. Default: FALSE.

embedding

String. Available options are: "none", "global" (when identical values for different features hold different meanings), "sequence" (when identical values for different features hold the same meaning). Default: "none".

embedding_size

Integer. Output dimension for the embedding layer. Default: 10.

folds

Positive integer. Number of folds for repeated cross-validation. Default: 3.

reps

Positive integer. Number of repetitions for repeated cross-validation. Default: 1.

holdout

Positive numeric. Proportion of cases (between 0 and 1) reserved for holdout validation. Default: 0.3.
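
To make the interplay of folds, reps and holdout concrete, here is a minimal base R sketch of one way such a split could be organised. The row count, the variable names and the order of operations (holdout first, then repeated cross-validation on the remainder) are assumptions for illustration, not snap's internal code.

```r
# Sketch only: the index bookkeeping below is an assumption, not snap's code.
set.seed(42)
n <- 100        # hypothetical number of rows in `data`
holdout <- 0.3
folds <- 3
reps <- 2

# Reserve the holdout (testing) rows first.
test_idx <- sample(n, size = round(n * holdout))
train_idx <- setdiff(seq_len(n), test_idx)

# For each repetition, shuffle fold labels over the remaining rows.
fold_ids <- lapply(seq_len(reps), function(r) {
  sample(rep(seq_len(folds), length.out = length(train_idx)))
})

# Each (repetition, fold) pair is one trial:
# that fold is used for validation, the other folds for training.
trials <- expand.grid(fold = seq_len(folds), rep = seq_len(reps))
nrow(trials)  # folds * reps trials in total
```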

layers

Positive integer. Number of layers for the neural net. Default: 1.

activations

String. String vector with the activation functions for each layer (for example, a neural net with 3 layers may have activations = c("relu", "gelu", "tanh")). Besides standard Tensorflow/Keras activations, you can also choose: "swish", "mish", "gelu", "bent". Default: "relu".
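
For reference, the extra activation names have well-known textbook definitions, sketched below in base R. Whether snap implements exactly these forms (for example the exact rather than the tanh-approximated GELU, or "bent" as the bent identity) is an assumption here.

```r
# Textbook definitions; snap's own implementations may differ in detail.
sigmoid  <- function(x) 1 / (1 + exp(-x))
softplus <- function(x) log1p(exp(x))

swish <- function(x) x * sigmoid(x)               # x * sigmoid(x)
mish  <- function(x) x * tanh(softplus(x))        # x * tanh(log(1 + exp(x)))
gelu  <- function(x) x * pnorm(x)                 # x * standard normal CDF
bent  <- function(x) (sqrt(x^2 + 1) - 1) / 2 + x  # bent identity (assumed meaning of "bent")

# Compare the four functions on a small grid of inputs.
x <- c(-2, 0, 2)
rbind(swish = swish(x), mish = mish(x), gelu = gelu(x), bent = bent(x))
```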

regularization_L1

Positive numeric. Value for L1 regularization of the loss function. Default: 0.

regularization_L2

Positive numeric. Value for L2 regularization of the loss function. Default: 0.

nodes

Positive integer. Integer vector with the nodes for each layer (for example, a neural net with 3 layers may have nodes = c(32, 64, 16)). Default: 32.

dropout

Positive numeric. Value for the dropout parameter for each layer (for example, a neural net with 3 layers may have dropout = c(0, 0.5, 0.3)). Default: 0.

span

Positive numeric. Fraction of the total number of epochs used as the patience parameter for early stopping. Default: 0.2.

min_delta

Positive numeric. Minimum change in the monitored metric that counts as an improvement for early stopping. Default: 0.
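
The relation between span, epochs and min_delta can be illustrated with a small base R simulation of a Keras-style early stopping rule. The rounding used to derive the patience, the helper name early_stop_epoch and the loss values are assumptions for illustration only.

```r
# Patience derived from span; the rounding rule is an assumption.
epochs <- 50
span <- 0.2
patience <- round(span * epochs)

# Hypothetical helper: returns the epoch at which training would stop
# for a given sequence of validation losses (lower is better).
early_stop_epoch <- function(val_loss, patience, min_delta = 0) {
  best <- Inf
  wait <- 0
  for (i in seq_along(val_loss)) {
    if (val_loss[i] < best - min_delta) {
      best <- val_loss[i]   # improvement large enough: reset the counter
      wait <- 0
    } else {
      wait <- wait + 1      # no sufficient improvement this epoch
      if (wait >= patience) return(i)
    }
  }
  length(val_loss)          # never triggered: ran all epochs
}

val_loss <- c(1.0, 0.8, 0.79, 0.795, 0.79, 0.79)  # made-up values
early_stop_epoch(val_loss, patience = 2, min_delta = 0)
early_stop_epoch(val_loss, patience = 2, min_delta = 0.05)
```

A larger min_delta makes small gains count as "no improvement", so training stops earlier for the same patience.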

batch_size

Positive integer. Maximum batch size for training. Default: 32.

epochs

Positive integer. Maximum number of training epochs (full forward and backward passes over the training data). Default: 50.

imp_thresh

Positive numeric. Importance threshold (in percentiles) above which the features are included in the model (using ReliefFbestK metric by CORElearn). Default: 0 (all features included).

anom_thresh

Positive numeric. Anomaly threshold (in percentiles) above which the instances are excluded from the model (using lof by dbscan). Default: 1 (all instances included).
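
Both thresholds are expressed as percentiles of a score. The base R sketch below shows how such percentile cut-offs could be applied; the score vectors are made up, and the inclusive comparisons are an assumption chosen so that the defaults (0 and 1) keep everything, as documented. In snap the scores come from CORElearn and dbscan, which are not called here.

```r
# Hypothetical feature importances (stand-in for ReliefF-style scores).
importance <- c(a = 0.05, b = 0.40, c = 0.10, d = 0.25)
imp_thresh <- 0.5
imp_cut <- quantile(importance, probs = imp_thresh)
selected_feat <- names(importance)[importance >= imp_cut]
selected_feat

# Hypothetical local outlier factor scores, one per instance.
lof_score <- c(1.0, 1.1, 0.9, 3.5, 1.05)
anom_thresh <- 0.8
anom_cut <- quantile(lof_score, probs = anom_thresh)
selected_inst <- which(lof_score <= anom_cut)
selected_inst
```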

output_activation

String. Default: NULL. If not specified otherwise, it will be "Linear" for regression task, "Softmax" for classification task, "Sigmoid" for multilabel task.

optimizer

String. Standard Tensorflow/Keras Optimization methods are available. Default: "Adam".

loss

Default: NULL. If not specified otherwise, it will be "mean_absolute_error" for regression task, "categorical_crossentropy" for classification task, "binary_crossentropy" for multilabel task.

metrics

Default: NULL. If not specified otherwise, it will be "mean_absolute_error" for regression task, "categorical_crossentropy" for classification task, "binary_crossentropy" for multilabel task.

winsor

Logical. Set to TRUE in case you want to perform Winsorization on regression tasks. Default: FALSE.

q_min

Positive numeric. Minimum quantile threshold for Winsorization. Default: 0.01.

q_max

Positive numeric. Maximum quantile threshold for Winsorization. Default: 0.99.
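
As an illustration of what winsor, q_min and q_max control, here is a minimal base R sketch of quantile-based Winsorization. The helper name winsorize is hypothetical, and which columns snap applies it to is not stated in this page.

```r
# Hypothetical helper: clamp values outside the [q_min, q_max] quantiles.
winsorize <- function(x, q_min = 0.01, q_max = 0.99) {
  bounds <- quantile(x, probs = c(q_min, q_max), na.rm = TRUE)
  pmin(pmax(x, bounds[[1]]), bounds[[2]])
}

x <- c(1:99, 1000)  # one extreme value at the top
w <- winsorize(x)
range(x)
range(w)            # the extreme value is pulled back to the 0.99 quantile
```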

normalization

Logical. Whether to apply batch normalization after each layer. Default: TRUE.

seed

Positive integer. Seed value to control random processes. Default: 42.

verbose

Positive integer. Set the level of information from Keras. Default: 0.

Value

This function returns a list including:

task: kind of task solved
configuration: main hyper-parameters describing the neural net (layers, activations, regularization_L1, regularization_L2, nodes, dropout)
model: Keras standard model description
pred_fun: function to use on the same data scheme to predict new values
plot: Keras standard history plot
testing_frame: testing set with the related predictions, including
trials: statistics for each trial during the repeated cross-validation (train set and validation set):
- task "classif": balanced accuracy (bac), precision (prc), sensitivity (sen), critical success index (csi), F-score (fsc), Kappa (kpp), Kendall (kdl)
- task "regr": root mean square error (rmse), mean absolute error (mae), median absolute error (mdae), relative root square error (rrse), relative absolute error (rae), Pearson (prsn)
- task "multilabel": macro bac, macro prc, macro sen, macro csi, macro fsc, micro kpp, micro kdl
metrics: summary statistics as above for training, validation (both averaged over trials) and testing
selected_feat: labels of features included within the model
selected_inst: index of instances included within the model
time_log
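
The regression statistics listed above can be reproduced from actual and predicted values with the usual textbook formulas, sketched here in base R. The helper name regr_metrics and the sample values are hypothetical, and snap's own computation may differ in detail.

```r
# Hypothetical helper using standard definitions of each statistic.
regr_metrics <- function(actual, predicted) {
  err <- predicted - actual
  dev <- actual - mean(actual)  # deviations from the mean (naive baseline)
  c(
    rmse = sqrt(mean(err^2)),
    mae  = mean(abs(err)),
    mdae = median(abs(err)),
    rrse = sqrt(sum(err^2) / sum(dev^2)),
    rae  = sum(abs(err)) / sum(abs(dev)),
    prsn = cor(actual, predicted)
  )
}

actual    <- c(1, 2, 3, 4)
predicted <- c(1.5, 2, 2.5, 4)
m <- regr_metrics(actual, predicted)
round(m, 4)
```

Values of rrse and rae below 1 indicate that the predictions beat a baseline that always predicts the mean of the actual values.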

Examples

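# NOT RUN {
library(snap)

# Regression with default settings
snap(friedman3, target = "y")

# Classification with feature selection and anomaly filtering
snap(threenorm, target = "classes", imp_thresh = 0.3, anom_thresh = 0.95)

# Two layers, each with its own activation and node count
snap(threenorm, "classes", layers = 2, activations = c("gelu", "swish"), nodes = c(32, 64))
# }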