These examples train a neural network on simple, randomly generated data. The training process is visualized through plots, and most parameters can be adjusted so that the effect of a change can be assessed by inspecting the plots.
example_NN(example_type = "nested", example_n = 500, example_sdnoise = 1,
  example_nframes = 30, hiddenLayers = c(5, 5), lossFunction = "log",
  dHuber = 1, rectifierLayers = NA, sigmoidLayers = NA,
  regression = FALSE, standardize = TRUE, learnRate = 0.001,
  maxEpochs = 2000, batchSize = 10, momentum = 0.3, L1 = 1e-07,
  L2 = 1e-04)
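For instance, one of the classification examples can be run with settings other than the defaults. A minimal sketch; the argument values here are illustrative, chosen from the options documented below:

# Train on the "multiclass" example with two larger hidden layers
# and a higher learning rate than the default.
example_NN(example_type = "multiclass", example_n = 1000,
  hiddenLayers = c(10, 10), lossFunction = "log",
  learnRate = 0.01, maxEpochs = 500)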
example_type: which example to use. Possible values are "surface", "polynomial", "nested", "linear", "disjoint" and "multiclass".

example_n: number of observations to generate.

example_sdnoise: standard deviation of the random normal noise added to the data.

example_nframes: number of frames to be plotted.

hiddenLayers: vector specifying the number of nodes in each hidden layer. Set to NA for a network without any hidden layers.

lossFunction: which loss function should be used. Options are "log", "quadratic", "absolute", "huber" and "pseudo-huber".

dHuber: used only with the "huber" and "pseudo-huber" loss functions. This parameter controls the cut-off point between quadratic and absolute loss; see the sketch after this argument list.

rectifierLayers: vector or integer specifying which layers should have rectifier activation in their nodes.

sigmoidLayers: vector or integer specifying which layers should have sigmoid activation in their nodes.

regression: logical indicating regression or classification.

standardize: logical indicating whether X and y should be standardized before training the network. Recommended to leave at TRUE for faster convergence.

learnRate: the size of the steps made in gradient descent. If set too large, optimization can become unstable; if set too small, convergence will be slow.

maxEpochs: the maximum number of epochs (one epoch is one pass through the training data).

batchSize: the number of observations to use in each batch. Batch learning is computationally faster than stochastic gradient descent. However, large batches might not result in optimal learning; see LeCun for details.

momentum: numeric value specifying how much momentum should be used. Set to zero for no momentum, otherwise a value between zero and one; see the update sketch at the end of this section.

L1: L1 regularization. Non-negative number. Set to zero for no regularization.

L2: L2 regularization. Non-negative number. Set to zero for no regularization.
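To make the role of dHuber concrete, here are the Huber and pseudo-Huber losses written as plain R functions. These are the standard textbook definitions, not necessarily the package's internal code:

# Huber loss: quadratic for |a| <= d, absolute (linear) beyond;
# d is the cut-off point that dHuber controls.
huber <- function(a, d = 1) {
  ifelse(abs(a) <= d, 0.5 * a^2, d * (abs(a) - 0.5 * d))
}

# Pseudo-Huber loss: a smooth approximation of the Huber loss.
pseudo_huber <- function(a, d = 1) {
  d^2 * (sqrt(1 + (a / d)^2) - 1)
}

Both reduce the influence of large residuals relative to quadratic loss, which is why they are offered alongside "quadratic" and "absolute".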
One regression example and three classification examples are included. More examples will be added in future versions of ANN.
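The optimization arguments (learnRate, momentum, L1 and L2) can be read as the pieces of a gradient-descent update with momentum and regularization. A generic sketch of such an update, as a textbook formulation rather than the package's exact implementation:

# One weight update: the L1/L2 penalties enter through their (sub)gradients,
# momentum accumulates a velocity, and learnRate scales the step size.
update_step <- function(w, grad, velocity, learnRate, momentum, L1, L2) {
  grad_total <- grad + L1 * sign(w) + L2 * w  # loss gradient plus penalty terms
  velocity <- momentum * velocity - learnRate * grad_total
  list(w = w + velocity, velocity = velocity)
}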