
ANN2 (version 1.5)

replicator: Train a Replicator Neural Network

Description

Train a Replicator Neural Network using stochastic gradient descent with optional batch learning. See Hawkins et al. (2002) for details on Replicator Neural Networks.

Usage

replicator(X, hiddenLayers = c(10, 5, 10), lossFunction = "pseudo-huber",
  dHuber = 1, stepLayers = 2, nSteps = 5, smoothSteps = 25,
  rampLayers = NA, rectifierLayers = NA, sigmoidLayers = NA,
  standardize = TRUE, learnRate = 1e-06, maxEpochs = 1000,
  batchSize = 32, momentum = 0.2, L1 = 0, L2 = 0, validLoss = TRUE,
  validProp = 0.1, verbose = TRUE, earlyStop = FALSE,
  earlyStopEpochs = 50, earlyStopTol = -1e-07, lrSched = FALSE,
  lrSchedEpochs = 800, lrSchedLearnRates = 1e-07, robErrorCov = FALSE)

Arguments

X

matrix with explanatory variables

hiddenLayers

vector specifying the number of nodes in each layer. Set to NA for a network without any hidden layers.

lossFunction

which loss function should be used. Options are "log", "quadratic", "absolute", "huber" and "pseudo-huber"

dHuber

used only in case of loss functions "huber" and "pseudo-huber". This parameter controls the cut-off point between quadratic and absolute loss (see the sketch following this argument list).

stepLayers

vector or integer specifying which layers should have stepwise activation in their nodes

nSteps

integer specifying how many steps the step function should have on the interval [0, 1]

smoothSteps

numeric indicating the smoothness of the step function. Smaller values result in smoother steps. Recommended to keep below 50 for stability; if set too high, the derivative of the step function will also be large.

rampLayers

vector or integer specifying which layers should have ramp-like activation in their nodes. This is equivalent to a step function with an infinite number of steps (the limit of the step function as nSteps and smoothSteps go to infinity), but more efficient than using a step-function layer with a large value for nSteps. Both activations are sketched under Details.

rectifierLayers

vector or integer specifying which layers should have rectifier activation in their nodes

sigmoidLayers

vector or integer specifying which layers should have sigmoid activation in their nodes

standardize

logical indicating if X should be standardized before training the network. Recommended to leave at TRUE for faster convergence.

learnRate

the size of the steps made in gradient descent. If set too large, optimization can become unstable. If set too small, convergence will be slow.

maxEpochs

the maximum number of epochs (one epoch is one full pass through the training data).

batchSize

the number of observations to use in each batch. Batch learning is computationally faster than stochastic gradient descent. However, large batches might not result in optimal learning; see Le Cun for details.

momentum

numeric value specifying how much momentum should be used. Set to zero for no momentum, otherwise a value between zero and one.

L1

L1 regularization. Non-negative number. Set to zero for no regularization.

L2

L2 regularization. Non-negative number. Set to zero for no regularization.

validLoss

logical indicating if the loss should be monitored during training. If TRUE, a validation set of proportion validProp is randomly drawn from the full training set. Use function plot to assess convergence.

validProp

proportion of training data to use for validation

verbose

logical indicating if additional information (such as a lifesign) should be printed to the console during training.

earlyStop

logical indicating if early stopping should be used, based on the loss on a validation set. Only possible when validLoss is set to TRUE.

earlyStopEpochs

the number of epochs without sufficient improvement (as specified by earlyStopTol) after which training is stopped.

earlyStopTol

numerical value specifying the tolerance for early stopping. Can be either positive or negative. When set to a negative value, training is stopped even if improvements are made, as long as the improvements are smaller than the tolerance.

lrSched

logical indicating if a schedule for the learning rate should be used. If TRUE, the schedule specified by lrSchedEpochs and lrSchedLearnRates is used (see the second example under Examples).

lrSchedEpochs

vector with elements specifying the epochs after which the corresponding learning rate from lrSchedLearnRates should be used. Its length should be the same as that of lrSchedLearnRates.

lrSchedLearnRates

vector with elements specifying the learning rates to be used after the epochs determined by lrSchedEpochs.

robErrorCov

logical indicating if a robust covariance matrix should be estimated in order to assess Mahalanobis distances of the reconstruction errors.
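
For intuition about dHuber: the Huber and pseudo-Huber losses behave quadratically for errors smaller than the cut-off and (asymptotically) linearly beyond it. Below is a minimal sketch using the standard textbook formula for the pseudo-Huber loss; it illustrates the concept and is not necessarily the exact internal implementation used by ANN2.

# Pseudo-Huber loss with cut-off d (the dHuber argument):
# quadratic near zero, linear in the tails
pseudoHuber <- function(e, d = 1) {
  d^2 * (sqrt(1 + (e / d)^2) - 1)
}

# Compare with quadratic and absolute loss on a grid of errors
e <- seq(-5, 5, length.out = 201)
plot(e, pseudoHuber(e), type = "l", ylab = "loss")
lines(e, e^2 / 2, lty = 2)  # quadratic
lines(e, abs(e), lty = 3)   # absolute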

Value

An ANN object. Use function plot(<object>) to assess the loss on training and, optionally, validation data during the training process. Use function predict(<object>, <newdata>) for prediction.

Details

A function for training a Replicator Neural Network.
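
The characteristic ingredient of this architecture is the smoothed staircase activation that Hawkins et al. (2002) place in the middle layer: a sum of tanh terms that quantizes activations into a fixed number of discrete levels, corresponding to the stepLayers, nSteps and smoothSteps arguments above. Below is a minimal sketch of that activation and of its ramp limit, following the formula in the paper; the exact parameterization inside ANN2 may differ.

# Smoothed staircase with N steps on [0, 1]; larger `a` gives
# sharper (less smooth) steps (cf. Hawkins et al., 2002)
stepActivation <- function(z, N = 5, a = 25) {
  j <- seq_len(N - 1)
  0.5 + rowSums(outer(z, j, function(zz, jj) tanh(a * (zz - jj / N)))) / (2 * (N - 1))
}

# Ramp activation: the limiting case, clamping to [0, 1]
rampActivation <- function(z) pmin(pmax(z, 0), 1)

z <- seq(-0.2, 1.2, length.out = 200)
plot(z, stepActivation(z), type = "l", ylab = "activation")
lines(z, rampActivation(z), lty = 2)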

References

Hawkins, S., He, H., Williams, G. J., & Baxter, R. A. (2002). Outlier detection using replicator neural networks. In Data Warehousing and Knowledge Discovery (DaWaK 2002), Lecture Notes in Computer Science, vol. 2454. Springer.

Examples

# NOT RUN {
# Train a replicator on the Old Faithful data
library(ANN2)
repNN <- replicator(faithful, hiddenLayers = c(4,1,4), batchSize = 5,
                    learnRate = 1e-5, momentum = 0.5, L1 = 1e-3, L2 = 1e-3,
                    robErrorCov = TRUE)
plot(repNN)  # loss during training

# Reconstruct the data and flag observations whose reconstruction
# errors have a robust Mahalanobis p-value below 0.05
rX <- reconstruct(repNN, faithful)
plot(rX, alpha = 0.05)
plot(faithful, col = (rX$mah_p < 0.05) + 1, pch = 16)
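
# A supplementary sketch (not part of the original help page): the same
# network trained with a learning-rate schedule and early stopping.
# All argument values below are illustrative, not recommendations.
repNN2 <- replicator(faithful, hiddenLayers = c(4,1,4), batchSize = 5,
                     learnRate = 1e-4, validLoss = TRUE, validProp = 0.1,
                     earlyStop = TRUE, earlyStopEpochs = 50,
                     lrSched = TRUE, lrSchedEpochs = c(400, 800),
                     lrSchedLearnRates = c(1e-5, 1e-6))
plot(repNN2)  # the validation loss should flatten out before maxEpochs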
# }
