Train a Multilayer Neural Network using Stochastic Gradient Descent with optional batch learning. Functions autoencoder and replicator are special cases of this general function.
neuralnetwork(X, y, hiddenLayers, lossFunction = "log", dHuber = 1,
rectifierLayers = NA, sigmoidLayers = NA, regression = FALSE,
standardize = TRUE, learnRate = 1e-04, maxEpochs = 1000,
batchSize = 32, momentum = 0.2, L1 = 1e-07, L2 = 1e-04,
validLoss = TRUE, validProp = 0.2, verbose = TRUE, earlyStop = TRUE,
earlyStopEpochs = 50, earlyStopTol = -1e-07, lrSched = FALSE,
lrSchedLearnRates = 1e-05, lrSchedEpochs = 400)
X: matrix with explanatory variables
y: matrix with dependent variables
hiddenLayers: vector specifying the number of nodes in each layer. Set to NA for a network without any hidden layers.
lossFunction: which loss function should be used. Options are "log", "quadratic", "absolute", "huber" and "pseudo-huber".
dHuber: used only in case of loss functions "huber" and "pseudo-huber". This parameter controls the cut-off point between quadratic and absolute loss (see the sketch below).
rectifierLayers: vector or integer specifying which layers should have rectifier activation in their nodes
sigmoidLayers: vector or integer specifying which layers should have sigmoid activation in their nodes
regression: logical indicating regression (TRUE) or classification (FALSE)
standardize: logical indicating if X and y should be standardized before training the network. Recommended to leave at TRUE for faster convergence.
learnRate: the size of the steps made in gradient descent. If set too large, optimization can become unstable. If set too small, convergence will be slow.
maxEpochs: the maximum number of epochs (one epoch is one iteration through the training data).
batchSize: the number of observations to use in each batch. Batch learning is computationally faster than stochastic gradient descent. However, large batches might not result in optimal learning; see LeCun et al. (2012) for details.
momentum: numeric value specifying how much momentum should be used. Set to zero for no momentum, otherwise a value between zero and one (see the update sketch below).
L1: L1 regularization penalty. Non-negative number. Set to zero for no regularization.
L2: L2 regularization penalty. Non-negative number. Set to zero for no regularization.
validLoss: logical indicating if loss should be monitored during training. If TRUE, a validation set of proportion validProp is randomly drawn from the full training set. Use function plot to assess convergence.
validProp: proportion of training data to use for validation
verbose: logical indicating if additional information (such as lifesign) should be printed to the console during training.
earlyStop: logical indicating if early stopping should be used based on the loss on a validation set. Only possible with validLoss set to TRUE.
earlyStopEpochs: after how many epochs without sufficient improvement (as specified by earlyStopTol) training should be stopped.
earlyStopTol: numerical value specifying the tolerance for early stopping. Can be either positive or negative. When set to a negative value, training will be stopped if improvements are made but these improvements are smaller than the tolerance.
lrSched: logical indicating if a schedule for the learning rate should be used. If TRUE, the schedule specified by lrSchedEpochs and lrSchedLearnRates is used (see the usage sketch below).
lrSchedLearnRates: vector with elements specifying the learn rate to be used after the epochs determined by lrSchedEpochs.
lrSchedEpochs: vector with elements specifying the epoch after which the corresponding learn rate from vector lrSchedLearnRates should be used. Length of this vector should be the same as the length of lrSchedLearnRates.
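As an illustration of the cut-off dHuber: the sketch below shows the textbook Huber and pseudo-Huber formulas with cut-off d (how the package implements these losses internally may differ slightly).

# Textbook Huber loss: quadratic for |error| <= d, linear beyond d
huber <- function(error, d = 1) {
  ifelse(abs(error) <= d, 0.5 * error^2, d * (abs(error) - 0.5 * d))
}
# Pseudo-Huber loss: a smooth approximation of the Huber loss
pseudoHuber <- function(error, d = 1) {
  d^2 * (sqrt(1 + (error / d)^2) - 1)
}
# Both behave quadratically near zero and roughly linearly in the tails
curve(huber(x), from = -5, to = 5, ylab = "loss")
curve(pseudoHuber(x), from = -5, to = 5, add = TRUE, lty = 2)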
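The interplay of learnRate, momentum, L1 and L2 can be pictured as a single mini-batch weight update. The following toy sketch is schematic only: the variable names and the exact scaling of the penalty terms are assumptions, not the package internals.

set.seed(1)
W         <- matrix(rnorm(4), 2, 2)   # current weights
grad      <- matrix(rnorm(4), 2, 2)   # gradient of the loss on one mini-batch
velocity  <- matrix(0, 2, 2)          # momentum buffer, initialised at zero
learnRate <- 1e-4; momentum <- 0.2; L1 <- 1e-7; L2 <- 1e-4
# gradient plus regularization terms (exact scaling is an assumption)
pgrad    <- grad + L1 * sign(W) + L2 * W
velocity <- momentum * velocity - learnRate * pgrad
W        <- W + velocity              # one gradient descent step with momentum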
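The learning rate schedule arguments fit together as in the hypothetical call below; the epoch cut-offs and rates are toy values chosen only for illustration.

# Start at learnRate = 1e-3, switch to 1e-4 after epoch 400 and to 1e-5
# after epoch 800
NNsched <- neuralnetwork(iris[, 1:4], iris$Species, hiddenLayers = c(10, 10),
                         learnRate = 1e-3, lrSched = TRUE,
                         lrSchedEpochs = c(400, 800),
                         lrSchedLearnRates = c(1e-4, 1e-5))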
An NN object. Use function plot(<object>) to assess loss on training and optionally validation data during the training process. Use function predict(<object>, <newdata>) for prediction.
A generic function for training Neural Networks for classification and regression problems. Various types of activation and cost functions are supported, as well as L1 and L2 regularization. Additional options are early stopping, momentum and the specification of a learning rate schedule. See function example_NN for some visualized examples on toy data.
LeCun, Yann A., et al. "Efficient backprop." Neural networks: Tricks of the trade. Springer Berlin Heidelberg, 2012. 9-48.
# NOT RUN {
# Example on iris dataset:
randDraw <- sample(1:nrow(iris), size = 100)
train <- iris[randDraw,]
test <- iris[setdiff(1:nrow(iris), randDraw),]
plot(iris[,1:4], pch = as.numeric(iris$Species))
NN <- neuralnetwork(train[,-5], train$Species, hiddenLayers = c(5, 5),
momentum = 0.8, learnRate = 0.001)
plot(NN)
pred <- predict(NN, newdata = test[,-5])
plot(test[,-5], pch = as.numeric(test$Species),
col = as.numeric(test$Species == pred$predictions)+2)
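# Hypothetical regression example (illustrative only, not from the original
# help page): predict mpg from the other mtcars columns with quadratic loss
NNreg <- neuralnetwork(mtcars[, -1], mtcars$mpg, hiddenLayers = c(5, 5),
                       regression = TRUE, lossFunction = "quadratic",
                       learnRate = 0.001)
plot(NNreg)
predReg <- predict(NNreg, newdata = mtcars[, -1])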
# For other examples see function example_NN()
# }