FCNN4R (version 0.6.2)

mlp_teach_sgd: Stochastic gradient descent with (optional) RMS gradient scaling, weight decay, and momentum

Description

This function implements the stochastic gradient descent method with optional modifications: L2 regularization, root mean square gradient scaling (rmsprop), weight decay, and momentum. A sketch of how these options combine in a single update is given after the argument list below.

Usage

mlp_teach_sgd(net, input, output, tol_level, max_epochs, learn_rate,
  l2reg = 0, minibatchsz = 100, lambda = 0, gamma = 0, momentum = 0,
  report_freq = 0)

Arguments

net
an object of mlp_net class
input
numeric matrix, each row corresponds to one input vector; the number of columns must be equal to the number of neurons in the network input layer
output
numeric matrix with rows corresponding to expected outputs; the number of columns must be equal to the number of neurons in the network output layer, and the number of rows must be equal to the number of input rows
tol_level
numeric value, error (MSE) tolerance level
max_epochs
integer value, maximal number of epochs (iterations)
learn_rate
numeric value, (initial) learning rate; depending on the problem at hand, learning rates of 0.001 or 0.01 should give satisfactory convergence
l2reg
numeric value, L2 regularization parameter (default 0)
minibatchsz
integer value, the size of the mini batch (default 100)
lambda
numeric value, rmsprop parameter controlling the update of the mean squared gradient; a reasonable value is 0.1 (default 0)
gamma
numeric value, weight decay parameter (default 0)
momentum
numeric value, momentum parameter; reasonable values are between 0.5 and 0.9 (default 0)
report_freq
integer value, progress report frequency; if set to 0, no information is printed on the console (this is the default)
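
The sketch below shows how the optional modifications above might combine in a single mini-batch update of one weight w with gradient grad. It is a minimal illustration under standard assumptions, not FCNN4R's exact internals: the name sgd_step, the state list, the form of the mean squared gradient update, the 1e-8 stabilizer, and the placement of weight decay are all illustrative.

sgd_step <- function(w, grad, state, learn_rate,
                     l2reg = 0, lambda = 0, gamma = 0, momentum = 0) {
    # illustrative update only; assumed formulation, not the package source
    grad <- grad + l2reg * w                    # L2 regularization: penalize large weights
    if (lambda > 0) {                           # rmsprop: scale by root mean squared gradient
        state$msg <- (1 - lambda) * state$msg + lambda * grad^2
        grad <- grad / (sqrt(state$msg) + 1e-8) # small constant avoids division by zero
    }
    step <- learn_rate * grad
    if (momentum > 0) {                         # momentum: accumulate past steps
        state$v <- momentum * state$v + step
        step <- state$v
    }
    w <- (1 - gamma) * w - step                 # weight decay shrinks w by factor (1 - gamma)
    list(w = w, state = state)                  # initialize state as list(msg = 0, v = 0)
}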

Value

A two-element list: the first field (net) contains the trained network; the second (mse) contains the learning history (MSE in consecutive epochs).
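
Examples

A minimal usage sketch on the XOR problem. mlp_net and mlp_rnd_weights are FCNN4R's network constructor and random weight initializer; the hyperparameter values below are illustrative, not tuned, and a small minibatchsz is chosen because the data set has only four rows.

library(FCNN4R)
# XOR data: one input vector per row, one column per input neuron
inp <- matrix(c(0, 0, 1, 1,
                0, 1, 0, 1), nrow = 4)
outp <- matrix(c(0, 1, 1, 0), nrow = 4)
# create a 2-6-1 network and randomize its weights
net <- mlp_net(c(2, 6, 1))
net <- mlp_rnd_weights(net)
res <- mlp_teach_sgd(net, inp, outp, tol_level = 5e-5, max_epochs = 1000,
                     learn_rate = 0.01, minibatchsz = 4, momentum = 0.9,
                     report_freq = 100)
net <- res$net            # trained network
mse <- res$mse            # learning history (MSE in consecutive epochs)
plot(mse, type = "l", xlab = "epoch", ylab = "MSE")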