optimization_L1: optimization_L1

Description

Subgradient-based quasi-Newton method for non-differentiable optimization.

Usage

optimization_L1(w, X, y, nHidden, verbose = FALSE, lambda, lambda2,
                optimtol, prgmtol, maxIter, decsuff)

Arguments

w

(numeric, \(n\)) weights and biases.

X

(numeric, \(n \times p\)) incidence matrix.

y

(numeric, \(n\)) the response data vector.

nHidden

(positive integer, \(1 \times h\)) matrix, where \(h\) is the number of hidden layers and nHidden[1,i] is the number of neurons in the ith hidden layer.

verbose

logical; if TRUE, prints a detailed history.

lambda

numeric, Lagrange multiplier for the L1-norm penalty on the parameters.

lambda2

numeric, Lagrange multiplier for the L2-norm penalty on the parameters.

optimtol

numeric, a small tolerance used to check convergence of the subgradients.

prgmtol

numeric, a small tolerance used to check convergence of the NN parameters.

maxIter

positive integer, maximum number of epochs (iterations) to train, default 100.

decsuff

numeric, a small tolerance used to check the change in the loss function.
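To make the roles of lambda, lambda2, optimtol, prgmtol, decsuff and maxIter concrete, here is a minimal sketch in Python. It is not the package's code. The form of the penalty (lambda times the L1 norm plus lambda2 times the squared L2 norm) and the way each tolerance is tested are assumptions for illustration, and the function names `penalized_loss` and `should_stop` are hypothetical. The package may apply these quantities differently, for example in a line search.

```python
def penalized_loss(loss, w, lam, lam2):
    """Data loss plus L1 and squared-L2 penalties (assumed form)."""
    l1 = sum(abs(wi) for wi in w)      # L1 norm of the parameters
    l2 = sum(wi * wi for wi in w)      # squared L2 norm of the parameters
    return loss + lam * l1 + lam2 * l2


def should_stop(subgrad, w_old, w_new, f_old, f_new, it,
                optimtol, prgmtol, decsuff, max_iter):
    """Return the name of the first stopping rule that fires, else None.

    Each rule is an assumed reading of the argument descriptions above.
    """
    if max(abs(g) for g in subgrad) < optimtol:
        return "optimtol"              # subgradient is close to zero
    if max(abs(a - b) for a, b in zip(w_new, w_old)) < prgmtol:
        return "prgmtol"               # parameters barely moved
    if abs(f_old - f_new) < decsuff:
        return "decsuff"               # loss barely changed
    if it >= max_iter:
        return "maxIter"               # iteration budget exhausted
    return None
```

For instance, `penalized_loss(1.0, [1.0, -2.0], 0.5, 0.25)` adds 0.5 * 3 and 0.25 * 5 to the data loss of 1.0.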

Value

A vector of weights and biases.

Details

The method chooses the subgradient with minimum norm as the steepest-descent direction and takes a Newton-like step in this direction using a Hessian approximation (Nocedal, 1980). An active-set method is adopted to set some parameters to exactly zero (Krishnan et al., 2007). At each iteration, the parameters are divided into two sets: the working set, containing the non-zero variables, and the active set, containing the variables that are sufficiently close to zero. A Newton step is then taken along the working set. The subgradient-based quasi-Newton method ensures that the step taken in the active variables does not violate the constraint that they remain sufficiently close to zero. A projected steepest-descent step is taken to set some parameters to exactly zero.
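The three ingredients described above can be sketched in a few lines of Python for an objective of the form smooth loss plus lambda times the L1 norm. This is a simplified illustration, not the package's implementation: the function names, the zero tolerance `tol`, and the sign-crossing projection rule are assumptions, and the quasi-Newton (limited-memory Hessian) update is omitted.

```python
def min_norm_subgradient(w, grad, lam):
    """Minimum-norm subgradient of loss(w) + lam * ||w||_1.

    `grad` is the gradient of the smooth loss alone.
    """
    out = []
    for wi, gi in zip(w, grad):
        if wi > 0:
            out.append(gi + lam)       # |w_i| is differentiable here
        elif wi < 0:
            out.append(gi - lam)
        elif gi < -lam:
            out.append(gi + lam)       # moving w_i upward decreases the objective
        elif gi > lam:
            out.append(gi - lam)       # moving w_i downward decreases the objective
        else:
            out.append(0.0)            # zero is optimal for this coordinate
    return out


def split_sets(w, tol=1e-8):
    """Split indices into a working set (non-zero) and an active set (near zero)."""
    working = [i for i, wi in enumerate(w) if abs(wi) > tol]
    active = [i for i, wi in enumerate(w) if abs(wi) <= tol]
    return working, active


def project_sign_crossings(w_old, w_new):
    """Set to exactly zero any coordinate whose sign flipped during the step."""
    return [0.0 if a * b < 0 else b for a, b in zip(w_old, w_new)]
```

With `w = [1.0, 0.0, -2.0, 0.0]`, `grad = [0.5, 0.125, 0.75, -1.0]` and `lam = 0.25`, the minimum-norm subgradient is `[0.75, 0.0, 0.5, -0.75]`: the second coordinate stays at zero because its gradient lies within the interval from -lambda to lambda, while the fourth would be released from zero.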

References

Nocedal, J. 1980. Updating quasi-Newton matrices with limited storage. Mathematics of Computation, 35(151), 773-782.

Krishnan, D., Lin, P., and Yip, A. M. 2007. A primal-dual active-set method for non-negativity constrained total variation deblurring problems. IEEE Transactions on Image Processing, 16(11), 2766-2777.