Accelerated stochastic gradient descent (AccSGD) optimizer; see Kidambi et al. (2018) for details.
AccSGD(kappa = 1000, xi = 10, small_const = 0.7, weight_decay = 0)
Returns an anonymous function that returns the optimizer when called.
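kappa: long step parameter (ratio of the long step to the short step), default 1000
xi: statistical advantage parameter, default 10
small_const: small constant, default 0.7
weight_decay: L2 penalty on the weights, default 0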
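To make the role of the four parameters concrete, here is a minimal NumPy sketch of one AccSGD update. It assumes the update form commonly used in PyTorch implementations of the method and is not taken from this library's source. The learning rate `lr` and the helper names `accsgd_coefficients` and `accsgd_step` are hypothetical: `lr` does not appear in the signature above and is presumably supplied elsewhere.

```python
import numpy as np


def accsgd_coefficients(lr, kappa=1000.0, xi=10.0, small_const=0.7):
    """Derive the step sizes and mixing weights from the hyperparameters.

    `lr` is an assumed short-step learning rate (not part of the documented
    signature).
    """
    large_lr = lr * kappa / small_const             # long step size
    alpha = 1.0 - (small_const ** 2) * xi / kappa   # weight kept on the buffer
    beta = 1.0 - alpha
    zeta = small_const / (small_const + beta)       # short-step vs. buffer mix
    return large_lr, alpha, beta, zeta


def accsgd_step(w, buf, grad, lr, kappa=1000.0, xi=10.0,
                small_const=0.7, weight_decay=0.0):
    """Apply one assumed AccSGD update; returns the new (w, buf)."""
    large_lr, alpha, beta, zeta = accsgd_coefficients(lr, kappa, xi, small_const)
    if weight_decay != 0.0:
        grad = grad + weight_decay * w              # L2 penalty added to the gradient
    # The buffer averages its old value with a long gradient step from w.
    buf = alpha * buf + beta * (w - large_lr * grad)
    # The iterate mixes a short gradient step with the updated buffer.
    w = zeta * (w - lr * grad) + (1.0 - zeta) * buf
    return w, buf


# Usage: one step on f(w) = w**2 at w = 1, where the gradient is 2*w.
w = np.array([1.0])
buf = w.copy()                                      # buffer starts as a copy of w
w, buf = accsgd_step(w, buf, grad=2.0 * w, lr=0.1)
```

Under these assumptions, a larger kappa lengthens the long step and slows the decay of the buffer, while small_const scales both the long step and the mixing weight zeta.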
Kidambi, R., Netrapalli, P., Jain, P., & Kakade, S. (2018, February). On the insufficiency of existing momentum schemes for stochastic optimization. In 2018 Information Theory and Applications Workshop (ITA) (pp. 1-9). IEEE.