An anonymous function that, when called, returns a MADGRAD optimizer (see References).
Arguments
momentum
Momentum strength, typically a value in [0, 1); 0 disables momentum.
weight_decay
L2 penalty applied to the weights.
eps
Small constant (epsilon) added to the denominator of the update for numerical stability.
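To show how the three arguments enter the update, here is a minimal NumPy sketch of a single MADGRAD step, written from the update rule in the referenced paper. It is not this function's implementation. The name madgrad_step, the state dictionary, the lr argument, and all default values are assumptions made for illustration.

```python
import numpy as np

def madgrad_step(x, grad, state, lr=1e-2, momentum=0.9,
                 weight_decay=0.0, eps=1e-6):
    """One illustrative MADGRAD update (hypothetical helper, not a library API)."""
    if not state:
        # Lazily initialise the dual-averaging buffers on the first call.
        state["k"] = 0                    # step counter
        state["s"] = np.zeros_like(x)     # weighted sum of gradients
        state["nu"] = np.zeros_like(x)    # weighted sum of squared gradients
        state["x0"] = x.copy()            # initial point used by dual averaging

    lamb = lr * np.sqrt(state["k"] + 1)   # step weight grows like sqrt(k + 1)

    if weight_decay != 0.0:
        # weight_decay: L2 penalty folded into the gradient.
        grad = grad + weight_decay * x

    state["s"] = state["s"] + lamb * grad
    state["nu"] = state["nu"] + lamb * grad * grad

    # eps is added outside the cube root so the division is always defined.
    rms = np.cbrt(state["nu"]) + eps
    z = state["x0"] - state["s"] / rms    # dual-averaged iterate

    # momentum: the parameters move toward z with weight c = 1 - momentum.
    c = 1.0 - momentum
    x_new = (1.0 - c) * x + c * z

    state["k"] += 1
    return x_new

# Usage: minimise f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x = np.array([1.0, -2.0])
initial_loss = 0.5 * float(x @ x)
state = {}
for _ in range(200):
    x = madgrad_step(x, grad=x, state=state, lr=0.1)
final_loss = 0.5 * float(x @ x)
```

With momentum set to 0 the sketch returns the dual-averaged iterate z directly; larger values make the parameters follow z more slowly.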
References
Defazio, A., & Jelassi, S. (2021). Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization. arXiv preprint arXiv:2101.11075.