Adaptive gradient methods with dynamic bound of learning rate; see Luo et al. (2019) for details.
Usage:

AdaBound(
  betas = c(0.9, 0.999),
  final_lr = 0.1,
  gamma = 0.001,
  eps = 1e-08,
  weight_decay = 0,
  amsbound = TRUE
)
Value:

An anonymous function that returns the optimizer when called.
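A minimal usage sketch, assuming a torch-based workflow in which the returned anonymous function is called with a model's parameters to build the optimizer; the linear model and training step below are illustrative, not part of this package's documented API:

```r
library(torch)  # assumption: the optimizer operates on torch parameters

# Build the optimizer factory with the defaults shown in Usage.
make_optimizer <- AdaBound(
  betas = c(0.9, 0.999),
  final_lr = 0.1,
  gamma = 0.001,
  eps = 1e-08,
  weight_decay = 0,
  amsbound = TRUE
)

# Hypothetical model; calling the factory yields the optimizer itself.
model <- nn_linear(10, 1)
optimizer <- make_optimizer(model$parameters)

# One illustrative training step.
x <- torch_randn(32, 10)
y <- torch_randn(32, 1)
optimizer$zero_grad()
loss <- nnf_mse_loss(model(x), y)
loss$backward()
optimizer$step()
```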
Arguments:

betas: Vector of two coefficients used for computing running averages of the gradient and its square (default c(0.9, 0.999)).

final_lr: Final (SGD) learning rate (default 0.1).

gamma: Convergence speed of the bound functions (default 0.001).

eps: Small constant added to the denominator to improve numerical stability (default 1e-08).

weight_decay: Weight decay (L2 penalty) (default 0).

amsbound: Whether to use the AMSBound variant of the algorithm (default TRUE).
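To make the roles of final_lr and gamma concrete: in Luo et al. (2019), AdaBound clips Adam's per-parameter step size between a lower and an upper bound that both converge to the final learning rate. In the notation of the reference implementation, with alpha the base learning rate and \(\hat v_t\) the second-moment estimate:

\[
\eta_l(t) = \text{final\_lr}\left(1 - \frac{1}{\gamma t + 1}\right), \qquad
\eta_u(t) = \text{final\_lr}\left(1 + \frac{1}{\gamma t}\right),
\]
\[
\hat\eta_t = \operatorname{Clip}\!\left(\frac{\alpha}{\sqrt{\hat v_t}},\; \eta_l(t),\; \eta_u(t)\right).
\]

As \(t \to \infty\) both bounds approach final_lr, so the optimizer behaves like Adam early in training and gradually like SGD with learning rate final_lr; a larger gamma tightens the bounds sooner.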
References:

Luo, L., Xiong, Y., Liu, Y., & Sun, X. (2019). Adaptive gradient methods with dynamic bound of learning rate. arXiv preprint arXiv:1902.09843.