R implementation of the AdaHessian optimizer proposed by Yao et al. (2021). The original Python implementation is available at https://github.com/amirgholami/adahessian.
optim_adahessian(
params,
lr = 0.15,
betas = c(0.9, 0.999),
eps = 1e-04,
weight_decay = 0,
hessian_power = 0.5
)
An optimizer object implementing the step() and zero_grad() methods.
Iterable of parameters to optimize.
Learning rate (default: 0.15).
Coefficients used for computing running averages of the gradient and its square (default: c(0.9, 0.999)).
Term added to the denominator to improve numerical stability (default: 1e-4).
L2 penalty (default: 0).
Hessian power (default: 0.5).
Rolf Simoes, rolf.simoes@inpe.br
Felipe Souza, lipecaso@gmail.com
Alber Sanchez, alber.ipia@inpe.br
Gilberto Camara, gilberto.camara@inpe.br
Yao, Z., Gholami, A., Shen, S., Mustafa, M., Keutzer, K., & Mahoney, M. (2021). ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 35(12), 10665-10673. https://arxiv.org/abs/2006.00719
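Below is a minimal usage sketch, assuming the torch R package is attached alongside this optimizer; the quadratic objective, tensor names, and loop length are illustrative only, not part of the package documentation.

library(torch)

# Parameter to optimize: start away from the minimum of f(x) = sum((x - 3)^2).
x <- torch_tensor(c(0, 0), requires_grad = TRUE)

opt <- optim_adahessian(params = list(x), lr = 0.15)

for (i in 1:100) {
  opt$zero_grad()
  loss <- torch_sum((x - 3)^2)
  # AdaHessian estimates the Hessian diagonal (Hutchinson's method), which
  # needs a second backward pass; hence the graph is kept with create_graph.
  loss$backward(create_graph = TRUE)
  opt$step()
}

as.numeric(x)  # should approach c(3, 3)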