Adam optimizer as described in [Adam - A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980v8).
optimizer_adam(
  learning_rate = 0.001,
  beta_1 = 0.9,
  beta_2 = 0.999,
  epsilon = NULL,
  decay = 0,
  amsgrad = FALSE,
  clipnorm = NULL,
  clipvalue = NULL,
  ...
)
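As a minimal sketch of how the constructor is typically used, the example below builds an optimizer and passes it to `compile()`. It assumes the keras R package and a working TensorFlow backend are installed; the toy model (10 input features, one output), the loss, and the `clipnorm = 1` setting are illustrative choices, not recommendations from this page.

```r
library(keras)

# Hypothetical toy model: 10 input features, one regression output.
model <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = c(10)) %>%
  layer_dense(units = 1)

# Adam with the documented defaults written out, plus gradient-norm clipping.
opt <- optimizer_adam(
  learning_rate = 0.001,
  beta_1 = 0.9,
  beta_2 = 0.999,
  clipnorm = 1
)

# compile() attaches the optimizer, loss, and metrics to the model.
model %>% compile(optimizer = opt, loss = "mse", metrics = c("mae"))
```

Arguments left out of the call, such as `epsilon`, `decay`, and `amsgrad`, keep the defaults shown in the usage above.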
`learning_rate`: float >= 0. Learning rate.
`beta_1`: The exponential decay rate for the 1st moment estimates. float, 0 < beta < 1. Generally close to 1.
`beta_2`: The exponential decay rate for the 2nd moment estimates. float, 0 < beta < 1. Generally close to 1.
`epsilon`: float >= 0. Fuzz factor. If `NULL`, defaults to `k_epsilon()`.
`decay`: float >= 0. Learning rate decay over each update.
`amsgrad`: Whether to apply the AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and Beyond".
`clipnorm`: Gradients will be clipped when their L2 norm exceeds this value.
`clipvalue`: Gradients will be clipped when their absolute value exceeds this value.
`...`: Unused, present only for backwards compatibility.
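To make the roles of the arguments above concrete, here is a rough base R sketch of a single Adam update, following Algorithm 1 of the Adam paper. `adam_step()` is a hypothetical helper, not part of keras, and the library's own implementation may differ in detail. Three things are assumptions: `epsilon = 1e-7` stands in for `k_epsilon()`, the decay schedule is taken to be `learning_rate / (1 + decay * iterations)`, and `clipvalue` is applied element-wise before the update. The AMSGrad variant and `clipnorm` are left out.

```r
# One Adam update step (illustrative helper, not a keras function).
adam_step <- function(param, grad, state,
                      learning_rate = 0.001, beta_1 = 0.9, beta_2 = 0.999,
                      epsilon = 1e-7, decay = 0, clipvalue = NULL) {
  # Assumed clipvalue behaviour: clamp each gradient entry to [-clipvalue, clipvalue].
  if (!is.null(clipvalue)) grad <- pmax(pmin(grad, clipvalue), -clipvalue)

  # Assumed decay schedule: shrink the step size as updates accumulate.
  lr <- learning_rate / (1 + decay * state$t)
  t <- state$t + 1

  # Exponential moving averages of the gradient and the squared gradient.
  m <- beta_1 * state$m + (1 - beta_1) * grad
  v <- beta_2 * state$v + (1 - beta_2) * grad^2

  # Bias correction for the zero-initialised moment estimates.
  m_hat <- m / (1 - beta_1^t)
  v_hat <- v / (1 - beta_2^t)

  # Parameter update; epsilon guards against division by zero.
  param <- param - lr * m_hat / (sqrt(v_hat) + epsilon)
  list(param = param, state = list(m = m, v = v, t = t))
}

# Minimise f(x) = x^2 from x = 1; the gradient is 2 * x.
first <- adam_step(1, 2, list(m = 0, v = 0, t = 0), learning_rate = 0.01)

x <- 1
state <- list(m = 0, v = 0, t = 0)
for (i in 1:2000) {
  out <- adam_step(x, 2 * x, state, learning_rate = 0.01)
  x <- out$param
  state <- out$state
}
```

On the first step the bias-corrected ratio `m_hat / sqrt(v_hat)` has magnitude close to 1, so the parameter moves by roughly `learning_rate` whatever the gradient scale. This is why `learning_rate` acts as an approximate bound on the per-step change.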
- [Adam - A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980v8)
- [On the Convergence of Adam and Beyond](https://openreview.net/forum?id=ryQu7f-RZ)
Other optimizers: `optimizer_adadelta()`, `optimizer_adagrad()`, `optimizer_adamax()`, `optimizer_nadam()`, `optimizer_rmsprop()`, `optimizer_sgd()`