Adding the GradientClip callback clips the norm_type (default: 2) norm of the gradients to at most max_norm (default: 1) using torch::nn_utils_clip_grad_norm_(), which can help avoid loss divergence.
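To make the clipping rule concrete, here is a minimal base R sketch of norm-based clipping on plain numeric vectors. The helper name clip_grad_norm_sketch and the 1e-6 stabilising constant are assumptions for illustration; the callback itself delegates to torch::nn_utils_clip_grad_norm_() and operates on tensors, not on R vectors.

```r
# Hypothetical helper, not part of luz or torch: rescales a list of
# gradient vectors so that their combined p-norm is at most max_norm.
clip_grad_norm_sketch <- function(grads, max_norm = 1, norm_type = 2) {
  flat <- abs(unlist(grads))
  total_norm <- if (is.infinite(norm_type)) {
    max(flat)                                # infinity norm
  } else {
    sum(flat^norm_type)^(1 / norm_type)      # general p-norm
  }
  # The scale factor is capped at 1, so gradients are only ever shrunk.
  scale <- min(1, max_norm / (total_norm + 1e-6))
  list(
    grads = lapply(grads, function(g) g * scale),
    total_norm = total_norm
  )
}

grads <- list(c(3, 0), c(0, 4))              # combined L2 norm is 5
clipped <- clip_grad_norm_sketch(grads, max_norm = 1, norm_type = 2)
new_norm <- sqrt(sum(unlist(clipped$grads)^2))
```

All gradients are scaled by the same factor, so the direction of the update is preserved and only its magnitude is limited.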
luz_callback_gradient_clip(max_norm = 1, norm_type = 2)
max_norm (float or int): max norm of the gradients
norm_type (float or int): type of the p-norm used. Can be Inf for the infinity norm.
See FastAI documentation for the GradientClip callback.
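As a usage sketch, the callback can be passed to fit() through the callbacks argument. The toy data, the single-layer network, and the learning rate below are assumptions chosen only to keep the example short; they do not come from the package documentation.

```r
library(torch)
library(luz)

# Toy regression data (hypothetical, for illustration only)
x <- torch_randn(100, 10)
y <- torch_randn(100, 1)

# A single linear layer is enough to exercise the callback
net <- nn_module(
  initialize = function() {
    self$fc <- nn_linear(10, 1)
  },
  forward = function(x) {
    self$fc(x)
  }
)

model <- setup(net, loss = nnf_mse_loss, optimizer = optim_sgd)
model <- set_opt_hparams(model, lr = 0.01)

# Gradients are clipped to an L2 norm of at most 1 before each optimizer step
fitted <- fit(
  model,
  list(x, y),
  epochs = 1,
  callbacks = list(luz_callback_gradient_clip(max_norm = 1, norm_type = 2)),
  verbose = FALSE
)
```

Setting norm_type = Inf instead bounds the largest absolute gradient entry rather than the overall L2 norm.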