By adding the GradientClip callback, the norm_type-norm (default: 2) of the gradients is clipped to at most max_norm (default: 1) using torch::nn_utils_clip_grad_norm_(), which can help avoid loss divergence.
luz_callback_gradient_clip(max_norm = 1, norm_type = 2)

max_norm: (float or int) max norm of the gradients.
norm_type: (float or int) type of the used p-norm. Can be Inf for the infinity norm.
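
A minimal usage sketch follows, assuming a luz-ready nn_module and a dataloader; the names net and train_dl are placeholders for whatever model and data you are fitting, and the loss, optimizer, and epoch count are illustrative:

library(torch)
library(luz)

# `net` (an nn_module) and `train_dl` (a dataloader) are assumed to exist;
# they stand in for your own model and training data.
fitted <- net %>%
  setup(
    loss = nn_cross_entropy_loss(),
    optimizer = optim_adam
  ) %>%
  fit(
    train_dl,
    epochs = 10,
    callbacks = list(
      # Clip the 2-norm of the gradients to at most 1 before each optimizer step.
      luz_callback_gradient_clip(max_norm = 1, norm_type = 2)
    )
  )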
See the FastAI documentation for the GradientClip callback.