Generic CE: loss = -sum(target * log(pred)) / batch_size.
The gradient w.r.t. pred is -target / pred / n, where n is the batch size.
Use ag_softmax_cross_entropy_loss() for the numerically stable
combined softmax + CE (fused gradient (p - y) / n).
ag_cross_entropy_loss(pred, target) -> scalar ag_tensor
  pred:   ag_tensor [classes, batch_size] probabilities (any, not just softmax)
  target: matrix [classes, batch_size] one-hot (or soft) labels
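Both gradients above can be sanity-checked numerically. The NumPy sketch below is illustrative only (it is not the library's C API); it uses the same [classes, batch_size] layout as the signature and verifies that backpropagating through a column-wise softmax reproduces the fused gradient (p - y) / n:

```python
import numpy as np

def cross_entropy_loss(pred, target):
    # loss = -sum(target * log(pred)) / batch_size; columns index the batch
    n = pred.shape[1]
    return -np.sum(target * np.log(pred)) / n

def cross_entropy_grad(pred, target):
    # dL/dpred = -target / pred / n
    return -target / pred / pred.shape[1]

def softmax(z):
    # column-wise softmax over the class axis (axis 0), shifted for stability
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

# [classes=3, batch=2] logits and one-hot labels
z = np.array([[2.0, 0.5], [1.0, 1.5], [0.1, 0.2]])
y = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
p = softmax(z)
n = z.shape[1]

fused_grad = (p - y) / n  # the fused softmax + CE gradient

# finite-difference check of dL/dz, where L = CE(softmax(z), y)
num_grad = np.zeros_like(z)
eps = 1e-6
for i in range(z.shape[0]):
    for j in range(z.shape[1]):
        zp, zm = z.copy(), z.copy()
        zp[i, j] += eps
        zm[i, j] -= eps
        num_grad[i, j] = (cross_entropy_loss(softmax(zp), y)
                          - cross_entropy_loss(softmax(zm), y)) / (2 * eps)
```

Note that `cross_entropy_grad` applied to raw probabilities gives -y / p / n, which is unstable when a target class has near-zero predicted probability; that is why the fused softmax + CE path is preferred.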