Combines softmax and cross-entropy (CE) in one op, using the fused gradient (p - y) / n.
More numerically stable than chaining ag_softmax and ag_cross_entropy_loss.
Use this when your last layer outputs raw logits.
ag_softmax_cross_entropy_loss(logits, target) -> scalar ag_tensor

Parameters:
  logits  ag_tensor [classes, batch_size]  raw (pre-softmax) scores
  target  matrix    [classes, batch_size]  one-hot labels
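The math behind the fused op can be sketched in NumPy (this is an illustrative sketch of the log-sum-exp formulation and the (p - y) / n gradient, not the library's actual implementation; the function name and layout below mirror the doc's [classes, batch_size] convention):

```python
import numpy as np

def softmax_cross_entropy(logits, target):
    # logits, target: [classes, batch_size]; target is one-hot per column.
    # Subtract the per-column max before exponentiating (log-sum-exp trick)
    # so large logits do not overflow exp().
    z = logits - logits.max(axis=0, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=0, keepdims=True))
    n = logits.shape[1]                      # batch size
    loss = -(target * log_probs).sum() / n   # mean cross-entropy over the batch
    grad = (np.exp(log_probs) - target) / n  # fused gradient (p - y) / n
    return loss, grad

logits = np.array([[2.0, 0.5],
                   [1.0, 1.5],
                   [0.1, 0.3]])
target = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.0]])
loss, grad = softmax_cross_entropy(logits, target)
```

Note that each column of the gradient sums to zero, since both the softmax probabilities p and the one-hot labels y sum to one per example; this is a useful sanity check when wiring up the backward pass.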