Creates a graph node for extended softmax, modifying input tensor in place. Returns a view of the input tensor.
ggml_soft_max_ext_inplace(ctx, a, mask = NULL, scale = 1, max_bias = 0)
Value: View of the input tensor with softmax applied in place
ctx: GGML context
a: Input tensor (typically attention scores)
mask: Optional attention mask tensor (F16 or F32). NULL for no mask. Shape must be broadcastable to the input tensor.
scale: Scaling factor, typically 1/sqrt(head_dim)
max_bias: Maximum ALiBi bias (0.0 to disable ALiBi)
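
To make the roles of mask and scale concrete, here is a minimal base R sketch of the arithmetic for a single row of scores. It assumes the operation follows the description in the upstream ggml header, softmax(a * scale + mask * slope), and that with max_bias = 0 the ALiBi slope is 1. The helper soft_max_ext_ref() is hypothetical, written only for illustration; it does not call ggml or build a graph node.

```r
# Reference arithmetic only: base R, no ggml calls.
# Assumption: max_bias = 0, so the ALiBi slope is treated as 1.
soft_max_ext_ref <- function(a, mask = NULL, scale = 1) {
  x <- a * scale                      # scale the raw scores
  if (!is.null(mask)) x <- x + mask   # additive mask; -Inf removes a position
  x <- x - max(x)                     # subtract the max for numerical stability
  e <- exp(x)
  e / sum(e)                          # normalise so the row sums to 1
}

# One row of attention scores for head_dim = 4 (illustrative values).
scores <- c(2, 4, 6, 8)
causal_mask <- c(0, 0, -Inf, -Inf)    # hide the last two positions
probs <- soft_max_ext_ref(scores, mask = causal_mask, scale = 1 / sqrt(4))
print(probs)
```

Masked positions receive a probability of exactly 0, and the remaining entries sum to 1. The real function applies this to the tensor a inside the compute graph and overwrites a with the result instead of allocating a new tensor.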
Other softmax:
ggml_soft_max_ext_back_inplace()