ggmlR (version 0.6.1)

ggml_soft_max_ext: Extended Softmax with Masking and Scaling (Graph)

Description

Creates a graph node for a fused softmax operation with optional masking and ALiBi (Attention with Linear Biases) support. Computes softmax(a * scale + mask * ALiBi_slope). This fused operation is critical for efficient attention computation in transformers.

Usage

ggml_soft_max_ext(ctx, a, mask = NULL, scale = 1, max_bias = 0)

Value

Tensor representing the scaled and masked softmax

Arguments

ctx

GGML context

a

Input tensor (typically attention scores)

mask

Optional attention mask tensor (F16 or F32). NULL for no mask. Shape must be broadcastable to input tensor.

scale

Scaling factor, typically 1/sqrt(head_dim)

max_bias

Maximum ALiBi bias (0.0 to disable ALiBi)
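For instance, a typical call for one attention head derives the scale from the head dimension (the head_dim value below is a hypothetical example, not anything prescribed by this page):

head_dim <- 64                  # hypothetical per-head dimension
scale    <- 1 / sqrt(head_dim)  # = 0.125, the usual attention scaling
max_bias <- 0                   # 0 disables ALiBi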

Details

This extended softmax is commonly used in transformer attention:

1. Scale attention scores by 1/sqrt(d_k) for numerical stability
2. Apply attention mask (e.g., causal mask, padding mask)
3. Optionally apply ALiBi position bias
4. Compute softmax

All these operations are fused for efficiency.
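For intuition, the following base-R sketch performs the same steps unfused (ALiBi omitted, so it corresponds to max_bias = 0; the 4x4 size and head_dim are hypothetical examples):

set.seed(1)
scores   <- matrix(rnorm(16), 4, 4)  # raw attention scores
head_dim <- 64                       # hypothetical head size
mask     <- matrix(0, 4, 4)
mask[upper.tri(mask)] <- -Inf        # step 2: causal mask
z <- scores / sqrt(head_dim) + mask  # steps 1-2: scale, then mask
softmax_row <- function(x) { e <- exp(x - max(x)); e / sum(e) }
attn <- t(apply(z, 1, softmax_row))  # step 4: row-wise softmax
rowSums(attn)                        # each row sums to 1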

Examples

# \donttest{
ctx <- ggml_init(16 * 1024 * 1024)                        # 16 MB context
scores <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 10, 10)  # raw attention scores
ggml_set_f32(scores, rnorm(100))                          # fill with random values
attn <- ggml_soft_max_ext(ctx, scores, NULL, 1.0, max_bias = 0.0)  # no mask, no ALiBi
graph <- ggml_build_forward_expand(ctx, attn)             # build the compute graph
ggml_graph_compute(ctx, graph)                            # run it
ggml_free(ctx)                                            # release the context
# }
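
A second, hedged sketch showing a causal mask. It assumes (not confirmed by this page) that ggml_set_f32() accepts a numeric vector in the same element order for the mask tensor as for the scores tensor, and that -Inf marks fully masked positions; the head dimension of 64 is a hypothetical example.

ctx <- ggml_init(16 * 1024 * 1024)
scores <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 10, 10)
ggml_set_f32(scores, rnorm(100))
m <- matrix(0, 10, 10)
m[upper.tri(m)] <- -Inf            # assumption: -Inf blocks attention to future positions
mask <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 10, 10)
ggml_set_f32(mask, as.numeric(m))  # assumption: element order matches the scores tensor
attn <- ggml_soft_max_ext(ctx, scores, mask, scale = 1 / sqrt(64), max_bias = 0.0)
graph <- ggml_build_forward_expand(ctx, attn)
ggml_graph_compute(ctx, graph)
ggml_free(ctx)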