
ggmlR — Neural Networks for R

A native R package for building, training, and deploying neural networks, backed by the ggml C library. Designed for Vulkan GPU acceleration with full CPU fallback — no Python, no TensorFlow; everything runs inside your R session.

GPU-first design: when a Vulkan-capable GPU is available (NVIDIA, AMD, Intel, ARM Mali, Qualcomm Adreno), all operations run on GPU automatically. On machines without a GPU the package falls back to CPU transparently — no code changes needed.

Two complementary APIs:

API                        Style                       When to use
Sequential / Functional    Keras-like, static graph    Production models, CRAN-standard workflow
Dynamic autograd (ag_*)    PyTorch-like, eager         Research, custom architectures, Transformers

Also serves as the backend engine for llamaR (LLM inference) and sdR (Stable Diffusion).

Installation

install.packages("ggmlR")

GPU (Vulkan) support is auto-detected at build time.

Ubuntu / Debian — to enable GPU:

sudo apt install libvulkan-dev glslc

Windows — install Rtools and optionally the Vulkan SDK for GPU support.

Build options

Force-enable or disable Vulkan GPU backend:

install.packages("ggmlR", configure.args = "--with-vulkan")
install.packages("ggmlR", configure.args = "--without-vulkan")

Enable CPU SIMD acceleration (AVX2, SSE4, etc.) for faster inference on your machine:

install.packages("ggmlR", configure.args = "--with-simd")

Options can be combined:

install.packages("ggmlR", configure.args = "--with-vulkan --with-simd")

Sequential API

The fastest way to get a model running — stack layers with the pipe, compile, train.

library(ggmlR)

model <- ggml_model_sequential() |>
  ggml_layer_dense(128L, activation = "relu",    input_shape = 784L) |>
  ggml_layer_dropout(rate = 0.3) |>
  ggml_layer_dense(10L,  activation = "softmax")

model <- ggml_compile(model,
                      optimizer = "adam",
                      loss      = "categorical_crossentropy",
                      metrics   = "accuracy")

model <- ggml_fit(model, x_train, y_train,
                  epochs           = 10L,
                  batch_size       = 32L,
                  validation_split = 0.1,
                  verbose          = 1L)

plot(model$history)

ggml_evaluate(model, x_test, y_test)
preds <- ggml_predict(model, x_new)
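For a classifier like the one above, ggml_predict returns class probabilities; picking the winning class per row is plain base R. A minimal sketch with a made-up probability matrix (that preds is an n × k matrix of softmax rows is an assumption based on the examples in this README — the package also lists a ggml_predict_classes() helper in its function index):

```r
# preds: n x k matrix of softmax probabilities (one row per sample)
preds <- matrix(c(0.1, 0.7, 0.2,
                  0.6, 0.3, 0.1), nrow = 2, byrow = TRUE)

labels <- max.col(preds)   # 1-based column index of the most probable class
labels                     # 2 1
```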

Available layers (Sequential)

Layer            Function
Dense            ggml_layer_dense(units, activation)
Conv1D           ggml_layer_conv_1d(filters, kernel_size)
Conv2D           ggml_layer_conv_2d(filters, kernel_size, padding)
MaxPooling2D     ggml_layer_max_pooling_2d(pool_size)
GlobalAvgPool2D  ggml_layer_global_average_pooling_2d()
BatchNorm        ggml_layer_batch_norm()
Flatten          ggml_layer_flatten()
Dropout          ggml_layer_dropout(rate)
Embedding        ggml_layer_embedding(vocab_size, dim)
LSTM             ggml_layer_lstm(units, return_sequences)
GRU              ggml_layer_gru(units, return_sequences)

CNN example (MNIST)

model <- ggml_model_sequential() |>
  ggml_layer_conv_2d(32L, kernel_size = c(3L, 3L), activation = "relu",
                     input_shape = c(28L, 28L, 1L)) |>
  ggml_layer_max_pooling_2d(pool_size = c(2L, 2L)) |>
  ggml_layer_conv_2d(64L, kernel_size = c(3L, 3L), activation = "relu") |>
  ggml_layer_global_average_pooling_2d() |>
  ggml_layer_dense(10L, activation = "softmax")

Functional API

Wire layers into arbitrary graphs — residual connections, multi-input/output, shared weights.

Residual (skip) connection

inp <- ggml_input(shape = 64L)
x   <- inp |> ggml_layer_dense(64L, activation = "relu")
res <- ggml_layer_add(list(inp, x))        # element-wise add
out <- res |> ggml_layer_dense(10L, activation = "softmax")

m <- ggml_model(inputs = inp, outputs = out)
m <- ggml_compile(m, optimizer = "adam", loss = "categorical_crossentropy")
m <- ggml_fit(m, x_train, y_train, epochs = 5L, batch_size = 32L)

Embedding + GRU + skip connection (NLP)

inp <- ggml_input(shape = 30L, dtype = "int32", name = "tokens")
emb <- inp |> ggml_layer_embedding(vocab_size = 500L, dim = 32L)

# Branch A: GRU path
proj_a <- emb |>
  ggml_layer_gru(32L, return_sequences = FALSE) |>
  ggml_layer_dense(32L)

# Branch B: flatten + projection
proj_b <- emb |>
  ggml_layer_flatten() |>
  ggml_layer_dense(32L, activation = "relu") |>
  ggml_layer_dense(32L)

# Residual merge
out <- ggml_layer_add(list(proj_a, proj_b)) |>
  ggml_layer_dropout(rate = 0.3) |>
  ggml_layer_dense(2L, activation = "softmax")

m <- ggml_model(inputs = inp, outputs = out)

Token values must be 0-based integers in [0, vocab_size - 1].
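R indexing is 1-based, so an off-by-one is easy to introduce when building token ids. A minimal base-R sketch (the vocabulary and sentence here are made up for illustration):

```r
vocab  <- c("<pad>", "the", "cat", "sat")   # vocab_size = 4
tokens <- c("the", "cat", "sat", "<pad>")

# match() returns 1-based positions; subtract 1 for ids in [0, vocab_size - 1]
ids <- match(tokens, vocab) - 1L
ids   # 1 2 3 0
```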

Multi-input model

inp1 <- ggml_input(shape = 20L, name = "timeseries")
inp2 <- ggml_input(shape = 3L,  name = "metadata")

br1 <- inp1 |> ggml_layer_dense(16L, activation = "relu")
br2 <- inp2 |> ggml_layer_dense(8L,  activation = "relu")

out <- ggml_layer_concatenate(list(br1, br2), axis = 0L) |>
  ggml_layer_dense(2L, activation = "softmax")

m <- ggml_model(inputs = list(inp1, inp2), outputs = out)
m <- ggml_compile(m, optimizer = "adam", loss = "categorical_crossentropy")

# Pass x as a list — one matrix per input
m <- ggml_fit(m, x = list(x_ts, x_meta), y = y,
              epochs = 10L, batch_size = 32L)
preds <- ggml_predict(m, list(x_ts, x_meta))

Multi-output model

inp    <- ggml_input(shape = 64L)
hidden <- inp    |> ggml_layer_dense(64L, activation = "relu")
out    <- hidden |> ggml_layer_dense(10L, activation = "softmax")

m     <- ggml_model(inputs = inp, outputs = list(hidden, out))
preds <- ggml_predict(m, x)
# preds[[1]] — hidden activations  [n × 64]
# preds[[2]] — class probabilities [n × 10]

ResNet-like image classifier

residual_block <- function(x, filters, name) {
  main     <- x |> ggml_layer_conv_2d(filters, c(3L, 3L), padding = "same",
                                       name = paste0(name, "_conv"))
  shortcut <- x |> ggml_layer_conv_2d(filters, c(1L, 1L), padding = "same",
                                       name = paste0(name, "_proj"))
  ggml_layer_add(list(main, shortcut), name = paste0(name, "_add"))
}

inp <- ggml_input(shape = c(32L, 32L, 3L))
x   <- inp |> ggml_layer_conv_2d(16L, c(3L, 3L), activation = "relu",
                                  padding = "same")
x   <- residual_block(x, 16L, "res1")
x   <- residual_block(x, 32L, "res2")
out <- x |>
  ggml_layer_global_average_pooling_2d() |>
  ggml_layer_dropout(rate = 0.4) |>
  ggml_layer_dense(3L, activation = "softmax")

m <- ggml_model(inputs = inp, outputs = out)

Shared layers (Siamese / weight sharing)

enc <- ggml_dense(32L, activation = "relu", name = "encoder")

x1 <- ggml_input(shape = 16L, name = "left")
x2 <- ggml_input(shape = 16L, name = "right")

h1 <- ggml_apply(x1, enc)   # identical weights
h2 <- ggml_apply(x2, enc)

out <- ggml_layer_add(list(h1, h2)) |>
  ggml_layer_dense(2L, activation = "softmax")

m <- ggml_model(inputs = list(x1, x2), outputs = out)

Differences from Keras

Feature               Keras (Python)               ggmlR
Batch dimension       part of input_shape          excluded from shape
Merge layers          add([a, b])                  ggml_layer_add(list(a, b))
Shared layers         reuse layer object           ggml_dense() + ggml_apply()
Multi-input data      list of arrays               list() of R matrices
Multi-output predict  list of numpy arrays         R list of matrices
Backend               TensorFlow / JAX / PyTorch   ggml (Vulkan GPU, CPU fallback)

Dynamic Autograd Engine (PyTorch-style)

Build and train arbitrary architectures with eager execution and automatic differentiation.

library(ggmlR)

# Define parameters (x_batch: 4 x n inputs, y_batch: 8 x n targets, created elsewhere)
W <- ag_param(matrix(rnorm(4 * 8) * 0.1, 8, 4))
b <- ag_param(matrix(0, 8, 1))

# Forward + backward
with_grad_tape({
  h    <- ag_add(ag_matmul(W, x_batch), b)
  h    <- ag_relu(h)
  loss <- ag_mse_loss(h, y_batch)
})
grads <- backward(loss)

opt <- optimizer_adam(list(W = W, b = b), lr = 1e-3)
opt$step(grads)
opt$zero_grad()

Training loop: layers, optimizer, scheduler, dataloader

model <- ag_sequential(
  ag_linear(64L, 128L, activation = "relu"),
  ag_batch_norm(128L),
  ag_dropout(0.1),
  ag_linear(128L, 10L)
)

params <- model$parameters()
opt    <- optimizer_adam(params, lr = 1e-3)
sch    <- lr_scheduler_cosine(opt, T_max = 50L, lr_min = 1e-5)

dl <- ag_dataloader(x_train, y_train, batch_size = 32L, shuffle = TRUE)

for (epoch in 1:50) {
  for (batch in dl$epoch()) {
    with_grad_tape({
      out  <- model$forward(batch$x)
      loss <- ag_softmax_cross_entropy_loss(out, batch$y)
    })
    grads <- backward(loss)
    clip_grad_norm(params, grads, max_norm = 1.0)
    opt$step(grads)
    opt$zero_grad()
  }
  sch$step()
}
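Dropout and batch norm behave differently at inference time. The function index below lists ag_train() / ag_eval() for switching modes; a hedged sketch of how that would slot in after the loop above (whether these calls modify the model in place is not shown in this README, so treat this as illustrative):

ag_eval(model)                    # dropout off, batch norm uses running stats
test_out <- model$forward(x_test)

ag_train(model)                   # back to training mode before further fitting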

Data-parallel training

dp_train() splits data across N replicas, averages gradients, and keeps weights in sync automatically.

make_model <- function() {
  W <- ag_param(matrix(rnorm(4 * 2) * 0.1, 2, 4))
  b <- ag_param(matrix(0, 2, 1))
  list(
    forward    = function(x) ag_add(ag_matmul(W, x), b),
    parameters = function() list(W = W, b = b)
  )
}

result <- dp_train(
  make_model  = make_model,
  data        = my_dataset,           # list of samples
  loss_fn     = function(out, tgt) ag_mse_loss(out, tgt),
  forward_fn  = function(model, s) model$forward(s$x),
  target_fn   = function(s) s$y,
  n_gpu       = 2L,                   # number of replicas
  n_iter      = 100L,
  lr          = 1e-3,
  max_norm    = 5.0
)

result$loss_history   # numeric vector, one value per iteration
result$model          # trained replica 0

Autograd op reference

Category     Functions
Linear       ag_matmul, ag_add, ag_sub, ag_mul, ag_scale
Activations  ag_relu, ag_sigmoid, ag_tanh, ag_softmax
Reductions   ag_sum, ag_mean (with dim, keepdim)
Math         ag_log, ag_exp, ag_pow, ag_clamp
Shape        ag_reshape, ag_transpose
Attention    ag_multihead_attention
Loss         ag_mse_loss, ag_cross_entropy_loss, ag_softmax_cross_entropy_loss
Layers       ag_linear, ag_batch_norm, ag_dropout, ag_embedding
Containers   ag_sequential
Optimizers   optimizer_sgd, optimizer_adam
Schedulers   lr_scheduler_step, lr_scheduler_cosine
Utilities    clip_grad_norm, ag_gradcheck, dp_train

Save / Load

ggml_save_model(model, "my_model.rds")
model <- ggml_load_model("my_model.rds")
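A quick round-trip check that serialization preserved the weights, using only functions shown in this README (model and x_new refer to the Sequential example above; tempfile() is base R):

path <- tempfile(fileext = ".rds")
ggml_save_model(model, path)
restored <- ggml_load_model(path)

# the restored model should reproduce the original predictions
all.equal(ggml_predict(model, x_new), ggml_predict(restored, x_new))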

GPU Acceleration

ggmlR is designed GPU-first: Vulkan is auto-detected at build time and, when available, 90%+ of operations run on GPU with 5×–20× speedup over CPU. On machines without a Vulkan-capable GPU the package falls back to CPU transparently — no code changes required.

ggml_vulkan_available()   # TRUE if a Vulkan GPU was detected
ggml_vulkan_status()      # device name, memory, capabilities

# Dynamic autograd: switch device at runtime
ag_device("gpu")   # move subsequent ops to GPU (f16 by default)
ag_device("cpu")   # fall back to CPU

Supported GPUs: NVIDIA, AMD, Intel, ARM Mali, Qualcomm Adreno.
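Scripts that must run on both GPU and CPU machines can guard the device switch with the detection helper — a short sketch using only functions documented on this page:

# prefer GPU when Vulkan is available, otherwise stay on CPU
if (ggml_vulkan_available()) {
  ag_device("gpu")
  print(ggml_vulkan_status())
} else {
  ag_device("cpu")
}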

System Requirements

  • R ≥ 4.1.0, C++17 compiler
  • Optional GPU: libvulkan-dev + glslc (Linux) or Vulkan SDK (Windows)
  • Platforms: Linux, macOS, Windows (x86-64, ARM64)

See Also

  • llamaR — LLM inference in R
  • sdR — Stable Diffusion in R
  • ggml — underlying C library

License

MIT


Version

0.6.1

License

MIT + file LICENSE


Maintainer

Yuri Baramykov

Last Published

February 22nd, 2026

Functions in ggmlR (0.6.1)

ag_dataloader

Create a mini-batch data loader
ag_cross_entropy_loss

Categorical Cross-Entropy loss
GGML_GLU_OP_REGLU

GLU Operation Types
ag_add

Element-wise addition with broadcasting
ag_default_device

Return the current default compute device
GGML_SORT_ORDER_ASC

Sort Order Constants
abind_first

Bind Two Arrays Along the First Dimension
GGML_TYPE_F32

GGML Data Types
ag_clamp

Element-wise clamp
ag_batch_norm

Create a Batch Normalisation layer
ag_exp

Element-wise exponential
ag_embedding

Create an Embedding layer
ag_linear

Create a dense layer with learnable parameters
ag_default_dtype

Return the current default dtype for GPU operations
ag_eval

Switch a layer or sequential model to eval mode
ag_log

Element-wise natural logarithm
ag_gradcheck

Numerical gradient check (like torch.autograd.gradcheck)
ag_device

Set the default compute device for ag_* operations
ag_dtype

Set the default floating-point precision for ag_* GPU operations
ag_dropout

Create a Dropout layer
ag_matmul

Matrix multiplication
ag_multihead_attention

Create a Multi-Head Attention layer
ag_param

Create a parameter tensor (gradient tracked)
ag_mul

Element-wise multiplication
ag_reshape

Reshape tensor
ag_scale

Scale tensor by a scalar constant
ag_relu

ReLU activation
ag_tensor

Create a dynamic tensor (no gradient tracking)
ag_sequential

Create a sequential container of layers
ag_sigmoid

Sigmoid activation
ag_train

Switch a layer or sequential model to training mode
ag_to_device

Move a tensor to the specified device
ag_softmax

Softmax activation (column-wise)
ag_sub

Element-wise subtraction
ag_softmax_cross_entropy_loss

Fused softmax + cross-entropy loss (numerically stable)
ag_mse_loss

Mean Squared Error loss
ag_pow

Element-wise power
ag_mean

Mean of elements (or along a dim)
ag_sum

Sum all elements (or along a dim): out = sum(x)
ag_tanh

Tanh activation
dequantize_row_q2_K

Dequantize Row (K-quants)
dp_train

Data-parallel training across multiple GPUs
dequantize_row_mxfp4

Dequantize Row (MXFP4)
ag_transpose

Transpose a tensor
backward

Run backward pass from a scalar loss tensor
dequantize_row_q4_0

Dequantize Row (Q4_0)
dequantize_row_tq1_0

Dequantize Row (Ternary)
ggmlR-package

ggmlR: 'GGML' Tensor Operations for Machine Learning
clip_grad_norm

Clip gradients by global L2 norm
ggml_add

Add tensors
ggml_add1

Add Scalar to Tensor (Graph)
ggml_apply

Apply a Layer Object to a Tensor Node
ggml_abort_is_r_enabled

Check if R Abort Handler is Enabled
ggml_are_same_layout

Check if Two Tensors Have the Same Layout
ggml_add_inplace

Element-wise Addition In-place (Graph)
ggml_are_same_shape

Compare Tensor Shapes
ggml_abs_inplace

Absolute Value In-place (Graph)
ggml_are_same_stride

Compare Tensor Strides
dequantize_row_iq2_xxs

Dequantize Row (IQ)
ggml_abs

Absolute Value (Graph)
ggml_backend_buffer_get_usage

Get buffer usage
ggml_backend_buffer_free

Free Backend Buffer
ggml_argmax

Argmax (Graph)
ggml_argsort

Argsort - Get Sorting Indices (Graph)
ggml_backend_buffer_is_multi_buffer

Check if buffer is a multi-buffer
ggml_backend_buffer_is_host

Check if buffer is host memory
ggml_backend_buffer_name

Get Backend Buffer Name
ggml_backend_alloc_ctx_tensors

Allocate Context Tensors to Backend
ggml_backend_buffer_clear

Clear buffer memory
ggml_backend_cpu_set_n_threads

Set CPU Backend Threads
ggml_backend_buffer_get_size

Get Backend Buffer Size
ggml_backend_buffer_usage_compute

Buffer usage: Compute
ggml_backend_buffer_usage_any

Buffer usage: Any
ggml_backend_dev_count

Get number of available devices
ggml_backend_buffer_reset

Reset buffer
ggml_backend_buffer_set_usage

Set buffer usage hint
ggml_backend_cpu_init

Initialize CPU Backend
ggml_backend_buffer_usage_weights

Buffer usage: Weights
ggml_backend_dev_by_type

Get device by type
ggml_backend_dev_by_name

Get device by name
ggml_backend_dev_get

Get device by index
ggml_backend_dev_supports_buft

Check if device supports buffer type
ggml_backend_dev_type

Get device type
ggml_backend_dev_init

Initialize backend from device
ggml_backend_dev_description

Get device description
ggml_backend_dev_offload_op

Check if device should offload operation
ggml_backend_dev_get_props

Get device properties
ggml_backend_dev_memory

Get device memory
ggml_backend_dev_name

Get device name
ggml_backend_dev_supports_op

Check if device supports operation
ggml_backend_event_record

Record event
ggml_backend_event_synchronize

Synchronize event
ggml_backend_device_type_cpu

Device type: CPU
ggml_backend_device_type_accel

Device type: Accelerator
ggml_backend_device_type_igpu

Device type: Integrated GPU
ggml_backend_event_new

Create new event
ggml_backend_event_free

Free event
ggml_backend_device_register

Register a device
ggml_backend_event_wait

Wait for event
ggml_backend_device_type_gpu

Device type: GPU
ggml_backend_graph_plan_compute

Execute graph plan
ggml_backend_init_by_type

Initialize backend by type
ggml_backend_graph_plan_create

Create graph execution plan
ggml_backend_init_best

Initialize best available backend
ggml_backend_get_device

Get device from backend
ggml_backend_init_by_name

Initialize backend by name
ggml_backend_graph_compute

Compute Graph with Backend
ggml_backend_free

Free Backend
ggml_backend_graph_compute_async

Compute graph asynchronously
ggml_backend_graph_plan_free

Free graph execution plan
ggml_backend_name

Get Backend Name
ggml_backend_reg_by_name

Get backend registry by name
ggml_backend_load

Load backend from dynamic library
ggml_backend_multi_buffer_set_usage

Set usage for all buffers in a multi-buffer
ggml_backend_reg_dev_get

Get device from registry
ggml_backend_reg_dev_count

Get number of devices in registry
ggml_backend_load_all

Load all available backends
ggml_backend_reg_get

Get backend registry by index
ggml_backend_reg_count

Get number of registered backends
ggml_backend_multi_buffer_alloc_buffer

Allocate multi-buffer
ggml_backend_sched_get_n_copies

Get number of tensor copies
ggml_backend_sched_get_tensor_backend

Get tensor backend assignment
ggml_backend_sched_get_n_backends

Get number of backends in scheduler
ggml_backend_sched_get_n_splits

Get number of graph splits
ggml_backend_sched_free

Free backend scheduler
ggml_backend_sched_get_backend

Get backend from scheduler
ggml_backend_sched_alloc_graph

Allocate graph on scheduler
ggml_backend_reg_name

Get registry name
ggml_backend_register

Register a backend
ggml_backend_sched_graph_compute

Compute graph using scheduler
ggml_backend_tensor_get_and_sync

Backend Tensor Get and Sync
ggml_backend_sched_reserve

Reserve memory for scheduler
ggml_backend_synchronize

Synchronize backend
ggml_backend_sched_set_tensor_backend

Set tensor backend assignment
ggml_backend_tensor_get_async

Get tensor data asynchronously
ggml_backend_sched_graph_compute_async

Compute graph asynchronously
ggml_backend_sched_synchronize

Synchronize scheduler
ggml_backend_sched_reset

Reset scheduler
ggml_backend_sched_new

Create a new backend scheduler
ggml_backend_tensor_copy_async

Copy tensor asynchronously between backends
ggml_backend_tensor_get_data

Get Tensor Data via Backend
ggml_backend_tensor_get_f32_first

Get First Float from Backend Tensor
ggml_backend_tensor_set_async

Set tensor data asynchronously
ggml_can_repeat

Check If Tensor Can Be Repeated
ggml_callback_early_stopping

Early stopping callback
ggml_backend_tensor_set_data

Set Tensor Data via Backend
ggml_backend_unload

Unload backend
ggml_batch_norm

Create a Batch Normalization Layer Object
ggml_ceil

Ceiling (Graph)
ggml_ceil_inplace

Ceiling In-place (Graph)
ggml_cos

Cosine (Graph)
ggml_conv_transpose_1d

Transposed 1D Convolution (Graph)
ggml_concat

Concatenate Tensors (Graph)
ggml_cont

Make Contiguous (Graph)
ggml_blck_size

Get Block Size
ggml_cpu_has_avx2

CPU Feature Detection - AVX2
ggml_compile.ggml_functional_model

Compile a Sequential Model
ggml_build_forward_expand

Build forward expand
ggml_clamp

Clamp (Graph)
ggml_cpu_features

Get All CPU Features
ggml_count_equal

Count Equal Elements (Graph)
ggml_conv_1d

1D Convolution (Graph)
ggml_cpu_get_sve_cnt

Get SVE Vector Length (ARM)
ggml_cpu_has_amx_int8

CPU Feature Detection - AMX INT8
ggml_cpu_has_avx512

CPU Feature Detection - AVX-512
ggml_cpu_has_avx

CPU Feature Detection - AVX
ggml_cpu_has_arm_fma

CPU Feature Detection - ARM FMA
ggml_conv_2d

2D Convolution (Graph)
ggml_cpu_get_rvv_vlen

Get RISC-V Vector Length
ggml_cpu_has_fma

CPU Feature Detection - FMA
ggml_cpu_has_avx512_vnni

CPU Feature Detection - AVX-512 VNNI
ggml_cpu_has_f16c

CPU Feature Detection - F16C
ggml_cpu_add

Element-wise Addition (CPU Direct)
ggml_cpu_has_bmi2

CPU Feature Detection - BMI2
ggml_cpu_has_avx_vnni

CPU Feature Detection - AVX-VNNI
ggml_cpu_has_dotprod

CPU Feature Detection - Dot Product (ARM)
ggml_cpu_has_llamafile

CPU Feature Detection - Llamafile
ggml_cpu_has_fp16_va

CPU Feature Detection - FP16 Vector Arithmetic (ARM)
ggml_cpu_has_avx512_vbmi

CPU Feature Detection - AVX-512 VBMI
ggml_cpu_has_avx512_bf16

CPU Feature Detection - AVX-512 BF16
ggml_cpu_has_sse3

CPU Feature Detection - SSE3
ggml_cpu_has_riscv_v

CPU Feature Detection - RISC-V Vector
ggml_cpu_has_sve

CPU Feature Detection - SVE (ARM)
ggml_cpu_has_sme

CPU Feature Detection - SME (ARM)
ggml_cpu_has_vsx

CPU Feature Detection - VSX (PowerPC)
ggml_cpu_has_ssse3

CPU Feature Detection - SSSE3
ggml_cpu_has_matmul_int8

CPU Feature Detection - INT8 Matrix Multiply (ARM)
ggml_cpu_has_neon

CPU Feature Detection - NEON (ARM)
ggml_cpu_has_vxe

CPU Feature Detection - VXE (IBM z/Architecture)
ggml_cpu_has_wasm_simd

CPU Feature Detection - WebAssembly SIMD
ggml_cycles

Get CPU Cycles
ggml_diag_mask_inf

Diagonal Mask with -Inf (Graph)
ggml_dense

Create a Dense Layer Object
ggml_diag_mask_inf_inplace

Diagonal Mask with -Inf In-place (Graph)
ggml_cycles_per_ms

Get CPU Cycles per Millisecond
ggml_diag

Diagonal Matrix (Graph)
ggml_cpy

Copy Tensor with Type Conversion (Graph)
ggml_cpu_mul

Element-wise Multiplication (CPU Direct)
ggml_diag_mask_zero

Diagonal Mask with Zero (Graph)
ggml_div

Element-wise Division (Graph)
ggml_dup_inplace

Duplicate Tensor In-place (Graph)
ggml_elu

ELU Activation (Graph)
ggml_dup_tensor

Duplicate Tensor
ggml_estimate_memory

Estimate Required Memory
ggml_div_inplace

Element-wise Division In-place (Graph)
ggml_dup

Duplicate Tensor (Graph)
ggml_evaluate.ggml_functional_model

Evaluate a Trained Model
ggml_elu_inplace

ELU Activation In-place (Graph)
ggml_element_size

Get Element Size
ggml_embedding

Create an Embedding Layer Object
ggml_flash_attn_back

Flash Attention Backward (Graph)
ggml_flash_attn_ext

Flash Attention (Graph)
ggml_fit.ggml_functional_model

Train a Model (dispatcher)
ggml_fit_opt

Fit model with R-side epoch loop and callbacks
ggml_exp_inplace

Exponential In-place (Graph)
ggml_floor_inplace

Floor In-place (Graph)
ggml_free

Free GGML context
ggml_freeze_weights

Freeze Layer Weights
ggml_exp

Exponential (Graph)
ggml_floor

Floor (Graph)
ggml_gallocr_reserve

Reserve Memory for Graph
ggml_ftype_to_ggml_type

Convert ftype to ggml_type
ggml_geglu_split

GeGLU Split (Graph)
ggml_gallocr_new

Create Graph Allocator
ggml_geglu

GeGLU (GELU Gated Linear Unit) (Graph)
ggml_gallocr_get_buffer_size

Get Graph Allocator Buffer Size
ggml_gallocr_free

Free Graph Allocator
ggml_gelu

GELU Activation (Graph)
ggml_geglu_quick

GeGLU Quick (Fast GeGLU) (Graph)
ggml_gallocr_alloc_graph

Allocate Memory for Graph
ggml_gelu_inplace

GELU Activation In-place (Graph)
ggml_get_i32

Get I32 Data
ggml_get_f32

Get F32 data
ggml_gelu_quick

GELU Quick Activation (Graph)
ggml_get_f32_nd

Get Single Float Value by N-D Index
ggml_gelu_erf

Exact GELU Activation (Graph)
ggml_get_layer

Get a Layer from a Sequential Model
ggml_get_max_tensor_size

Get Maximum Tensor Size
ggml_get_first_tensor

Get First Tensor from Context
ggml_get_i32_nd

Get Single Int32 Value by N-D Index
ggml_get_no_alloc

Get No Allocation Mode
ggml_get_rows

Get Rows by Indices (Graph)
ggml_get_op_params

Get Tensor Operation Parameters
ggml_get_op_params_f32

Get Float Op Parameter
ggml_get_rows_back

Get Rows Backward (Graph)
ggml_get_mem_size

Get Context Memory Size
ggml_get_n_threads

Get Number of Threads
ggml_get_op_params_i32

Get Integer Op Parameter
ggml_get_name

Get Tensor Name
ggml_get_next_tensor

Get Next Tensor from Context
ggml_graph_n_nodes

Get Number of Nodes in Graph
ggml_get_unary_op

Get Unary Operation from Tensor
ggml_glu

Generic GLU (Gated Linear Unit) (Graph)
ggml_graph_overhead

Get Graph Overhead
ggml_graph_node

Get Graph Node
ggml_graph_get_tensor

Get Tensor from Graph by Name
ggml_graph_dump_dot

Export Graph to DOT Format
ggml_graph_compute_with_ctx

Compute Graph with Context (Alternative Method)
ggml_glu_split

Generic GLU Split (Graph)
ggml_graph_view

Create a View of a Subgraph
ggml_gru

Create a GRU Layer Object
ggml_hardswish

Hard Swish Activation (Graph)
ggml_hardsigmoid

Hard Sigmoid Activation (Graph)
ggml_group_norm_inplace

Group Normalization In-place (Graph)
ggml_group_norm

Group Normalization (Graph)
ggml_im2col

Image to Column (Graph)
ggml_init

Initialize GGML context
ggml_graph_print

Print Graph Information
ggml_graph_reset

Reset Graph (for backpropagation)
ggml_graph_compute

Compute graph
ggml_is_contiguous_rows

Check Row-wise Contiguity
ggml_is_contiguous_0

Check Tensor Contiguity (Dimension 0)
ggml_is_contiguous

Check if Tensor is Contiguous
ggml_is_contiguously_allocated

Check If Tensor is Contiguously Allocated
ggml_is_contiguous_1

Check Tensor Contiguity (Dimensions >= 1)
ggml_input

Declare a Functional API Input Tensor
ggml_is_contiguous_channels

Check Channel-wise Contiguity
ggml_is_contiguous_2

Check Tensor Contiguity (Dimensions >= 2)
ggml_init_auto

Create Context with Auto-sizing
ggml_is_permuted

Check if Tensor is Permuted
ggml_is_quantized

Check If Type is Quantized
ggml_is_available

Check if GGML is available
ggml_layer_conv_1d

Create a Conv1D Layer Object
ggml_layer_add

Element-wise Addition of Two Tensor Nodes
ggml_layer_batch_norm

Add Batch Normalization Layer
ggml_is_transposed

Check if Tensor is Transposed
ggml_l2_norm

L2 Normalization (Graph)
ggml_layer_concatenate

Concatenate Tensor Nodes Along an Axis
ggml_layer_conv_2d

Create a Conv2D Layer Object
ggml_l2_norm_inplace

L2 Normalization In-place (Graph)
ggml_layer_global_max_pooling_2d

Global Max Pooling for 2D Feature Maps
ggml_layer_global_average_pooling_2d

Global Average Pooling for 2D Feature Maps
ggml_layer_max_pooling_2d

Add 2D Max Pooling Layer
ggml_layer_lstm

Add an LSTM Layer
ggml_leaky_relu

Leaky ReLU Activation (Graph)
ggml_layer_gru

Add a GRU Layer
ggml_layer_embedding

Add Embedding Layer
ggml_layer_dense

Add Dense (Fully Connected) Layer
ggml_layer_dropout

Add Dropout Layer
ggml_log

Natural Logarithm (Graph)
ggml_load_model

Load a Full Model (Architecture + Weights)
ggml_log_set_r

Enable R-compatible GGML Logging
ggml_log_inplace

Natural Logarithm In-place (Graph)
ggml_log_is_r_enabled

Check if R Logging is Enabled
ggml_lstm

Create an LSTM Layer Object
ggml_log_set_default

Restore Default GGML Logging
ggml_load_weights

Load Model Weights from File
ggml_layer_flatten

Add Flatten Layer
ggml_model

Create a Functional Model
ggml_mean

Mean (Graph)
ggml_neg

Negation (Graph)
ggml_mul_mat

Matrix Multiplication (Graph)
ggml_neg_inplace

Negation In-place (Graph)
ggml_model_sequential

Create a Sequential Neural Network Model
ggml_nelements

Get number of elements
ggml_nbytes

Get number of bytes
ggml_mul

Multiply tensors
ggml_n_dims

Get Number of Dimensions
ggml_mul_mat_id

Matrix Multiplication with Expert Selection (Graph)
ggml_mul_inplace

Element-wise Multiplication In-place (Graph)
ggml_new_tensor_3d

Create 3D Tensor
ggml_norm

Layer Normalization (Graph)
ggml_nrows

Get Number of Rows
ggml_new_i32

Create Scalar I32 Tensor
ggml_new_tensor_1d

Create 1D tensor
ggml_new_f32

Create Scalar F32 Tensor
ggml_new_tensor

Create Tensor with Arbitrary Dimensions
ggml_new_tensor_2d

Create 2D tensor
ggml_new_tensor_4d

Create 4D Tensor
ggml_norm_inplace

Layer Normalization In-place (Graph)
ggml_op_can_inplace

Check if Operation Can Be Done In-place
ggml_opt_dataset_init

Create a new optimization dataset
ggml_op_name

Get Operation Name
ggml_opt_alloc

Allocate graph for evaluation
ggml_opt_dataset_get_batch

Get batch from dataset
ggml_op_symbol

Get Operation Symbol
ggml_opt_context_optimizer_type

Get optimizer type from context
ggml_opt_dataset_data

Get data tensor from dataset
ggml_op_desc

Get Operation Description from Tensor
ggml_opt_dataset_free

Free optimization dataset
ggml_opt_eval

Evaluate model
ggml_opt_grad_acc

Get gradient accumulator for a tensor
ggml_opt_fit

Fit model to dataset
ggml_opt_dataset_labels

Get labels tensor from dataset
ggml_opt_epoch

Run one training epoch
ggml_opt_default_params

Get default optimizer parameters
ggml_opt_dataset_shuffle

Shuffle dataset
ggml_opt_get_lr

Get current learning rate from optimizer context
ggml_opt_free

Free optimizer context
ggml_opt_dataset_ndata

Get number of datapoints in dataset
ggml_opt_loss_type_mse

Loss type: Mean Squared Error
ggml_opt_loss_type_sum

Loss type: Sum
ggml_opt_loss

Get loss tensor from optimizer context
ggml_opt_loss_type_mean

Loss type: Mean
ggml_opt_labels

Get labels tensor from optimizer context
ggml_opt_inputs

Get inputs tensor from optimizer context
ggml_opt_init

Initialize optimizer context
ggml_opt_ncorrect

Get number of correct predictions tensor
ggml_opt_loss_type_cross_entropy

Loss type: Cross Entropy
ggml_opt_init_for_fit

Initialize optimizer context for R-side epoch loop
ggml_opt_optimizer_type_adamw

Optimizer type: AdamW
ggml_opt_reset

Reset optimizer context
ggml_opt_optimizer_name

Get optimizer name
ggml_opt_pred

Get predictions tensor from optimizer context
ggml_opt_optimizer_type_sgd

Optimizer type: SGD
ggml_opt_result_free

Free optimization result
ggml_opt_prepare_alloc

Prepare allocation for non-static graphs
ggml_opt_outputs

Get outputs tensor from optimizer context
ggml_opt_result_init

Initialize optimization result
ggml_opt_result_accuracy

Get accuracy from result
ggml_opt_result_loss

Get loss from result
ggml_opt_static_graphs

Check if using static graphs
ggml_opt_set_lr

Set learning rate in optimizer context
ggml_opt_result_ndata

Get number of datapoints from result
ggml_permute

Permute Tensor Dimensions (Graph)
ggml_pool_1d

1D Pooling (Graph)
ggml_opt_result_pred

Get predictions from result
ggml_out_prod

Outer Product (Graph)
ggml_opt_result_reset

Reset optimization result
ggml_pad

Pad Tensor with Zeros (Graph)
ggml_predict.ggml_functional_model

Get Predictions from a Trained Model
ggml_quantize_init

Initialize Quantization Tables
ggml_print_objects

Print Objects in Context
ggml_pop_layer

Remove the Last Layer from a Sequential Model
ggml_print_mem_status

Print Context Memory Status
ggml_quantize_chunk

Quantize Data Chunk
ggml_predict_classes

Predict Classes from a Trained Model
ggml_pool_2d

2D Pooling (Graph)
ggml_quant_block_info

Get Quantization Block Info
ggml_reglu_split

ReGLU Split (Graph)
ggml_repeat

Repeat (Graph)
ggml_reshape_1d

Reshape to 1D (Graph)
ggml_relu_inplace

ReLU Activation In-place (Graph)
ggml_quantize_free

Free Quantization Resources
ggml_repeat_back

Repeat Backward (Graph)
ggml_reglu

ReGLU (ReLU Gated Linear Unit) (Graph)
ggml_reset

Reset GGML Context
ggml_quantize_requires_imatrix

Check if Quantization Requires Importance Matrix
ggml_relu

ReLU Activation (Graph)
ggml_reshape_2d

Reshape to 2D (Graph)
ggml_rms_norm

RMS Normalization (Graph)
ggml_rms_norm_back

RMS Norm Backward (Graph)
ggml_rope_ext_back

RoPE Extended Backward (Graph)
ggml_rms_norm_inplace

RMS Normalization In-place (Graph)
ggml_rope_inplace

Rotary Position Embedding In-place (Graph)
ggml_reshape_4d

Reshape to 4D (Graph)
ggml_reshape_3d

Reshape to 3D (Graph)
ggml_rope_ext_inplace

Extended RoPE Inplace (Graph)
ggml_rope_ext

Extended RoPE with Frequency Scaling (Graph)
ggml_rope

Rotary Position Embedding (Graph)
- `ggml_rope_multi`: Multi-RoPE for Vision Models (Graph)
- `ggml_rope_multi_inplace`: Multi-RoPE In-place (Graph)
- `ggml_round`: Round (Graph)
- `ggml_round_inplace`: Round In-place (Graph)
- `ggml_save_model`: Save a Full Model (Architecture + Weights)
- `ggml_save_weights`: Save Model Weights to File
- `ggml_scale`: Scale (Graph)
- `ggml_scale_inplace`: Scale Tensor In-place (Graph)
- `ggml_schedule_cosine_decay`: Cosine Annealing LR Scheduler
- `ggml_schedule_reduce_on_plateau`: Reduce-on-Plateau LR Scheduler
- `ggml_schedule_step_decay`: Step Decay LR Scheduler
- `ggml_set`: Set Tensor Region (Graph)
- `ggml_set_1d`: Set 1D Tensor Region (Graph)
- `ggml_set_2d`: Set 2D Tensor Region (Graph)
- `ggml_set_abort_callback_default`: Restore Default Abort Behavior
- `ggml_set_abort_callback_r`: Enable R-compatible Abort Handling
- `ggml_set_f32`: Set F32 Data
- `ggml_set_f32_nd`: Set Single Float Value by N-D Index
- `ggml_set_i32`: Set I32 Data
- `ggml_set_i32_nd`: Set Single Int32 Value by N-D Index
- `ggml_set_input`: Mark Tensor as Input
- `ggml_set_n_threads`: Set Number of Threads
- `ggml_set_name`: Set Tensor Name
- `ggml_set_no_alloc`: Set No-Allocation Mode
- `ggml_set_op_params`: Set Tensor Operation Parameters
- `ggml_set_op_params_f32`: Set Float Op Parameter
- `ggml_set_op_params_i32`: Set Integer Op Parameter
- `ggml_set_output`: Mark Tensor as Output
- `ggml_set_param`: Set Tensor as Trainable Parameter
- `ggml_set_zero`: Set Tensor to Zero
- `ggml_sgn`: Sign Function (Graph)
- `ggml_sigmoid`: Sigmoid Activation (Graph)
- `ggml_sigmoid_inplace`: Sigmoid Activation In-place (Graph)
- `ggml_silu`: SiLU Activation (Graph)
- `ggml_silu_back`: SiLU Backward (Graph)
- `ggml_silu_inplace`: SiLU Activation In-place (Graph)
- `ggml_sin`: Sine (Graph)
- `ggml_soft_max`: Softmax (Graph)
- `ggml_soft_max_ext`: Extended Softmax with Masking and Scaling (Graph)
- `ggml_soft_max_ext_back`: Extended Softmax Backward (Graph)
- `ggml_soft_max_ext_back_inplace`: Extended Softmax Backward In-place (Graph)
- `ggml_soft_max_ext_inplace`: Extended Softmax In-place (Graph)
- `ggml_soft_max_inplace`: Softmax In-place (Graph)
- `ggml_softplus`: Softplus Activation (Graph)
- `ggml_softplus_inplace`: Softplus Activation In-place (Graph)
- `ggml_sqr`: Square (Graph)
- `ggml_sqr_inplace`: Square In-place (Graph)
- `ggml_sqrt`: Square Root (Graph)
- `ggml_sqrt_inplace`: Square Root In-place (Graph)
- `ggml_step`: Step Function (Graph)
- `ggml_sub`: Element-wise Subtraction (Graph)
- `ggml_sub_inplace`: Element-wise Subtraction In-place (Graph)
- `ggml_sum`: Sum (Graph)
- `ggml_sum_rows`: Sum Rows (Graph)
- `ggml_swiglu`: SwiGLU (Swish/SiLU Gated Linear Unit) (Graph)
- `ggml_swiglu_split`: SwiGLU Split (Graph)
- `ggml_tanh`: Tanh Activation (Graph)
- `ggml_tanh_inplace`: Tanh Activation In-place (Graph)
- `ggml_tensor_copy`: Copy Tensor Data
- `ggml_tensor_nb`: Get Tensor Strides (nb)
- `ggml_tensor_num`: Count Tensors in Context
- `ggml_tensor_overhead`: Get Tensor Overhead
- `ggml_tensor_set_f32_scalar`: Fill Tensor with Scalar
- `ggml_tensor_shape`: Get Tensor Shape
- `ggml_tensor_type`: Get Tensor Type
- `ggml_test`: Test GGML
- `ggml_time_init`: Initialize GGML Timer
- `ggml_time_ms`: Get Time in Milliseconds
- `ggml_time_us`: Get Time in Microseconds
- `ggml_timestep_embedding`: Timestep Embedding (Graph)
- `ggml_top_k`: Top-K Indices (Graph)
- `ggml_transpose`: Transpose (Graph)
- `ggml_type_name`: Get Type Name
- `ggml_type_size`: Get Type Size in Bytes
- `ggml_type_sizef`: Get Type Size as Float
- `ggml_unary_op_name`: Get Unary Operation Name
- `ggml_unfreeze_weights`: Unfreeze Layer Weights
- `ggml_upscale`: Upscale Tensor (Graph)
- `ggml_used_mem`: Get Used Memory
- `ggml_version`: Get GGML Version
- `ggml_view_1d`: 1D View with Byte Offset (Graph)
- `ggml_view_2d`: 2D View with Byte Offset (Graph)
- `ggml_view_3d`: 3D View with Byte Offset (Graph)
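The `ggml_view_*` family creates zero-copy views into an existing tensor at a byte offset, mirroring the upstream ggml C API. A hedged sketch — the context constructor, tensor constructor, and the exact argument order are assumptions based on the C library, not confirmed R signatures; check the package help pages:

```r
library(ggmlR)

ctx <- ggml_init()  # assumed context constructor
x   <- ggml_new_tensor_2d(ctx, "f32", 8L, 8L)  # assumed; type spelling may differ

# View the first row as a 1D tensor of 8 floats, starting at byte offset 0.
# Assumed to mirror ggml's C signature: (ctx, tensor, ne0, offset_bytes)
row0 <- ggml_view_1d(ctx, x, 8L, 0L)
```

Because views share storage with the parent tensor, writing through a view (e.g. via `ggml_set_f32`) also changes the parent's data.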
- `ggml_view_4d`: 4D View with Byte Offset (Graph)
- `ggml_vulkan_available`: Check if Vulkan Support Is Available
- `ggml_vulkan_backend_name`: Get Vulkan Backend Name
- `ggml_vulkan_device_count`: Get Number of Vulkan Devices
- `ggml_vulkan_free`: Free Vulkan Backend
- `ggml_vulkan_init`: Initialize Vulkan Backend
- `ggml_vulkan_is_backend`: Check if Backend Is Vulkan
- `ggml_vulkan_list_devices`: List All Vulkan Devices
- `ggml_vulkan_status`: Print Vulkan Status
- `ggml_with_temp_ctx`: Execute with Temporary Context
- `iq2xs_free_impl`: Free IQ2 Quantization Tables
- `iq2xs_init_impl`: Initialize IQ2 Quantization Tables
- `iq3xs_free_impl`: Free IQ3 Quantization Tables
- `iq3xs_init_impl`: Initialize IQ3 Quantization Tables
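Since ggmlR is GPU-first with transparent CPU fallback, it can be useful to probe the Vulkan backend before a long training run. The function names below come from this reference index; the return types (logical, integer) are assumptions:

```r
library(ggmlR)

if (ggml_vulkan_available()) {       # assumed to return TRUE/FALSE
  n <- ggml_vulkan_device_count()    # assumed to return an integer count
  cat("Vulkan devices found:", n, "\n")
  ggml_vulkan_list_devices()         # prints the detected devices
} else {
  message("No Vulkan GPU detected; ggmlR will run on CPU")
}
```

Nothing here is required for normal use — when no GPU is present, the package falls back to CPU without code changes.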
- `ggml_view_tensor`: View Tensor
- `ggml_vulkan_device_description`: Get Vulkan Device Description
- `ggml_vulkan_device_memory`: Get Vulkan Device Memory
- `lr_scheduler_cosine`: Cosine-Annealing Learning Rate Scheduler
- `lr_scheduler_step`: Step-Decay Learning Rate Scheduler
- `nn_build_embedding`: Build Embedding Forward Pass
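The cosine-annealing scheduler follows the standard curve that decays the learning rate from `lr_max` to `lr_min` along half a cosine period. As a plain-R illustration of the schedule itself (independent of the package's actual `lr_scheduler_cosine` signature):

```r
# Cosine annealing: lr(t) = lr_min + 0.5*(lr_max - lr_min)*(1 + cos(pi*t/T))
cosine_lr <- function(step, total_steps, lr_max = 1e-3, lr_min = 1e-5) {
  lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * step / total_steps))
}

cosine_lr(0, 100)    # -> lr_max at the start of training
cosine_lr(100, 100)  # -> lr_min at the end
```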
- `is_ag_tensor`: Check if Object Is an ag_tensor
- `nn_apply_activation`: Apply Activation Function
- `nn_build_batch_norm`: Build batch_norm Forward Pass
- `nn_build_conv_1d`: Build conv_1d Forward Pass
- `nn_build_conv_2d`: Build conv_2d Forward Pass
- `nn_build_dense`: Build Dense Forward Pass
- `nn_build_dropout`: Build Dropout Forward Pass
- `nn_build_flatten`: Build Flatten Forward Pass
- `nn_build_functional_graph`: Build ggml Computation Graph for a Functional Model
- `nn_build_functional_node`: Build a Single ggml Tensor for One Functional Node
- `nn_build_global_average_pooling_2d`: Build global_average_pooling_2d Forward Pass
- `nn_build_global_max_pooling_2d`: Build global_max_pooling_2d Forward Pass
- `nn_build_graph`: Build Computation Graph with Allocated Weights and Inputs
- `nn_build_gru`: Build GRU Forward Pass for Sequential Model
- `nn_build_layer`: Build a Layer's Forward Pass
- `nn_build_lstm`: Build LSTM Forward Pass for Sequential Model
- `nn_build_max_pooling_2d`: Build max_pooling_2d Forward Pass
- `nn_count_layer_params`: Count Parameters for a Single Layer
- `nn_functional_output_shape`: Infer Output Shape of a Functional Node Given Its Parent Shapes
- `nn_gru_step`: Build One GRU Step
- `nn_infer_shapes`: Infer Shapes for All Layers in Model
- `nn_init_glorot_uniform`: Initialize Weight Tensor with Glorot Uniform Distribution
- `nn_lstm_step`: Build One LSTM Step
- `nn_topo_sort`: Topologically Sort Nodes Reachable from Output Nodes
- `optimizer_adam`: Create an Adam Optimizer
- `optimizer_sgd`: Create an SGD Optimizer
- `plot.ggml_history`: Plot Training History
- `print.ag_tensor`: Print Method for ag_tensor
- `print.ggml_functional_model`: Print Method for ggml_functional_model
- `print.ggml_history`: Print Method for ggml_history
- `print.ggml_sequential_model`: Print Method for ggml_sequential_model
- `quantize_iq2_xxs`: Quantize Data (IQ)
- `quantize_mxfp4`: Quantize Data (MXFP4)
- `quantize_q2_K`: Quantize Data (K-quants)
- `quantize_q4_0`: Quantize Data (Q4_0)
- `quantize_row_mxfp4_ref`: Quantize Row Reference (MXFP4)
- `quantize_row_q4_0_ref`: Quantize Row Reference (Basic)
- `with_grad_tape`: Run Code with Gradient Tape Enabled
- `nn_init_he_uniform`: Initialize Weight Tensor with He Uniform Distribution
- `nn_init_recurrent_uniform`: Initialize Recurrent Weight Tensor with Small Deterministic Values
- `nn_init_zeros`: Initialize Bias Tensor to Zeros
- `quantize_row_iq3_xxs_ref`: Quantize Row Reference (IQ)
- `quantize_row_q2_K_ref`: Quantize Row Reference (K-quants)
- `quantize_row_tq1_0_ref`: Quantize Row Reference (Ternary)
- `quantize_tq1_0`: Quantize Data (Ternary)
- `summary.ggml_sequential_model`: Summary Method for ggml_sequential_model
- `rope_types`: RoPE Mode Constants