Learn R Programming

ggmlR (version 0.6.1)

ggml_layer_gru: Add a GRU Layer

Description

Gated Recurrent Unit (GRU) recurrent layer. The layer is unrolled over the time dimension into a single computation graph, so training uses backpropagation through time (BPTT).

Usage

ggml_layer_gru(
  model,
  units,
  return_sequences = FALSE,
  activation = "tanh",
  recurrent_activation = "sigmoid",
  input_shape = NULL,
  name = NULL,
  trainable = TRUE
)

Value

The updated ggml_sequential_model when model is a sequential model, or a new ggml_tensor_node when model is a tensor node.

Arguments

model

A ggml_sequential_model or ggml_tensor_node.

units

Integer, number of hidden units.

return_sequences

Logical; if TRUE, return the hidden state for every time step; if FALSE (the default), return only the last hidden state.

activation

Activation for the candidate hidden state; default "tanh".

recurrent_activation

Activation for the update (z) and reset (r) gates; default "sigmoid".

input_shape

Input shape as c(seq_len, input_size); required for the first layer only.

name

Optional layer name.

trainable

Logical; whether the layer's weights are updated during training.
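
When stacking GRU layers, every layer except the last typically needs return_sequences = TRUE so that the next recurrent layer receives an input per time step. A minimal sketch, reusing only the constructors shown in the Examples section below:

```r
# Two stacked GRUs: the first emits the full sequence, the second
# consumes it and returns only its final hidden state.
model <- ggml_model_sequential() |>
  ggml_layer_gru(64L, return_sequences = TRUE, input_shape = c(10L, 32L)) |>
  ggml_layer_gru(32L) |>   # return_sequences = FALSE: last state only
  ggml_layer_dense(10L, activation = "softmax")
```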

Weight layout

  • W_zh [input_size, 2*units] — input kernel for z and r gates.

  • U_zh [units, 2*units] — recurrent kernel for z and r.

  • b_zh [2*units] — bias for z and r.

  • W_n [input_size, units] — input kernel for candidate.

  • U_n [units, units] — recurrent kernel for candidate.

  • b_n [units] — bias for candidate.
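
The layout above fixes the number of trainable parameters. A minimal base-R sketch that sums the six tensors (gru_params is a hypothetical helper for illustration, not part of ggmlR):

```r
# Parameter count implied by the weight layout above.
# gru_params() is an illustrative helper, not a ggmlR function.
gru_params <- function(input_size, units) {
  w_zh <- input_size * 2 * units  # W_zh: input kernel for z and r
  u_zh <- units * 2 * units       # U_zh: recurrent kernel for z and r
  b_zh <- 2 * units               # b_zh: bias for z and r
  w_n  <- input_size * units      # W_n: input kernel for candidate
  u_n  <- units * units           # U_n: recurrent kernel for candidate
  b_n  <- units                   # b_n: bias for candidate
  w_zh + u_zh + b_zh + w_n + u_n + b_n
}

gru_params(32L, 64L)  # 18624
```

The total simplifies to 3 * units * (input_size + units + 1), the usual GRU parameter count.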

Examples

# \donttest{
model <- ggml_model_sequential() |>
  ggml_layer_gru(64L, input_shape = c(10L, 32L)) |>
  ggml_layer_dense(10L, activation = "softmax")
# }
