
ggmlR (version 0.6.1)

ggml_layer_lstm: Add an LSTM Layer

Description

Long Short-Term Memory (LSTM) recurrent layer. Implemented as an unrolled computation graph (backpropagation through time, BPTT), so ggml's automatic differentiation works without any custom C extensions.

Usage

ggml_layer_lstm(
  model,
  units,
  return_sequences = FALSE,
  activation = "tanh",
  recurrent_activation = "sigmoid",
  input_shape = NULL,
  name = NULL,
  trainable = TRUE
)

Value

The updated ggml_sequential_model, or a new ggml_tensor_node when applied to a tensor node (Functional API).

Arguments

model

A ggml_sequential_model or ggml_tensor_node.

units

Integer, number of hidden units.

return_sequences

Logical; if TRUE return all hidden states, otherwise return only the last hidden state.

activation

Activation for the cell gate (default "tanh").

recurrent_activation

Activation for the recurrent step (default "sigmoid").

input_shape

Input shape c(seq_len, input_size); required for the first layer only.

name

Optional layer name.

trainable

Logical; whether the layer's weights are updated during training.

Weight layout

  • W_gates [input_size, 4*units] — input kernel for all four gates (i, f, g, o) concatenated.

  • U_gates [units, 4*units] — recurrent kernel.

  • b_gates [4*units] — bias.
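With this fused layout, a single matrix multiply produces the pre-activations for all four gates at once, which are then sliced in the i, f, g, o order listed above. A minimal base-R sketch of one timestep under that layout (plain matrices rather than ggml tensors; the function and argument names here are illustrative, not the package API):

```r
# One LSTM timestep in base R, mirroring the fused W_gates/U_gates/b_gates layout.
# x_t: [input_size], h_prev/c_prev: [units]. Illustrative only -- not the package API.
lstm_step <- function(x_t, h_prev, c_prev, W_gates, U_gates, b_gates, units) {
  sigmoid <- function(z) 1 / (1 + exp(-z))
  z <- drop(x_t %*% W_gates + h_prev %*% U_gates) + b_gates  # [4*units] pre-activations
  i <- sigmoid(z[1:units])                      # input gate
  f <- sigmoid(z[(units + 1):(2 * units)])      # forget gate
  g <- tanh(z[(2 * units + 1):(3 * units)])     # cell candidate ("cell gate", tanh default)
  o <- sigmoid(z[(3 * units + 1):(4 * units)])  # output gate
  c_t <- f * c_prev + i * g   # new cell state
  h_t <- o * tanh(c_t)        # new hidden state
  list(h = h_t, c = c_t)
}
```

The sigmoid/tanh choices match the `recurrent_activation` and `activation` defaults.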

Input / output shapes

Input: [seq_len, input_size] per sample (R row-major), or a 3-D array [N, seq_len, input_size]. In the Functional API the input node shape should be c(seq_len, input_size).

Output (Sequential): [units] per sample when return_sequences = FALSE (default), or [seq_len, units] when return_sequences = TRUE.
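The two output shapes follow directly from the unrolled recurrence: the loop produces one hidden state per timestep, and the layer either stacks them all or keeps only the last. A self-contained base-R shape check with random weights (illustrative; not the package API):

```r
# Shape check for return_sequences, using random weights in base R (illustrative).
set.seed(1)
n_steps <- 10L; input_size <- 32L; units <- 64L
W <- matrix(rnorm(input_size * 4 * units), input_size)  # W_gates [input_size, 4*units]
U <- matrix(rnorm(units * 4 * units), units)            # U_gates [units, 4*units]
b <- rnorm(4 * units)                                   # b_gates [4*units]
x <- matrix(rnorm(n_steps * input_size), n_steps)       # one sample [seq_len, input_size]
sigmoid <- function(z) 1 / (1 + exp(-z))
h <- c_state <- numeric(units)
H <- matrix(0, n_steps, units)                          # all hidden states
for (t in 1:n_steps) {
  z <- drop(x[t, ] %*% W + h %*% U) + b
  i <- sigmoid(z[1:units]);          f <- sigmoid(z[units + 1:units])
  g <- tanh(z[2 * units + 1:units]); o <- sigmoid(z[3 * units + 1:units])
  c_state <- f * c_state + i * g
  h <- o * tanh(c_state)
  H[t, ] <- h
}
dim(H)     # seq_len x units -- what return_sequences = TRUE yields
length(h)  # units -- the default (last hidden state only)
```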

Examples

# 10 timesteps of 32 features -> 64-unit LSTM (last hidden state) -> 10-class softmax
model <- ggml_model_sequential() |>
  ggml_layer_lstm(64L, input_shape = c(10L, 32L)) |>
  ggml_layer_dense(10L, activation = "softmax")
