
ggmlR (version 0.6.1)

ggml_layer_dropout: Add Dropout Layer

Description

Applies dropout regularization. By default (stochastic = FALSE), during training all activations are multiplied by (1 - rate) (deterministic expected-value scaling); with stochastic = TRUE, a random Bernoulli mask is applied instead (inverted dropout). During inference (training = FALSE), the layer is an identity (no change).

Usage

ggml_layer_dropout(
  model,
  rate,
  stochastic = FALSE,
  name = NULL,
  trainable = FALSE
)

Value

If model is a ggml_sequential_model, the model with the dropout layer appended; if model is a ggml_tensor_node, a new tensor node.

Arguments

model

A ggml_sequential_model or ggml_tensor_node.

rate

Dropout rate in [0, 1). Fraction of units to "drop".

stochastic

Logical. If TRUE, use inverted dropout with a random Bernoulli mask regenerated each epoch (proper regularization). If FALSE (default), use deterministic scaling by (1 - rate) — cheaper but weaker regularization.

name

Optional layer name.

trainable

Ignored for dropout (no weights); kept for API consistency.

Difference from Keras / inverted dropout

Keras implements inverted dropout: during training it applies a random Bernoulli mask and scales surviving activations up by 1 / (1 - rate), so the expected value of each unit is preserved and no scaling is needed at inference.

This implementation uses deterministic scaling (multiply by (1 - rate) at training, identity at inference) — equivalent in expected value but without stochastic noise. Consequences:

  • No random mask → the regularization signal is weaker (co-adaptations between units are not broken up).

  • Activations at training are scaled down, not up — the magnitude seen by subsequent layers differs from Keras behaviour.

  • Results are fully deterministic and reproducible without setting a seed.
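The contrast above can be sketched in plain base R. This is a standalone illustration of the two schemes, not the package's internals; the vector x and the mask are hypothetical:

```r
set.seed(1)
x    <- c(1.0, 2.0, 3.0, 4.0)  # activations entering the dropout layer
rate <- 0.5                    # dropout rate

# Deterministic scaling (this package's default, stochastic = FALSE):
# every activation is shrunk by (1 - rate) at training time.
det_train <- x * (1 - rate)
det_infer <- x                 # identity at inference

# Inverted dropout (Keras-style, stochastic = TRUE):
# a random Bernoulli mask zeroes units; survivors are scaled UP by 1 / (1 - rate).
mask      <- rbinom(length(x), size = 1, prob = 1 - rate)
inv_train <- x * mask / (1 - rate)
inv_infer <- x                 # also identity at inference

# det_train equals the expected value of classic (non-inverted) masking,
# E[x * mask] = x * (1 - rate), so training-time magnitudes are scaled DOWN,
# whereas inverted dropout preserves E[inv_train] = x.
```

Note how the two schemes agree only in expectation: the deterministic path hands subsequent layers smaller activations during training, which is exactly the magnitude difference described in the bullets above.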

Examples

# \donttest{
model <- ggml_model_sequential() |>
  ggml_layer_dense(128, activation = "relu", input_shape = 784L) |>
  ggml_layer_dropout(0.5, stochastic = TRUE) |>
  ggml_layer_dense(10, activation = "softmax")
# }
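A variant of the example above using the deterministic default (stochastic = FALSE) together with the optional name argument; same documented functions, the layer name "drop1" is an arbitrary illustration:

```r
# \donttest{
model2 <- ggml_model_sequential() |>
  ggml_layer_dense(128, activation = "relu", input_shape = 784L) |>
  ggml_layer_dropout(0.25, name = "drop1") |>  # deterministic scaling by 0.75
  ggml_layer_dense(10, activation = "softmax")
# }
```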
