Layer that normalizes its inputs

```
layer_batch_normalization(
  object,
  axis = -1L,
  momentum = 0.99,
  epsilon = 0.001,
  center = TRUE,
  scale = TRUE,
  beta_initializer = "zeros",
  gamma_initializer = "ones",
  moving_mean_initializer = "zeros",
  moving_variance_initializer = "ones",
  beta_regularizer = NULL,
  gamma_regularizer = NULL,
  beta_constraint = NULL,
  gamma_constraint = NULL,
  synchronized = FALSE,
  ...
)
```

- object
Layer or model object

- axis
Integer, the axis that should be normalized (typically the features axis). For instance, after a `Conv2D` layer with `data_format="channels_first"`, set `axis=1` in `BatchNormalization`.

- momentum
Momentum for the moving average.

- epsilon
Small float added to variance to avoid dividing by zero.

- center
If `TRUE`, add the offset `beta` to the normalized tensor. If `FALSE`, `beta` is ignored.

- scale
If `TRUE`, multiply by `gamma`. If `FALSE`, `gamma` is not used. When the next layer is linear (also e.g. `nn.relu`), this can be disabled, since the scaling will be done by the next layer.

- beta_initializer
Initializer for the beta weight.

- gamma_initializer
Initializer for the gamma weight.

- moving_mean_initializer
Initializer for the moving mean.

- moving_variance_initializer
Initializer for the moving variance.

- beta_regularizer
Optional regularizer for the beta weight.

- gamma_regularizer
Optional regularizer for the gamma weight.

- beta_constraint
Optional constraint for the beta weight.

- gamma_constraint
Optional constraint for the gamma weight.

- synchronized
If `TRUE`, synchronizes the global batch statistics (mean and variance) for the layer across all devices at each training step in a distributed training strategy. If `FALSE`, each replica uses its own local batch statistics. Only relevant when used inside a `tf$distribute` strategy.

- ...
Standard layer arguments.
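For non-distributed use, the defaults are usually sufficient. A minimal usage sketch, assuming the keras R package is attached and a backend is installed (the layer sizes are illustrative):

```
library(keras)

# Dense -> batch norm -> activation is a common ordering; normalizing
# before the nonlinearity is what makes `scale = FALSE` an option when
# the following op is linear or relu-like.
model <- keras_model_sequential() %>%
  layer_dense(units = 64, input_shape = c(20)) %>%
  layer_batch_normalization() %>%
  layer_activation("relu") %>%
  layer_dense(units = 1)
```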

Batch normalization applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1.

Importantly, batch normalization works differently during training and during inference.

**During training** (i.e. when using `fit()` or when calling the layer/model
with the argument `training=TRUE`), the layer normalizes its output using
the mean and standard deviation of the current batch of inputs. That is to
say, for each channel being normalized, the layer returns
`gamma * (batch - mean(batch)) / sqrt(var(batch) + epsilon) + beta`, where:

- `epsilon` is a small constant (configurable as part of the constructor arguments).
- `gamma` is a learned scaling factor (initialized as 1), which can be disabled by passing `scale=FALSE` to the constructor.
- `beta` is a learned offset factor (initialized as 0), which can be disabled by passing `center=FALSE` to the constructor.
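The training-mode formula can be checked by hand in base R; the batch values and the `gamma`/`beta`/`epsilon` settings below are illustrative:

```r
# Hand-computed training-mode normalization for a single channel.
batch   <- c(2, 4, 6, 8)
gamma   <- 1       # learned scale (initialized as 1)
beta    <- 0       # learned offset (initialized as 0)
epsilon <- 0.001

# Note: the formula uses the population variance (mean of squared
# deviations), not R's var(), which divides by n - 1.
mu  <- mean(batch)
v   <- mean((batch - mu)^2)
out <- gamma * (batch - mu) / sqrt(v + epsilon) + beta

mean(out)                        # ~0
sqrt(mean((out - mean(out))^2))  # ~1 (population sd)
```

With `epsilon = 0` the output would have population standard deviation exactly 1; the small constant only guards against division by zero for near-constant channels.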

**During inference** (i.e. when using `evaluate()` or `predict()`, or when
calling the layer/model with the argument `training=FALSE`, which is the
default), the layer normalizes its output using a moving average of the
mean and standard deviation of the batches it has seen during training. That
is to say, it returns
`gamma * (batch - self$moving_mean) / sqrt(self$moving_var + epsilon) + beta`.

`self$moving_mean` and `self$moving_var` are non-trainable variables that
are updated each time the layer is called in training mode, as such:

`moving_mean = moving_mean * momentum + mean(batch) * (1 - momentum)`

`moving_var = moving_var * momentum + var(batch) * (1 - momentum)`

As such, the layer will only normalize its inputs during inference
*after having been trained on data that has similar statistics as the
inference data*.
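The moving-statistics update above can be traced in base R; the batch values are illustrative, and the initial statistics match the default initializers:

```r
momentum    <- 0.99  # the default
moving_mean <- 0     # moving_mean_initializer = "zeros"
moving_var  <- 1     # moving_variance_initializer = "ones"

batch      <- c(2, 4, 6, 8)
batch_mean <- mean(batch)                   # 5
batch_var  <- mean((batch - batch_mean)^2)  # 5

# One training step's update, as in the formulas above:
moving_mean <- moving_mean * momentum + batch_mean * (1 - momentum)
moving_var  <- moving_var  * momentum + batch_var  * (1 - momentum)

moving_mean  # 0.05
moving_var   # 1.04
```

With the default `momentum = 0.99`, each batch contributes only 1% to the running statistics, which is why many training steps are needed before the inference-time behavior reflects the data distribution.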

When `synchronized=TRUE` is set and if this layer is used within a
`tf$distribute` strategy, there will be an `allreduce` call
to aggregate batch statistics across all replicas at every
training step. Setting `synchronized` has no impact when the model is
trained without specifying any distribution strategy.

Example usage:

```
strategy <- tf$distribute$MirroredStrategy()

with(strategy$scope(), {
  model <- keras_model_sequential()
  model %>%
    layer_dense(16) %>%
    layer_batch_normalization(synchronized = TRUE)
})
```