Pre-norm transformer encoder layer.
whisper_encoder_layer(n_state, n_head)
n_state: Hidden dimension
n_head: Number of attention heads
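
To illustrate what "pre-norm" means for a layer parameterized by n_state and n_head, here is a minimal NumPy sketch. It is not this package's implementation. The helper names (init_params, encoder_layer_forward), the 4x MLP expansion, the GELU activation, the omission of biases and learnable layer-norm parameters, and the random initialization are all assumptions made for illustration. The point is the ordering: layer normalization is applied before each sub-layer, and the residual is added to the un-normalized input.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the feature axis (learnable scale/shift omitted in this sketch).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU (assumed activation).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def init_params(n_state, n_head, seed=0):
    # Hypothetical helper: random weights for one layer, biases omitted.
    if n_state % n_head != 0:
        raise ValueError("n_state must be divisible by n_head")
    rng = np.random.default_rng(seed)
    w = lambda shape: rng.normal(0.0, 0.02, size=shape)
    return {
        "n_head": n_head,
        "wq": w((n_state, n_state)),
        "wk": w((n_state, n_state)),
        "wv": w((n_state, n_state)),
        "wo": w((n_state, n_state)),
        "w1": w((n_state, 4 * n_state)),  # assumed 4x MLP expansion
        "w2": w((4 * n_state, n_state)),
    }

def self_attention(p, x):
    # x has shape (seq_len, n_state); heads split the feature axis evenly.
    seq_len, n_state = x.shape
    n_head = p["n_head"]
    d_head = n_state // n_head
    split = lambda t: t.reshape(seq_len, n_head, d_head).transpose(1, 0, 2)
    q, k, v = split(x @ p["wq"]), split(x @ p["wk"]), split(x @ p["wv"])
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    out = (weights @ v).transpose(1, 0, 2).reshape(seq_len, n_state)
    return out @ p["wo"]

def encoder_layer_forward(p, x):
    # Pre-norm: normalize first, apply the sub-layer, then add the residual.
    x = x + self_attention(p, layer_norm(x))
    x = x + gelu(layer_norm(x) @ p["w1"]) @ p["w2"]
    return x

params = init_params(n_state=8, n_head=2)
x = np.random.default_rng(1).normal(size=(5, 8))
y = encoder_layer_forward(params, x)
print(y.shape)
```

The output has the same shape as the input, so layers of this form can be stacked. In this sketch n_state must be divisible by n_head because each head receives n_state / n_head features.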