whisper_encoder: Audio Encoder
Description
The full Whisper audio encoder: a convolutional stem, followed by positional encoding and a stack of transformer layers.
Usage
whisper_encoder(n_mels, n_ctx, n_state, n_head, n_layer)
Arguments
- n_mels
Number of mel spectrogram bins
- n_ctx
Maximum context length, i.e. the number of encoder positions (1500 for 30 s of audio)
- n_state
Hidden dimension (width of the transformer layers)
- n_head
Number of attention heads
- n_layer
Number of transformer layers
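
Example
The arguments above constrain one another, and a short shape calculation can make that concrete. The following is a minimal sketch in plain Python, not this package's implementation. It assumes the conv stem matches the reference Whisper layout (two 1-D convolutions with kernel size 3 and padding 1, the second with stride 2) and that the input has 3000 mel frames for 30 s of audio (a 10 ms hop). The helper names `conv1d_out_len` and `encoder_shapes` are hypothetical, and the values passed in are those of the "tiny" Whisper configuration.

```python
def conv1d_out_len(length, kernel=3, stride=1, padding=1):
    """Output length of a 1-D convolution (standard formula, dilation 1)."""
    return (length + 2 * padding - kernel) // stride + 1


def encoder_shapes(n_frames, n_mels, n_ctx, n_state, n_head, n_layer):
    """Trace tensor shapes through an assumed Whisper-style encoder."""
    # Multi-head attention splits the hidden dimension evenly across heads.
    if n_state % n_head != 0:
        raise ValueError("n_state must be divisible by n_head")

    # Assumed conv stem: stride-1 conv, then stride-2 conv (halves the length).
    after_conv1 = conv1d_out_len(n_frames, stride=1)
    after_conv2 = conv1d_out_len(after_conv1, stride=2)

    # The positional encoding only covers n_ctx positions.
    if after_conv2 > n_ctx:
        raise ValueError("input is longer than n_ctx positions")

    return {
        "input": (n_mels, n_frames),        # mel bins x frames
        "positions": after_conv2,           # sequence length seen by the layers
        "head_dim": n_state // n_head,      # per-head width
        "output": (after_conv2, n_state),   # positions x hidden dimension
        "n_layer": n_layer,                 # layers do not change the shape
    }


# "tiny" configuration: 80 mel bins, 384 hidden units, 6 heads, 4 layers.
shapes = encoder_shapes(n_frames=3000, n_mels=80, n_ctx=1500,
                        n_state=384, n_head=6, n_layer=4)
print(shapes)
```

Under these assumptions, 3000 input frames reduce to exactly 1500 positions, which is where the value of n_ctx for 30 s of audio comes from. The check on n_state and n_head reflects the usual requirement that the hidden dimension split evenly across attention heads.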