audio_to_mel: Convert Audio to Mel Spectrogram

Main preprocessing function that converts audio to the mel spectrogram
format expected by Whisper.
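For orientation, the reference openai/whisper code pads or trims audio to a 30-second window (480,000 samples at 16 kHz) and takes a 400-point Hann-windowed STFT with a 160-sample hop, which gives 3000 frames. The base-R sketch below (no torch) reproduces that front half. It assumes mono input already sampled at 16 kHz and that audio_to_mel follows the same recipe. `pad_or_trim()`, `hann_periodic()` and `power_spectrogram()` are hypothetical helpers, not functions exported by the package.

```r
# Base-R sketch of the STFT front end used by the reference openai/whisper
# preprocessing. ASSUMPTION: input is mono audio sampled at 16 kHz, and
# audio_to_mel() follows the same recipe. Helper names are hypothetical.
sample_rate <- 16000L
n_fft <- 400L                     # 25 ms analysis window
hop_length <- 160L                # 10 ms hop
n_samples <- 30L * sample_rate    # 30 s window = 480000 samples

# Zero-pad or truncate to exactly n samples.
pad_or_trim <- function(x, n = n_samples) {
  if (length(x) >= n) return(x[seq_len(n)])
  c(x, numeric(n - length(x)))
}

# Periodic Hann window (the default form of torch's hann_window()).
hann_periodic <- function(n) 0.5 - 0.5 * cos(2 * pi * (0:(n - 1)) / n)

# Power spectrogram with centered frames: reflect-pad by n_fft / 2, window
# each frame, FFT, keep the n_fft / 2 + 1 non-negative frequency bins, and
# drop the last frame so a 30 s window yields exactly 3000 frames.
power_spectrogram <- function(x) {
  x <- pad_or_trim(as.numeric(x))
  half <- n_fft %/% 2L
  L <- length(x)
  xp <- c(rev(x[2:(half + 1L)]), x, rev(x[(L - half):(L - 1L)]))
  n_frames <- 1L + (length(xp) - n_fft) %/% hop_length
  starts <- (seq_len(n_frames) - 1L) * hop_length
  idx <- outer(seq_len(n_fft), starts, "+")  # one column of sample indices per frame
  frames <- matrix(xp[idx], nrow = n_fft) * hann_periodic(n_fft)
  spec <- mvfft(frames)[seq_len(half + 1L), , drop = FALSE]
  power <- Mod(spec)^2
  power[, -n_frames, drop = FALSE]
}
```

For example, one second of a 440 Hz tone gives a 201 x 3000 matrix. In frames that lie inside the tone, the strongest bin is 440 Hz (bin 11 at 40 Hz spacing, row 12 in R's 1-based indexing). The remaining frames are zero padding.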

Speech-to-text transcription using a native R 'torch' implementation
of the 'OpenAI' 'Whisper' model <https://github.com/openai/whisper>. Supports
multiple model sizes, from tiny (39M parameters) to large-v3 (1.5B parameters),
with integrated model downloads from 'HuggingFace' <https://huggingface.co/> via
the 'hfhub' package. Provides automatic speech recognition with optional language
detection and translation to English. Audio preprocessing, mel spectrogram
computation, and transformer-based encoder-decoder inference are all
implemented in R using the 'torch' package.

Troy Hernandez

whisper

Native R 'torch' Implementation of 'OpenAI' 'Whisper'

cornball.ai

OpenAI

Arguments

<dl><dt>file</dt>
<dd>Path to audio file, or numeric vector of audio samples</dd>
<dt>n_mels</dt>
<dd>Number of mel bins (80 for most models, 128 for large-v3)</dd>
<dt>device</dt>
<dd>torch device for output tensor</dd>
<dt>dtype</dt>
<dd>torch dtype for output tensor</dd></dl>
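The `n_mels` argument sets the number of rows in the mel filterbank that projects the STFT power spectrum onto mel bands. The base-R sketch below is written under stated assumptions. It builds librosa-style Slaney filters for 16 kHz audio and a 400-point FFT, which the reference Whisper filters are assumed to have been generated with. It then applies the reference code's log10 clamp-and-rescale. `mel_filterbank()` and `log_mel()` are hypothetical names, not part of the whisper package API.

```r
# Base-R sketch of a Slaney-style mel filterbank (librosa's default, which
# ASSUMPTION: matches the filters reference Whisper uses for 16 kHz audio
# and a 400-point FFT), plus Whisper's log10 clamp-and-rescale.
# Function names are hypothetical, not part of the whisper package API.
f_sp <- 200 / 3                    # Hz per mel in the linear region
min_log_hz <- 1000                 # the scale becomes logarithmic above 1 kHz
min_log_mel <- min_log_hz / f_sp   # = 15 mel
logstep <- log(6.4) / 27           # mel step size in the log region

hz_to_mel <- function(f) {
  ifelse(f >= min_log_hz, min_log_mel + log(f / min_log_hz) / logstep, f / f_sp)
}
mel_to_hz <- function(m) {
  ifelse(m >= min_log_mel, min_log_hz * exp(logstep * (m - min_log_mel)), f_sp * m)
}

# n_mels x (n_fft / 2 + 1) matrix of triangular filters spaced evenly on the
# mel scale from 0 Hz to the Nyquist frequency, each divided by its band
# width in Hz ("slaney" area normalization).
mel_filterbank <- function(n_mels = 80L, n_fft = 400L, sr = 16000) {
  n_bins <- n_fft %/% 2L + 1L
  fft_freqs <- seq(0, sr / 2, length.out = n_bins)
  hz_pts <- mel_to_hz(seq(hz_to_mel(0), hz_to_mel(sr / 2), length.out = n_mels + 2))
  fdiff <- diff(hz_pts)
  fb <- matrix(0, nrow = n_mels, ncol = n_bins)
  for (i in seq_len(n_mels)) {
    lower <- (fft_freqs - hz_pts[i]) / fdiff[i]          # rising edge
    upper <- (hz_pts[i + 2] - fft_freqs) / fdiff[i + 1]  # falling edge
    fb[i, ] <- pmax(0, pmin(lower, upper))
  }
  fb * (2 / (hz_pts[3:(n_mels + 2)] - hz_pts[seq_len(n_mels)]))  # row-wise scaling
}

# Project a 201 x frames power spectrogram onto n_mels bands, then apply
# Whisper's normalization: log10, floor at (max - 8), and (x + 4) / 4.
log_mel <- function(power, n_mels = 80L) {
  fb <- mel_filterbank(n_mels)
  stopifnot(nrow(power) == ncol(fb))
  lm <- log10(pmax(fb %*% power, 1e-10))
  lm <- pmax(lm, max(lm) - 8)
  (lm + 4) / 4
}
```

Switching from `n_mels = 80` to `n_mels = 128` (large-v3) only changes the number of filter rows, giving a 128 x 201 filterbank. The STFT itself is unchanged. The final clamp limits every log-mel matrix to a dynamic range of at most 2 after rescaling.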
