
whisper

Native R torch implementation of OpenAI Whisper for speech-to-text transcription.

Installation

# Install dependencies
install.packages(c("torch", "hfhub", "safetensors", "av", "jsonlite"))

# Install whisper from GitHub
remotes::install_github("cornball-ai/whisper")
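Before running the install commands above, it can help to see which dependencies are already present. The following is a minimal base-R sketch; `missing_packages()` is a hypothetical helper written for this illustration and is not part of whisper.

```r
# Dependencies named in the install command above
deps <- c("torch", "hfhub", "safetensors", "av", "jsonlite")

# Hypothetical helper: return the subset of `pkgs` that is not installed.
# system.file(package = p) returns "" when package `p` cannot be found,
# and it does not load the package.
missing_packages <- function(pkgs) {
  installed <- vapply(
    pkgs,
    function(p) nzchar(system.file(package = p)),
    logical(1)
  )
  pkgs[!installed]
}

to_install <- missing_packages(deps)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install only what is missing
}
```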

Quick Start

library(whisper)

# Transcribe the bundled JFK "Ask not" speech (prompts to download model on first use)
jfk <- system.file("audio", "jfk.mp3", package = "whisper")
result <- transcribe(jfk)
result$text
#> "Ask not what your country can do for you, ask what you can do for your country."

On first use, you'll be prompted to download the model:

Download 'tiny' model (~151 MB) from HuggingFace? (Yes/no/cancel)

Model Management

# Download a model explicitly
download_whisper_model("tiny")

# List available models
list_whisper_models()
#> [1] "tiny" "base" "small" "medium" "large-v3"

# Check which models are downloaded
list_downloaded_models()

# Check if a specific model exists locally
model_exists("tiny")
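In scripts it may be convenient to combine the check and the download into one step. The sketch below assumes that `model_exists()` returns a single `TRUE`/`FALSE`, which this document does not state explicitly. `ensure_model()` is a hypothetical helper, not a whisper function; the two whisper functions are passed in as defaults so the logic can be tried with stand-ins.

```r
# Hypothetical helper: download `name` only when it is not cached yet.
# `exists_fn` and `download_fn` default to the whisper functions shown
# above. Assumption: exists_fn() returns a single logical value.
ensure_model <- function(name,
                         exists_fn = whisper::model_exists,
                         download_fn = whisper::download_whisper_model) {
  if (isTRUE(exists_fn(name))) {
    return(invisible(FALSE))  # already present, nothing downloaded
  }
  download_fn(name)
  invisible(TRUE)             # a download was triggered
}
```

With whisper installed, `ensure_model("tiny")` would call the real functions. The return value only reports whether a download was attempted.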

Usage

# Basic transcription
result <- transcribe("audio.wav")
print(result$text)

# Specify model size
result <- transcribe("audio.wav", model = "small")

# Force CPU (useful if CUDA has issues)
result <- transcribe("audio.wav", device = "cpu")

# Non-English audio (specify language for better accuracy)
allende <- system.file("audio", "allende.mp3", package = "whisper")
result <- transcribe(allende, language = "es")

# Translate to English (quality is model-dependent; larger models work better)
result <- transcribe(allende, task = "translate", language = "es", model = "small")
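The `device = "cpu"` option above suggests a simple retry pattern: attempt transcription with the default device and fall back to CPU if that fails. The following is a sketch only. `transcribe_with_cpu_fallback()` is a hypothetical wrapper, and it assumes that a device problem surfaces as an ordinary R error.

```r
# Hypothetical wrapper: try the default device first, then retry on CPU
# if the first attempt raises an error. `transcribe_fn` defaults to
# whisper::transcribe and is a parameter so the logic can be exercised
# without a model.
transcribe_with_cpu_fallback <- function(path, ...,
                                         transcribe_fn = whisper::transcribe) {
  args <- list(...)
  tryCatch(
    do.call(transcribe_fn, c(list(path), args)),
    error = function(e) {
      warning("First attempt failed: ", conditionMessage(e),
              "; retrying with device = \"cpu\"", call. = FALSE)
      args$device <- "cpu"  # override or add the device argument
      do.call(transcribe_fn, c(list(path), args))
    }
  )
}
```

Note that this retries on any error, including a missing file, so the second attempt can fail for the same reason as the first.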

Models

Model      Parameters  Size     English WER
tiny       39M         151 MB   ~9%
base       74M         290 MB   ~7%
small      244M        967 MB   ~5%
medium     769M        3.0 GB   ~4%
large-v3   1550M       6.2 GB   ~3%

Models are downloaded from HuggingFace and cached in ~/.cache/huggingface/ unless otherwise specified.
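The sizes listed above can guide model choice when disk space or bandwidth is limited. As an illustration, the base-R sketch below picks the largest model that fits a given download budget. `pick_model()` is a hypothetical helper, and the GB figures are converted assuming 1 GB = 1000 MB.

```r
# Approximate download sizes in MB, copied from the Models table
# (GB figures converted assuming 1 GB = 1000 MB).
whisper_sizes_mb <- c(
  "tiny" = 151, "base" = 290, "small" = 967,
  "medium" = 3000, "large-v3" = 6200
)

# Hypothetical helper: return the name of the largest model whose
# download size fits within `budget_mb`, or NA if none fits.
pick_model <- function(budget_mb, sizes = whisper_sizes_mb) {
  fits <- sizes[sizes <= budget_mb]
  if (length(fits) == 0) return(NA_character_)
  names(fits)[which.max(fits)]
}

pick_model(1000)
#> [1] "small"
pick_model(100)
#> [1] NA
```

The result can be passed straight to the `model` argument, for example `transcribe("audio.wav", model = pick_model(1000))`.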

License

MIT

Version: 0.1.0
Install: install.packages('whisper')
License: MIT + file LICENSE
Maintainer: Troy Hernandez
Last Published: February 6th, 2026

Functions in whisper (0.1.0)

transcribe_chunk: Transcribe Single Chunk
pad_or_trim: Pad or Trim Audio to Fixed Length
transcribe: Whisper Transcription
parse_device: Parse Device Argument
whisper_tokenizer: Whisper BPE Tokenizer
parse_dtype: Parse Dtype Argument
whisper_attention: Whisper Encoder
tokenizer_encode: Encode Text to Token IDs
tokenizer_decode: Decode Token IDs to Text
transcribe_long: Transcribe Long Audio
load_audio: Load and Preprocess Audio
split_audio: Split Long Audio into Chunks
whisper_encoder: Audio Encoder
mel_to_hz: Convert Mel Scale to Hz
whisper_encoder_layer: Encoder Layer
whisper_dtype: Get Default Dtype
model_exists: Check if Model is Downloaded
whisper_config: Whisper Model Configurations
whisper_decoder: Text Decoder
whisper_device: Device and Dtype Management
whisper_lang_token: Get Language Token ID
whisper_model: Whisper Model
whisper_special_tokens: Special Token IDs
whisper_decoder_layer: Whisper Decoder
apply_bpe: Apply BPE Merges
audio_duration: Get Audio Duration
byte_to_token: Convert Byte to BPE Token
compute_stft: Compute STFT Magnitude
audio_to_mel: Convert Audio to Mel Spectrogram
copy_if_exists: Copy Weight if Exists
WHISPER_SAMPLE_RATE: Audio Preprocessing for Whisper
create_decoder: Create Decoder from Config
ensure_tokenizer_files: Ensure Tokenizer Files are Downloaded
download_whisper_model: Download Model from HuggingFace
get_weights_path: Get Path to Model Weights
load_mel_filterbank: Load Pre-computed Mel Filterbank
hz_to_mel: Convert Hz to Mel Scale
greedy_decode: Greedy Decoding
get_model_path: Get Model Cache Path
list_whisper_models: List Available Models
load_encoder_weights: Load Encoder Weights
extract_segments: Extract Segments with Timestamps
create_encoder: Create Encoder from Config
load_added_tokens: Load Added Tokens from HuggingFace
decode_bpe_bytes: Decode BPE Bytes Back to Text
is_timestamp_token: Check if Token is Timestamp
create_mel_filterbank_fallback: Create Mel Filterbank (Fallback)
clean_text: Clean Transcribed Text
load_whisper_model: Load Whisper Model
load_whisper_weights: Load Weights from Safetensors
download_tokenizer_files: Download Tokenizer Files from HuggingFace
decode_timestamp: Decode Timestamp Token
get_initial_tokens: Get Initial Decoder Tokens
load_decoder_weights: Load Decoder Weights
list_downloaded_models: List Downloaded Models
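Two entries in the function index, pad_or_trim and split_audio, describe fixed-length windowing of audio. To make that idea concrete, here is a base-R sketch of the general technique. These are illustrative stand-ins, not the package's implementations, and the constants (16 kHz audio, 30-second windows) are assumptions taken from the reference OpenAI Whisper implementation, since the package's own values are not shown here.

```r
# Assumed constants: 16 kHz mono audio and 30-second windows, as in the
# reference OpenAI Whisper implementation.
sample_rate <- 16000L
chunk_seconds <- 30L
chunk_samples <- sample_rate * chunk_seconds  # 480000 samples

# Illustrative stand-in, not the package's pad_or_trim():
# zero-pad or truncate a numeric vector to exactly `n` samples.
pad_or_trim_sketch <- function(x, n = chunk_samples) {
  if (length(x) >= n) x[seq_len(n)] else c(x, numeric(n - length(x)))
}

# Illustrative stand-in, not the package's split_audio():
# cut a vector into consecutive, non-overlapping windows of at most
# `n` samples. The last window may be shorter.
split_audio_sketch <- function(x, n = chunk_samples) {
  if (length(x) == 0) return(list())
  starts <- seq(1, length(x), by = n)
  lapply(starts, function(s) x[s:min(s + n - 1, length(x))])
}
```

A short final window from `split_audio_sketch()` can be passed through `pad_or_trim_sketch()` to bring it up to the fixed length. The actual package may handle chunk boundaries differently, for example with overlap.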