stt.api
stt.api is a minimal, backend-agnostic R client for OpenAI-compatible speech-to-text (STT) APIs, with optional local fallbacks.
It lets you transcribe audio in R without caring which backend actually performs the transcription.
What stt.api is (and is not)
✅ What it is
A unified interface for speech-to-text in R
A way to switch easily between:
{whisper}(native R torch, local GPU/CPU)- OpenAI
/v1/audio/transcriptions(cloud or local servers)
Designed for scripting, Shiny apps, containers, and reproducible pipelines
❌ What it is not
- Not a Whisper reimplementation
- Not a model manager
- Not a GPU / CUDA helper
- Not an audio preprocessing toolkit
- Not a replacement for
{whisper}
Installation
remotes::install_github("cornball-ai/stt.api")Required dependencies are minimal:
curljsonlite
Optional backends:
{whisper}(recommended, on CRAN)
Quick start
install.packages("whisper")
remotes::install_github("cornball-ai/stt.api")
library(stt.api)
res <- stt("speech.wav")
res$textThat's it. With {whisper} installed, stt() transcribes locally on GPU or CPU with no configuration needed.
Other backends
stt.api also supports OpenAI-compatible APIs for cloud or container-based transcription:
set_stt_base("http://localhost:4123")
# Optional, for hosted services like OpenAI
set_stt_key(Sys.getenv("OPENAI_API_KEY"))
res <- stt("speech.wav", backend = "openai")This works with OpenAI, Whisper containers, LM Studio, OpenWebUI, AnythingLLM, or any server implementing /v1/audio/transcriptions.
Automatic backend selection
When you call stt() without specifying a backend, it picks the first available:
{whisper}(native R torch, if installed)- OpenAI-compatible API (if
stt.api_baseis set) - Error with guidance
Normalized output
Regardless of backend, stt() always returns the same structure:
list(
text = "Transcribed text",
segments = NULL | data.frame(...),
language = "en",
backend = "api" | "whisper",
raw = <raw backend response>
)This makes it easy to switch backends without changing downstream code.
Health checks
stt_health()Returns:
list(
ok = TRUE,
backend = "api",
message = "OK"
)Useful for Shiny apps and deployment checks.
Backend selection
Explicit backend choice:
stt("speech.wav", backend = "openai")
stt("speech.wav", backend = "whisper")Automatic selection (default):
stt("speech.wav")Supported endpoints
stt.api targets the OpenAI-compatible STT spec:
POST /v1/audio/transcriptionsThis is intentionally chosen because it is:
- Widely adopted
- Simple
- Supported by many local and hosted services
- Easy to proxy and containerize
Configuration options
options(
stt.api_base = NULL,
stt.api_key = NULL,
stt.timeout = 60
)Setters:
set_stt_base()
set_stt_key()Error handling philosophy
- No silent failures
- Clear messages when a backend is unavailable
- Actionable instructions when configuration is missing
Example:
Error in stt():
No transcription backend available.
Install whisper or set stt.api_base.Relationship to tts.api
stt.api is designed to pair cleanly with tts.api:
| Task | Package |
|---|---|
| Speech → Text | stt.api |
| Text → Speech | tts.api |
Both share:
- Minimal dependencies
- OpenAI-compatible API focus
- Backend-agnostic design
- Optional Docker support
Why this package exists
Installing and maintaining local Whisper backends can be difficult:
- CUDA / cuBLAS issues
- Compiler toolchains
- Platform differences
stt.api lets you decouple your R code from those concerns.
Your transcription code stays the same whether the backend is:
- Local
- Containerized
- Cloud-hosted
- GPU-accelerated
- CPU-only
License
MIT