stt: Speech to Text

Description

Convert an audio file to text using a local whisper backend or an OpenAI-compatible API.

Usage

stt(file, model = NULL, language = NULL,
    response_format = c("json", "text", "verbose_json"),
    backend = c("auto", "whisper", "openai"), prompt = NULL)

Value

A list with components:

text: The transcribed text as a single string.
segments: A data.frame of segments with timing info, or NULL.
language: The detected or specified language code.
backend: Which backend was used ("api" or "whisper").
raw: The raw response from the backend.
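
Because segments may be NULL, code that consumes the result should check for it before using the data frame. The following is a minimal sketch that builds a hypothetical stand-in for the returned list instead of calling stt(); the field names follow the components listed above, while the segment column names (start, end, text) and all values are assumptions made for illustration.

```r
# Hypothetical stand-in for a value returned by stt(). Field names follow the
# Value list; the segment columns and all contents are invented for this sketch.
result <- list(
  text = "Hello world.",
  segments = data.frame(
    start = c(0.0, 0.8),
    end   = c(0.8, 1.5),
    text  = c("Hello", "world.")
  ),
  language = "en",
  backend  = "whisper",
  raw      = NULL
)

# segments is documented as possibly NULL, so guard before counting rows.
n_segments <- if (is.null(result$segments)) 0L else nrow(result$segments)

# One-line summary of what was transcribed and by which backend.
summary_line <- sprintf("[%s/%s] %s (%d segments)",
                        result$backend, result$language,
                        result$text, n_segments)
cat(summary_line, "\n")
```

The same guard applies to a real result: a response without timing information should be treated as having zero segments, not as an error.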

Arguments

file: Path to the audio file to convert.
model: Model name to use for transcription. For API backends, this is passed directly (e.g., "whisper-1"). For whisper, this is the model size (e.g., "tiny", "base", "small", "medium", "large"). If NULL, uses the backend's default.
language: Language code (e.g., "en", "es", "fr"). Optional hint to improve transcription accuracy.
response_format: Response format for the API backend. One of "json" (default), "text", or "verbose_json". Ignored for the whisper backend.
backend: Which backend to use: "auto" (default), "whisper", or "openai". In "auto" mode the local whisper backend is tried first, then the OpenAI-compatible API if one is configured.
prompt: Optional text to guide the transcription. For the API backend, this is passed as initial_prompt to help with the spelling of names, acronyms, or domain-specific terms. Ignored for the whisper backend.
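
The response_format and backend arguments are declared as choice vectors in Usage, and the first element of each is the default. The sketch below shows how such vectors resolve under R's standard match.arg() idiom. It assumes stt() follows that idiom, which the documentation does not confirm, and resolve_choices() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper that mirrors the choice vectors from the Usage signature.
# It only illustrates base R's match.arg(); it is not a function of the package.
resolve_choices <- function(response_format = c("json", "text", "verbose_json"),
                            backend = c("auto", "whisper", "openai")) {
  list(
    response_format = match.arg(response_format),
    backend         = match.arg(backend)
  )
}

# With nothing supplied, match.arg() picks the first element of each vector.
defaults <- resolve_choices()

# An explicit value must be one of the listed choices.
explicit <- resolve_choices(response_format = "verbose_json", backend = "openai")

# A value outside the choices raises an error instead of being passed through.
invalid <- tryCatch(resolve_choices(backend = "azure"),
                    error = function(e) "error")
```

Under this assumption, calling stt("speech.wav") with no other arguments would request "json" output in "auto" backend mode.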

Examples

if (FALSE) {
# Using OpenAI API
set_stt_base("https://api.openai.com")
set_stt_key(Sys.getenv("OPENAI_API_KEY"))
result <- stt("speech.wav", model = "whisper-1")
result$text

# Using local server
set_stt_base("http://localhost:4123")
result <- stt("speech.wav")
}
