stt.api (version 0.2.1)

stt: Speech to Text

Description

Convert an audio file to text using a local whisper backend or an OpenAI-compatible API.

Usage

stt(file, model = NULL, language = NULL,
    response_format = c("json", "text", "verbose_json"),
    backend = c("auto", "whisper", "openai"), prompt = NULL)

Value

A list with components:

text

The transcribed text as a single string.

segments

A data.frame of segments with timing info, or NULL.

language

The detected or specified language code.

backend

Which backend was used ("api" or "whisper").

raw

The raw response from the backend.
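
As a rough illustration of how the returned list might be consumed, the sketch below builds a stand-in list by hand with the five documented components and reads it defensively, since segments may be NULL. The values are invented for illustration and are not output from stt(); describe_result() is a hypothetical helper, not part of stt.api.

```r
# Stand-in for a value returned by stt(); every value here is made up.
result <- list(
  text     = "hello world",
  segments = NULL,        # documented as a data.frame or NULL
  language = "en",
  backend  = "whisper",
  raw      = list()
)

# Hypothetical helper (not part of stt.api): one-line summary of a result.
describe_result <- function(res) {
  # segments may be NULL, so guard before asking for a row count
  n_segments <- if (is.null(res$segments)) 0L else nrow(res$segments)
  sprintf("[%s/%s] %d segment(s): %s",
          res$backend, res$language, n_segments, res$text)
}

summary_line <- describe_result(result)
print(summary_line)
```

Checking is.null(result$segments) before touching the data.frame keeps the same code path usable whether or not the backend returned timing information.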

Arguments

file

Path to the audio file to convert.

model

Model name to use for transcription. For API backends, this is passed directly (e.g., "whisper-1"). For whisper, this is the model size (e.g., "tiny", "base", "small", "medium", "large"). If NULL, uses the backend's default.

language

Language code (e.g., "en", "es", "fr"). Optional hint to improve transcription accuracy.

response_format

Response format for the API backend. One of "json", "text", or "verbose_json". Ignored for the whisper backend.

backend

Which backend to use: "auto" (default), "whisper", or "openai". Auto mode tries the local whisper backend first, then the OpenAI-compatible API (if configured).

prompt

Optional text to guide the transcription. For the API backend, this is passed as initial_prompt to help with the spelling of names, acronyms, or domain-specific terms. Ignored for the whisper backend.
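
The arguments above can be combined in a single call. The following is a minimal sketch, assuming the stt.api package is installed and an API endpoint has already been configured (for instance with set_stt_base() and set_stt_key(), as in the Examples). transcribe_spanish() is a hypothetical wrapper and the prompt text is illustrative only.

```r
# Hypothetical wrapper (not part of stt.api). Defining it does not call stt();
# calling it requires the stt.api package and a configured API backend.
transcribe_spanish <- function(path) {
  stt.api::stt(
    file            = path,
    model           = "whisper-1",     # passed directly to the API backend
    language        = "es",            # optional language hint
    response_format = "verbose_json",  # one of the documented formats
    backend         = "openai",        # bypass "auto" backend selection
    prompt          = "Names: Zaragoza, Iberdrola"  # illustrative spelling hints
  )
}
```

Naming the backend explicitly avoids depending on what "auto" selects on a given machine, which may matter here because response_format and prompt are documented as ignored by the whisper backend.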

Examples

if (FALSE) {
# Using OpenAI API
set_stt_base("https://api.openai.com")
set_stt_key(Sys.getenv("OPENAI_API_KEY"))
result <- stt("speech.wav", model = "whisper-1")
result$text

# Using local server
set_stt_base("http://localhost:4123")
result <- stt("speech.wav")
}
