Convert your audio to transcripts with optional keyword detection and profanity filtering.
audio_text(audios, userpwd, keep_data = "true", callback = NULL,
model = "en-US_BroadbandModel", continuous = FALSE,
inactivity_timeout = 30, keywords = list(), keywords_threshold = NA,
max_alternatives = 1, word_alternatives_threshold = NA,
word_confidence = FALSE, timestamps = FALSE, profanity_filter = TRUE,
smart_formatting = FALSE, content_type = "audio/wav")
audios: Character vector (list) of paths to the audio files to transcribe.
userpwd: Character scalar containing username:password for the service.
keep_data: Character scalar ("true" or "false") specifying whether to share your data with Watson services for the purpose of training their models.
callback: Function applied to each response to examine the HTTP status, headers, and content, either for debugging or to write a custom parser for the content. The default callback parses the content into a data.frame and drops the other response values, so the output can be passed directly to tidyverse packages such as dplyr or ggplot2. For further details or for debugging, pass print or a more complex function.
model: Character scalar specifying the language and bandwidth model. Alternatives are ar-AR_BroadbandModel, en-GB_BroadbandModel, en-GB_NarrowbandModel, en-US_NarrowbandModel, es-ES_BroadbandModel, es-ES_NarrowbandModel, fr-FR_BroadbandModel, ja-JP_BroadbandModel, ja-JP_NarrowbandModel, pt-BR_BroadbandModel, pt-BR_NarrowbandModel, zh-CN_BroadbandModel, zh-CN_NarrowbandModel.
continuous: Logical scalar specifying whether to keep transcribing past end-of-speech incidents (long pauses) and combine the results (TRUE) or to return after the first such incident (FALSE).
inactivity_timeout: Integer scalar giving the number of seconds without detected speech after which the result is returned.
keywords: List of keywords to detect in the speech stream.
keywords_threshold: Double scalar from 0 to 1 giving the minimum confidence required to accept a detected keyword.
max_alternatives: Integer scalar giving the maximum number of alternative transcripts to return.
word_alternatives_threshold: Double scalar from 0 to 1 giving the minimum confidence a possible word must reach to be returned as an alternative.
word_confidence: Logical scalar indicating whether to return a confidence score for each word.
timestamps: Logical scalar indicating whether to return time alignment for each word.
profanity_filter: Logical scalar indicating whether to censor profane words.
smart_formatting: Logical scalar indicating whether dates, times, numbers, and similar expressions are converted to their conventional written forms in the transcript.
content_type: Character scalar specifying the MIME type of the audio files. Alternatives are audio/flac, audio/l16;rate=n;channels=k (16-channel limit), audio/wav (9-channel limit), audio/ogg;codecs=opus, and audio/basic (narrowband models only).
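Because content_type has to match the actual encoding of the files, it can help to derive it from the file extension. The sketch below is a hypothetical helper, audio_content_type(), that is not part of this package; the extension-to-type mapping (in particular treating .raw and .pcm files as 16-bit linear PCM) is an assumption. It uses only base R plus tools::file_ext.

```r
# Hypothetical helper (not part of the package): pick a content_type string
# for audio_text() from a single file path. The extension mapping is assumed.
audio_content_type <- function(path, rate = NULL, channels = 1L) {
  ext <- tolower(tools::file_ext(path))
  if (!nzchar(ext)) stop("no file extension in: ", path)
  switch(ext,
    wav  = "audio/wav",
    flac = "audio/flac",
    # Empty entries fall through to the next non-empty value.
    ogg  = ,
    opus = "audio/ogg;codecs=opus",
    # Assumed: raw 16-bit linear PCM, which needs the rate spelled out.
    raw  = ,
    pcm  = {
      if (is.null(rate)) stop("16-bit linear PCM needs an explicit sampling rate")
      sprintf("audio/l16;rate=%d;channels=%d",
              as.integer(rate), as.integer(channels))
    },
    stop("unsupported audio extension: ", ext)
  )
}
```

Because content_type is a single scalar, this assumes every file passed in one call shares a format, for example audio_text(files, userpwd, content_type = audio_content_type(files[[1]])), where files and userpwd are placeholders.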
Value: List of parsed responses.
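To inspect raw responses instead of the default data.frame, a custom function can be passed as callback. The sketch below is a hypothetical debugging callback; it assumes each response arrives as a list with an integer status_code and raw content (the shape produced by the curl package's multi interface), which is an assumption that should be checked against the package source.

```r
# Hypothetical debugging callback. Assumption: `resp` is a list with an
# integer `status_code` and a raw vector `content`.
debug_callback <- function(resp) {
  body <- rawToChar(resp$content)
  message("HTTP status: ", resp$status_code)
  if (resp$status_code >= 400) {
    # Surface service errors (e.g. bad credentials) without stopping.
    warning("request failed: ", body, call. = FALSE)
  }
  # Return the unparsed response body so it can be inspected
  # or handed to a custom parser.
  body
}
```

It would be used as audio_text(audios, userpwd, callback = debug_callback), with audios and userpwd as placeholders.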