audio_text: IBM Watson Audio Transcriber

Description

Convert your audio to transcripts with optional keyword detection and profanity cleaning.

Usage

audio_text(audios, userpwd, keep_data = "true", callback = NULL, model = "en-US_BroadbandModel", continuous = FALSE, inactivity_timeout = 30, keywords = list(), keywords_threshold = NA, max_alternatives = 1, word_alternatives_threshold = NA, word_confidence = FALSE, timestamps = FALSE, profanity_filter = TRUE, smart_formatting = FALSE, content_type = "audio/wav")

Arguments

audios

Character vector (list) of paths to audio files.

userpwd

Character scalar containing username:password for the service.

keep_data

Character scalar ("true" or "false") specifying whether to share your data with Watson services for the purpose of training their models.

callback

Function applied to each response to examine HTTP status, headers, and content, either to debug or to write a custom parser for the content. The default callback parses content into a data.frame and drops the other response values, so that the output can be passed easily to tidyverse packages like dplyr or ggplot2. For further details or debugging, one can pass print or a more complicated function.

model

Character scalar specifying language and bandwidth model. Alternatives are ar-AR_BroadbandModel, en-UK_BroadbandModel, en-UK_NarrowbandModel, en-US_NarrowbandModel, es-ES_BroadbandModel, es-ES_NarrowbandModel, fr-FR_BroadbandModel, ja-JP_BroadbandModel, ja-JP_NarrowbandModel, pt-BR_BroadbandModel, pt-BR_NarrowbandModel, zh-CN_BroadbandModel, zh-CN_NarrowbandModel.

continuous

Logical scalar specifying whether to return after a first end-of-speech incident (long pause) or to wait to combine results.

inactivity_timeout

Integer scalar giving the number of seconds after which the result is returned if no speech is detected.

keywords

List of keywords to be detected in the speech stream.

keywords_threshold

Double scalar from 0 to 1 specifying the lower bound on confidence to accept detected keywords in speech.

max_alternatives

Integer scalar giving the maximum number of alternative transcripts to return.

word_alternatives_threshold

Double scalar from 0 to 1 giving the lower bound on confidence for possible alternative words.

word_confidence

Logical scalar indicating whether to return confidence for each word.

timestamps

Logical scalar indicating whether to return time alignment for each word.

profanity_filter

Logical scalar indicating whether to censor profane words.

smart_formatting

Logical scalar indicating whether dates, times, numbers, etc. are to be formatted nicely in the transcript.

content_type

Character scalar specifying the format of the audio file. Alternatives are audio/flac, audio/l16;rate=n;channels=k (16 channel limit), audio/wav (9 channel limit), audio/ogg;codecs=opus, audio/basic (narrowband models only).
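
Because content_type has to match the encoding of the files in audios, it can help to derive it from the file name rather than typing it by hand. The following is a minimal sketch only: guess_content_type is a hypothetical helper, not part of the package, and it assumes that the file extension reliably reflects the encoding (in particular, that .ogg files use the Opus codec and that .au files are audio/basic).

```r
# Hypothetical helper (not part of the package): map a file extension
# to one of the content types listed for the content_type argument.
guess_content_type <- function(path) {
  ext <- tolower(tools::file_ext(path))
  # Assumed extension-to-type mapping; adjust if your files differ.
  types <- c(
    flac = "audio/flac",
    wav  = "audio/wav",
    ogg  = "audio/ogg;codecs=opus",
    au   = "audio/basic"
  )
  if (!ext %in% names(types)) {
    stop("No known content type for extension: '", ext, "'")
  }
  unname(types[ext])
}

guess_content_type("interview.flac")
```

Raw audio/l16 input is left out on purpose, since its rate and channel count cannot be inferred from a file name.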

Value

List of parsed responses.
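
Examples

The call below is a hedged sketch of how the arguments above might be combined to request keyword spotting together with per-word timestamps and confidences. It assumes audio_text is already available in the session. The wrapper name transcribe_with_keywords, the keywords, the file name, and the credentials are all placeholders chosen for illustration, and the actual request is left commented out because it needs valid service credentials.

```r
# Illustrative wrapper; the name and keyword choices are hypothetical.
transcribe_with_keywords <- function(audio_path, userpwd) {
  audio_text(
    audios = audio_path,
    userpwd = userpwd,
    keywords = list("watson", "transcript"),
    keywords_threshold = 0.5,   # accept keywords detected with confidence >= 0.5
    timestamps = TRUE,          # return time alignment for each word
    word_confidence = TRUE      # return confidence for each word
  )
}

# Not run: requires valid credentials and a WAV file on disk.
# result <- transcribe_with_keywords("recording.wav", "username:password")
# str(result, max.level = 2)
```

Inspecting the result with str is a cautious first step, since the exact shape of each parsed response depends on which optional outputs were requested.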