cognizer (version 0.0.1)

audio_text: IBM Watson Audio Transcriber

Description

Convert your audio to transcripts with optional keyword detection and profanity cleaning.

Usage

audio_text(audios, userpwd, keep_data = "true", callback = NULL,
  model = "en-US_BroadbandModel", continuous = FALSE,
  inactivity_timeout = 30, keywords = list(), keywords_threshold = NA,
  max_alternatives = 1, word_alternatives_threshold = NA,
  word_confidence = FALSE, timestamps = FALSE, profanity_filter = TRUE,
  smart_formatting = FALSE, content_type = "audio/wav")

Arguments

audios
Character vector (list) of paths to the audio files to transcribe.
userpwd
Character scalar containing username:password for the service.
keep_data
Character scalar specifying whether to share your data with Watson services for the purpose of training their models.
callback
Function applied to each response to examine the HTTP status, headers, and content, either for debugging or to write a custom parser for the content. The default callback parses the content into a data.frame and drops the other response values, so that the output is easy to pass to tidyverse packages such as dplyr or ggplot2. For further details or debugging, one can pass print or a more complicated function.
model
Character scalar specifying language and bandwidth model. Alternatives are ar-AR_BroadbandModel, en-UK_BroadbandModel, en-UK_NarrowbandModel, en-US_NarrowbandModel, es-ES_BroadbandModel, es-ES_NarrowbandModel, fr-FR_BroadbandModel, ja-JP_BroadbandModel, ja-JP_NarrowbandModel, pt-BR_BroadbandModel, pt-BR_NarrowbandModel, zh-CN_BroadbandModel, zh-CN_NarrowbandModel.
continuous
Logical scalar specifying whether to return after the first end-of-speech incident (a long pause) or to wait and combine results. With the default, FALSE, the result is returned after the first end-of-speech incident.
inactivity_timeout
Integer scalar giving the number of seconds after which the result is returned if no speech is detected.
keywords
List of keywords to be detected in the speech stream.
keywords_threshold
Double scalar from 0 to 1 specifying the lower bound on confidence to accept detected keywords in speech.
max_alternatives
Integer scalar giving the maximum number of alternative transcripts to return.
word_alternatives_threshold
Double scalar from 0 to 1 giving lower bound on confidence of possible words.
word_confidence
Logical scalar indicating whether to return confidence for each word.
timestamps
Logical scalar indicating whether to return time alignment for each word.
profanity_filter
Logical scalar indicating whether to censor profane words.
smart_formatting
Logical scalar indicating whether dates, times, numbers, etc. are to be formatted nicely in the transcript.
content_type
Character scalar specifying the format of the audio file. Alternatives are audio/flac, audio/l16;rate=n;channels=k (16 channel limit), audio/wav (9 channel limit), audio/ogg;codecs=opus, audio/basic (narrowband models only).
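
A minimal sketch of how the arguments above might be combined in one call, with keyword detection and word timestamps turned on. The file path, the SPEECH_TO_TEXT_USERPWD environment variable, and the helper guess_content_type() are hypothetical and not part of cognizer. The helper also assumes that .ogg files use the Opus codec. The service is contacted only if the package, the credentials, and the files are all available.

```r
# Hypothetical helper (not part of cognizer): map a file extension to one of
# the content types listed above. Formats that need extra parameters, such as
# audio/l16;rate=n;channels=k, are deliberately left out.
guess_content_type <- function(path) {
  ext <- tolower(tools::file_ext(path))
  types <- c(
    wav  = "audio/wav",
    flac = "audio/flac",
    ogg  = "audio/ogg;codecs=opus"  # assumes the .ogg file is Opus-encoded
  )
  if (!ext %in% names(types)) {
    stop("No content type known for extension: ", ext)
  }
  unname(types[ext])
}

# Assumed inputs: adjust the path and the credential source to your setup.
audio_paths <- c("recordings/meeting.wav")
userpwd <- Sys.getenv("SPEECH_TO_TEXT_USERPWD")  # expected as "username:password"

# Collect the non-default arguments in one list so they are easy to inspect.
transcribe_args <- list(
  audios = audio_paths,
  userpwd = userpwd,
  model = "en-US_BroadbandModel",
  keywords = list("budget", "deadline"),
  keywords_threshold = 0.5,  # must lie between 0 and 1
  timestamps = TRUE,
  content_type = guess_content_type(audio_paths[1])
)

# Call the service only when everything it needs is present.
if (requireNamespace("cognizer", quietly = TRUE) &&
    nzchar(userpwd) && all(file.exists(audio_paths))) {
  transcripts <- do.call(cognizer::audio_text, transcribe_args)
  str(transcripts, max.level = 1)
}
```

Building the argument list separately from the call makes it possible to check values such as keywords_threshold and content_type before any audio is sent.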

Value

List of parsed responses.
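
For debugging, the callback argument accepts a function that replaces the default parser. The sketch below shows one possible debugging callback. It makes no assumption about how a response is laid out: it prints a one-level summary and returns the response unchanged. The name inspect_response and the stand-in object mock_response are hypothetical. Real responses from the service will look different from the stand-in.

```r
# Hypothetical debugging callback: summarise whatever the service returned
# and hand it back unchanged, so nothing is lost to parsing.
inspect_response <- function(resp) {
  utils::str(resp, max.level = 1)
  invisible(resp)
}

# Stand-in object used only to try the callback locally without network access.
mock_response <- list(status_code = 200L, content = raw(0))
returned <- inspect_response(mock_response)

# With valid credentials, the callback would be passed like this:
# audio_text(audios, userpwd, callback = inspect_response)
```

Once the layout of a response is known from this summary, a custom parser can be written and passed as callback in the same way.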