torchaudio

torchaudio is an extension for torch providing audio loading, transformations, common architectures for signal processing, pre-trained weights and access to commonly used datasets. The package is a port to R of PyTorch’s TorchAudio.

torchaudio was originally developed by Athos Damiani as part of Curso-R work. Development will continue under the roof of the mlverse organization, together with torch itself, torchvision, luz, and a number of extensions building on torch.

Installation

The CRAN release can be installed with:

install.packages("torchaudio")

You can install the development version from GitHub with:

remotes::install_github("mlverse/torchaudio")

A basic workflow

torchaudio supports a variety of workflows, such as training a neural network on a speech dataset. To get started, though, let’s do something more basic: load a sound file, extract some information about it, convert it to something torchaudio can work with (a tensor), and display a spectrogram.

Here is an example sound:

library(torchaudio)
url <- "https://pytorch.org/tutorials/_static/img/steam-train-whistle-daniel_simon-converted-from-mp3.wav"
soundfile <- tempfile(fileext = ".wav")
r <- httr::GET(url, httr::write_disk(soundfile, overwrite = TRUE))

Using torchaudio_info(), we obtain the number of channels, the number of samples, and the sampling rate:

info <- torchaudio_info(soundfile)
cat("Number of channels: ", info$num_channels, "\n")
#> Number of channels:  2
cat("Number of samples: ", info$num_frames, "\n")
#> Number of samples:  276858
cat("Sampling rate: ", info$sample_rate, "\n")
#> Sampling rate:  44100
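
The number of samples and the sampling rate together determine how long the clip is. As a small sketch in base R, with the two values copied by hand from the output above (the variable names are illustrative, not part of torchaudio):

```r
# Values copied from the torchaudio_info() output above.
num_frames  <- 276858   # samples per channel
sample_rate <- 44100    # samples per second (Hz)

# Duration in seconds = samples per channel / samples per second.
duration_sec <- num_frames / sample_rate
cat("Duration (s): ", round(duration_sec, 2), "\n")
#> Duration (s):  6.28
```

So the example file holds a little over six seconds of stereo audio.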

To read in the file, we call torchaudio_load(), which delegates the actual reading to the default backend or, if one was requested, to the user-specified backend.

The default backend is av, a fast and lightweight wrapper for FFmpeg. As of this writing, the alternative is tuneR, which may be requested via the option torchaudio.loader. (Note, though, that with tuneR only the wav and mp3 file extensions are supported.)

wav <- torchaudio_load(soundfile)
dim(wav)
#> [1]      2 276858

For torchaudio to be able to process the sound object, we need to convert it to a tensor. This is achieved by means of a call to transform_to_tensor(), resulting in a list of two tensors: one containing the actual amplitude values, the other, the sampling rate.

waveform_and_sample_rate <- transform_to_tensor(wav)
waveform <- waveform_and_sample_rate[[1]]
sample_rate <- waveform_and_sample_rate[[2]]

paste("Shape of waveform: ", paste(dim(waveform), collapse = " "))
#> [1] "Shape of waveform:  2 276858"
paste("Sample rate of waveform: ", sample_rate)
#> [1] "Sample rate of waveform:  44100"

plot(waveform[1], col = "royalblue", type = "l")
lines(waveform[2], col = "orange")

Finally, let’s create a spectrogram!

specgram <- transform_spectrogram()(waveform)

paste("Shape of spectrogram: ", paste(dim(specgram), collapse = " "))
#> [1] "Shape of spectrogram:  2 201 1385"

specgram_as_array <- as.array(specgram$log2()[1]$t())
image(specgram_as_array[,ncol(specgram_as_array):1], col = viridis::viridis(n = 257,  option = "magma"))
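
The spectrogram shape reported above (2 201 1385) can be reproduced with a little arithmetic. The sketch below assumes that transform_spectrogram() was called with an FFT size of 400, a hop length of half that (200), and centered (padded) frames. These values mirror the PyTorch TorchAudio defaults and are an assumption here, not something the text above states. The variable names are illustrative.

```r
# Assumed spectrogram settings (not stated in the text above).
n_fft      <- 400L          # assumed FFT size
hop_length <- n_fft %/% 2L  # assumed hop: half the FFT size, i.e. 200

# Number of samples per channel, from torchaudio_info() above.
num_frames <- 276858L

# A one-sided FFT of size n_fft yields n_fft / 2 + 1 frequency bins.
n_freq_bins <- n_fft %/% 2L + 1L

# With centered frames, there is one frame per hop, plus one.
n_time_frames <- num_frames %/% hop_length + 1L

cat("Frequency bins: ", n_freq_bins, "\n")
#> Frequency bins:  201
cat("Time frames: ", n_time_frames, "\n")
#> Time frames:  1385
```

Under these assumptions, the three dimensions of the spectrogram are channels (2), frequency bins (201), and time frames (1385).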

Development status

Datasets

  • CMUARCTIC
  • COMMONVOICE
  • GTZAN
  • LIBRISPEECH
  • LIBRITTS
  • LJSPEECH
  • SPEECHCOMMANDS
  • TEDLIUM
  • VCTK
  • VCTK_092
  • YESNO

Models

  • ConvTasNet
  • Wav2Letter
  • WaveRNN

I/O Backends

  • {av} (default)
  • {tuneR}

Code of Conduct

Please note that the torchaudio project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Version

0.3.1

License

MIT + file LICENSE

Maintainer

Sigrid Keydana

Last Published

February 8th, 2023

Functions in torchaudio (0.3.1)

  • functional__median_smoothing: Median Smoothing (functional)
  • functional_apply_probability_distribution: Probability Distribution Apply (functional)
  • functional_deemph_biquad: ISO 908 CD De-emphasis IIR Filter (functional)
  • av_loader: av_loader
  • functional_dcshift: DC Shift (functional)
  • functional_bass_biquad: Bass Tone-control Effect (functional)
  • cmuarctic_dataset: CMU Arctic Dataset
  • functional_bandpass_biquad: Band-pass Biquad Filter (functional)
  • functional_bandreject_biquad: Band-reject Biquad Filter (functional)
  • functional_biquad: Biquad Filter (functional)
  • functional_complex_norm: Complex Norm (functional)
  • functional_create_fb_matrix: Frequency Bin Conversion Matrix (functional)
  • functional_mel_scale: Mel Scale (functional)
  • model_resblock: ResBlock
  • functional_mu_law_decoding: Mu Law Decoding (functional)
  • model_melresnet: MelResNet
  • functional_band_biquad: Two-pole Band Filter (functional)
  • functional_compute_deltas: Delta Coefficients (functional)
  • functional_contrast: Contrast Effect (functional)
  • functional_detect_pitch_frequency: Detect Pitch Frequency (functional)
  • functional_highpass_biquad: High-pass Biquad Filter (functional)
  • functional_amplitude_to_db: Amplitude to DB (functional)
  • functional_gain: Gain (functional)
  • functional_db_to_amplitude: DB to Amplitude (functional)
  • functional_griffinlim: Griffin-Lim Transformation (functional)
  • transform_sliding_window_cmn: Sliding-window Cepstral Mean Normalization
  • functional_create_dct: DCT Transformation Matrix (functional)
  • functional_lowpass_biquad: Low-pass Biquad Filter (functional)
  • transform_spectrogram: Spectrogram
  • functional_dither: Dither (functional)
  • functional_angle: Angle (functional)
  • functional_magphase: Magnitude and Phase (functional)
  • functional_sliding_window_cmn: Sliding-window Cepstral Mean Normalization (functional)
  • functional_riaa_biquad: RIAA Vinyl Playback Equalisation (functional)
  • functional_equalizer_biquad: Biquad Peaking Equalizer Filter (functional)
  • transform_fade: Fade In/Out
  • transform_compute_deltas: Delta Coefficients
  • transform_vol: Add a volume to a waveform
  • functional_mu_law_encoding: Mu Law Encoding (functional)
  • functional_flanger: Flanger Effect (functional)
  • functional_overdrive: Overdrive Effect (functional)
  • functional_lfilter: An IIR Filter (functional)
  • functional_spectrogram: Spectrogram (functional)
  • functional_treble_biquad: Treble Tone-control Effect (functional)
  • list_audio_backends: List available audio backends
  • tuneR_loader: tuneR_loader
  • transform_mel_scale: Mel Scale
  • mel_to_linear_frequency: Mel to linear frequency
  • model_wavernn: WaveRNN
  • functional_mask_along_axis: Mask Along Axis (functional)
  • transform_mel_spectrogram: Mel Spectrogram
  • functional_mask_along_axis_iid: Mask Along Axis IID (functional)
  • functional_phase_vocoder: Phase Vocoder
  • kaldi__get_lr_indices_and_weights: Linear Resample Indices And Weights
  • internal__normalize_audio: Audio Normalization
  • functional_vad: Voice Activity Detector (functional)
  • functional_phaser: Phasing Effect (functional)
  • kaldi__get_num_lr_output_samples: Linear Resample Output Samples
  • kaldi_resample_waveform: Kaldi's Resample Waveform
  • linear_to_mel_frequency: Linear to mel frequency
  • strip: Strip
  • speechcommand_dataset: Speech Commands Dataset
  • model_stretch2d: Stretch2d
  • walk_files: List recursively all files ending with a suffix at a given root
  • model_upsample_network: UpsampleNetwork
  • torchaudio_load: Load Audio File
  • transform_mu_law_encoding: Mu Law Encoding
  • torchaudio_info: Audio Information
  • yesno_dataset: YesNo Dataset
  • transform_resample: Signal Resample
  • transform_amplitude_to_db: Amplitude to DB
  • transform_time_stretch: Time Stretch
  • transform_timemasking: Time-domain Masking
  • transform_complex_norm: Complex Norm
  • transform__axismasking: Axis Masking
  • transform_frequencymasking: Frequency-domain Masking
  • transform_inverse_mel_scale: Inverse Mel Scale
  • transform_mfcc: Mel-frequency Cepstrum Coefficients
  • transform_mu_law_decoding: Mu Law Decoding
  • transform_to_tensor: Convert an audio object into a tensor
  • transform_vad: Voice Activity Detector
  • functional__compute_nccf: Normalized Cross-Correlation Function (functional)
  • extract_archive: Extract Archive
  • functional__combine_max: Combine Max (functional)
  • functional__find_max_per_frame: Find Max Per Frame (functional)
  • functional__generate_wave_table: Wave Table Generator (functional)
  • functional_add_noise_shaping: Noise Shaping (functional)
  • functional_allpass_biquad: All-pass Biquad Filter (functional)