torchaudio

torchaudio is an extension for torch providing audio loading, transformations, common architectures for signal processing, pre-trained weights and access to commonly used datasets. The package is a port to R of PyTorch’s TorchAudio.

torchaudio was originally developed by Athos Damiani as part of Curso-R work. Development will continue under the roof of the mlverse organization, together with torch itself, torchvision, luz, and a number of extensions building on torch.

Installation

The CRAN release can be installed with:

install.packages("torchaudio")

You can install the development version from GitHub with:

remotes::install_github("mlverse/torchaudio")

A Waveform

torchaudio supports loading sound files in the wav and mp3 formats. The resulting raw audio signal is called a waveform.

library(torchaudio)

# Download a sample wav file to a temporary location.
url <- "https://pytorch.org/tutorials/_static/img/steam-train-whistle-daniel_simon-converted-from-mp3.wav"
filename <- tempfile(fileext = ".wav")
r <- httr::GET(url, httr::write_disk(filename, overwrite = TRUE))

# Load the file with the tuneR backend and convert it to a tensor.
# The result is a list: the waveform tensor, then the sample rate.
waveform_and_sample_rate <- transform_to_tensor(tuneR_loader(filename))
waveform <- waveform_and_sample_rate[[1]]
sample_rate <- waveform_and_sample_rate[[2]]

paste("Shape of waveform: ", paste(dim(waveform), collapse = " "))
#> [1] "Shape of waveform:  2 276858"
paste("Sample rate of waveform: ", sample_rate)
#> [1] "Sample rate of waveform:  44100"

# Plot both channels on the same axes.
plot(waveform[1], col = "royalblue", type = "l")
lines(waveform[2], col = "orange")
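
The printed shape and sample rate can be related to the length of the recording. The following is a minimal base R sketch, using only the numbers printed above (it does not call torchaudio); the variable names are illustrative and not part of the package.

```r
# Values copied from the printed output above.
waveform_dim <- c(2, 276858)   # channels x samples
sample_rate  <- 44100          # samples per second

n_channels <- waveform_dim[1]
n_samples  <- waveform_dim[2]

# Duration in seconds is the number of samples divided by the sample rate.
duration_s <- n_samples / sample_rate

# Time axis in seconds, one entry per sample; it could serve as the
# x argument when plotting a channel against time.
time_s <- (seq_len(n_samples) - 1) / sample_rate

round(duration_s, 2)
#> [1] 6.28
```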

A Spectrogram

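# Compute a spectrogram with the default settings. The result has
# shape (channels, frequency bins, time frames).
specgram <- transform_spectrogram()(waveform)

paste("Shape of spectrogram: ", paste(dim(specgram), collapse = " "))
#> [1] "Shape of spectrogram:  2 201 1385"

# Take the first channel on a log2 scale, transpose it to (time, frequency),
# and draw it as an image with the frequency axis reversed.
specgram_as_array <- as.array(specgram$log2()[1]$t())
image(specgram_as_array[,ncol(specgram_as_array):1], col = viridis::viridis(n = 257,  option = "magma"))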
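
To see where a shape such as 201 x 1385 can come from, here is a rough base R sketch of a power spectrogram. It is not torchaudio's implementation. It assumes settings that the text above does not state: an FFT size of 400, a hop of 200 samples, a periodic Hann window, and reflect padding of half the FFT size on each side. The function name power_spectrogram is hypothetical.

```r
# Hypothetical helper: power spectrogram of a numeric vector in base R.
# Assumed settings (not confirmed by the text above): n_fft = 400,
# hop_length = 200, periodic Hann window, reflect padding of n_fft / 2.
power_spectrogram <- function(x, n_fft = 400, hop_length = 200) {
  n   <- length(x)
  pad <- n_fft %/% 2

  # Reflect-pad both ends so that frames are centered on the samples.
  x_padded <- c(rev(x[2:(pad + 1)]), x, rev(x[(n - pad):(n - 1)]))

  # Periodic Hann window of length n_fft.
  window <- 0.5 - 0.5 * cos(2 * pi * (0:(n_fft - 1)) / n_fft)

  n_frames <- (length(x_padded) - n_fft) %/% hop_length + 1

  # One column per frame; keep the one-sided spectrum (n_fft / 2 + 1 bins).
  sapply(seq_len(n_frames), function(i) {
    start <- (i - 1) * hop_length + 1
    frame <- x_padded[start:(start + n_fft - 1)] * window
    Mod(fft(frame))[1:(n_fft %/% 2 + 1)]^2
  })
}

# Under these assumptions, the shape arithmetic for the waveform above is:
n_freq_bins <- 400 %/% 2 + 1          # 201 frequency bins
n_frames    <- 276858 %/% 200 + 1     # 1385 time frames

# Small synthetic check: a 440 Hz tone sampled at 8000 Hz for one second.
tone <- sin(2 * pi * 440 * (0:7999) / 8000)
spec <- power_spectrogram(tone)
dim(spec)
#> [1] 201  41
```

With an FFT size of 400 at 8000 Hz, the bins are 20 Hz apart, so the 440 Hz tone peaks in row 23 (bin 22, counting from zero).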

Datasets (go to issue)

  • CMUARCTIC
  • COMMONVOICE
  • GTZAN
  • LIBRISPEECH
  • LIBRITTS
  • LJSPEECH
  • SPEECHCOMMANDS
  • TEDLIUM
  • VCTK
  • VCTK_092
  • YESNO

Models (go to issue)

  • ConvTasNet
  • Wav2Letter
  • WaveRNN
  • (what else? novel structures are very welcome!)

I/O Backend

  • {tuneR}

Code of Conduct

Please note that the torchaudio project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Monthly Downloads: 43

Version: 0.2.2

License: MIT + file LICENSE

Maintainer: Sigrid Keydana

Last Published: January 23rd, 2023

Functions in torchaudio (0.2.2)

  • functional__generate_wave_table: Wave Table Generator (functional)
  • audiofile_loader: audiofile_loader
  • functional__combine_max: Combine Max (functional)
  • cmuarctic_dataset: CMU Arctic Dataset
  • extract_archive: Extract Archive
  • functional_bandreject_biquad: Band-reject Biquad Filter (functional)
  • functional_bandpass_biquad: Band-pass Biquad Filter (functional)
  • functional_amplitude_to_db: Amplitude to DB (functional)
  • functional_biquad: Biquad Filter (functional)
  • functional_allpass_biquad: All-pass Biquad Filter (functional)
  • functional_add_noise_shaping: Noise Shaping (functional)
  • functional_bass_biquad: Bass Tone-control Effect (functional)
  • functional_angle: Angle (functional)
  • functional_complex_norm: Complex Norm (functional)
  • functional_compute_deltas: Delta Coefficients (functional)
  • functional_band_biquad: Two-pole Band Filter (functional)
  • functional_apply_probability_distribution: Probability Distribution Apply (functional)
  • functional_create_fb_matrix: Frequency Bin Conversion Matrix (functional)
  • functional_db_to_amplitude: DB to Amplitude (functional)
  • functional_dcshift: DC Shift (functional)
  • functional_deemph_biquad: ISO 908 CD De-emphasis IIR Filter (functional)
  • functional_lowpass_biquad: Low-pass Biquad Filter (functional)
  • functional_magphase: Magnitude and Phase (functional)
  • functional_mask_along_axis_iid: Mask Along Axis IID (functional)
  • functional_mask_along_axis: Mask Along Axis (functional)
  • functional_contrast: Contrast Effect (functional)
  • functional_create_dct: DCT transformation matrix (functional)
  • functional_lfilter: An IIR Filter (functional)
  • functional_highpass_biquad: High-pass Biquad Filter (functional)
  • functional_equalizer_biquad: Biquad Peaking Equalizer Filter (functional)
  • functional_flanger: Flanger Effect (functional)
  • functional_mel_scale: Mel Scale (functional)
  • functional_mu_law_decoding: Mu Law Decoding (functional)
  • functional_phase_vocoder: Phase Vocoder
  • functional_phaser: Phasing Effect (functional)
  • functional_detect_pitch_frequency: Detect Pitch Frequency (functional)
  • functional_treble_biquad: Treble Tone-control Effect (functional)
  • functional_dither: Dither (functional)
  • functional_spectrogram: Spectrogram (functional)
  • functional_griffinlim: Griffin-Lim Transformation (functional)
  • functional_overdrive: Overdrive Effect (functional)
  • kaldi__get_lr_indices_and_weights: Linear Resample Indices And Weights
  • kaldi__get_num_lr_output_samples: Linear Resample Output Samples
  • kaldi_resample_waveform: Kaldi's Resample Waveform
  • model_melresnet: MelResNet
  • functional_gain: Gain (functional)
  • internal__normalize_audio: Audio Normalization
  • functional_mu_law_encoding: Mu Law Encoding (functional)
  • linear_to_mel_frequency: Linear to mel frequency
  • mel_to_linear_frequency: Mel to linear frequency
  • model_resblock: ResBlock
  • strip: Strip
  • model_stretch2d: Stretch2d
  • model_upsample_network: UpsampleNetwork
  • torchaudio_load: Load Audio File
  • functional_riaa_biquad: RIAA Vinyl Playback Equalisation (functional)
  • functional_sliding_window_cmn: sliding-window Cepstral Mean Normalization (functional)
  • transform__axismasking: Axis Masking
  • torchaudio_loader: Load Audio File
  • info: Audio Information
  • functional_vad: Voice Activity Detector (functional)
  • set_audio_backend: Set the backend for I/O operation
  • transform_mel_scale: Mel Scale
  • transform_mel_spectrogram: Mel Spectrogram
  • transform_mfcc: Mel-frequency Cepstrum Coefficients
  • speechcommand_dataset: Speech Commands Dataset
  • transform_inverse_mel_scale: Inverse Mel Scale
  • transform_to_tensor: Convert an audio object into a tensor
  • mp3_info: MP3 Information
  • transform_frequencymasking: Frequency-domain Masking
  • model_wavernn: WaveRNN
  • transform_vad: Voice Activity Detector
  • transform_vol: Add a volume to a waveform.
  • transform_amplitude_to_db: Amplitude to DB
  • transform_complex_norm: Complex Norm
  • tuneR_loader: tuneR_loader
  • walk_files: List recursively all files ending with a suffix at a given root
  • wav_info: Wave Information
  • transform_time_stretch: Time Stretch
  • transform_timemasking: Time-domain Masking
  • transform_compute_deltas: Delta Coefficients
  • transform_fade: Fade In/Out
  • transform_sliding_window_cmn: sliding-window Cepstral Mean Normalization
  • transform_spectrogram: Spectrogram
  • transform_mu_law_decoding: Mu Law Decoding
  • transform_mu_law_encoding: Mu Law Encoding
  • transform_resample: Signal Resample
  • yesno_dataset: YesNo Dataset
  • backend_utils_list_audio_backends: List Available Audio Backends
  • functional__median_smoothing: Median Smoothing (functional)
  • functional__compute_nccf: Normalized Cross-Correlation Function (functional)
  • av_loader: av_loader
  • functional__find_max_per_frame: Find Max Per Frame (functional)