torchaudio

torchaudio is an extension for torch providing audio loading, transformations, common architectures for signal processing, pre-trained weights and access to commonly used datasets. The package is a port to R of PyTorch’s TorchAudio.

torchaudio was originally developed by Athos Damiani as part of Curso-R work. Development will continue under the roof of the mlverse organization, together with torch itself, torchvision, luz, and a number of extensions building on torch.

Installation

The CRAN release can be installed with:

install.packages("torchaudio")

You can install the development version from GitHub with:

remotes::install_github("mlverse/torchaudio")

A Waveform

torchaudio supports loading sound files in the wav and mp3 formats. The resulting raw audio signal is called a waveform.

library(torchaudio)

# Download a sample wav file to a temporary location
url = "https://pytorch.org/tutorials/_static/img/steam-train-whistle-daniel_simon-converted-from-mp3.wav"
filename = tempfile(fileext = ".wav")
r = httr::GET(url, httr::write_disk(filename, overwrite = TRUE))

# Load the file and convert it to a list holding the waveform tensor and the sample rate
waveform_and_sample_rate = transform_to_tensor(tuneR_loader(filename))
waveform = waveform_and_sample_rate[[1]]
sample_rate = waveform_and_sample_rate[[2]]

paste("Shape of waveform: ", paste(dim(waveform), collapse = " "))
#> [1] "Shape of waveform:  2 276858"
paste("Sample rate of waveform: ", sample_rate)
#> [1] "Sample rate of waveform:  44100"

# Plot the first channel, then overlay the second
plot(waveform[1], col = "royalblue", type = "l")
lines(waveform[2], col = "orange")
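
The printed shape and sample rate are enough to work out the clip's duration: samples per channel divided by samples per second. Below is a minimal base-R sketch of that arithmetic, with the two values copied by hand from the output above; the variable names are illustrative only.

```r
# Values copied from the printed output above
n_samples   <- 276858   # samples per channel (second dimension of the waveform)
sample_rate <- 44100    # samples per second (Hz)

# Duration in seconds = samples per channel / sample rate
duration_sec <- n_samples / sample_rate
round(duration_sec, 2)
#> [1] 6.28
```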

A Spectrogram

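# Compute a spectrogram with the default settings
specgram <- transform_spectrogram()(waveform)

paste("Shape of spectrogram: ", paste(dim(specgram), collapse = " "))
#> [1] "Shape of spectrogram:  2 201 1385"

# Log-scale the first channel and transpose it to (time, frequency) for plotting
specgram_as_array <- as.array(specgram$log2()[1]$t())
# Reverse the column (frequency) order before drawing the image
image(specgram_as_array[, ncol(specgram_as_array):1], col = viridis::viridis(n = 257, option = "magma"))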
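
The spectrogram's shape follows from the short-time Fourier transform settings. The sketch below reproduces the printed dimensions under the assumption of a 400-point FFT, a hop of 200 samples, and centered (padded) frames. These values are an assumption based on the defaults of PyTorch's TorchAudio and are not stated on this page, so treat this as a plausibility check rather than a description of the package internals.

```r
n_samples  <- 276858  # samples per channel, from the waveform shape above
n_fft      <- 400     # assumed FFT size
hop_length <- 200     # assumed hop between consecutive frames

# A real-valued FFT yields n_fft / 2 + 1 non-redundant frequency bins
n_freq_bins <- n_fft %/% 2 + 1

# With centered frames, the frame count is floor(n_samples / hop_length) + 1
n_frames <- n_samples %/% hop_length + 1

c(n_freq_bins, n_frames)
#> [1]  201 1385
```

Under these assumptions the result matches the last two dimensions of the printed shape; the leading 2 is the channel count.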

Datasets

CMUARCTIC
COMMONVOICE
GTZAN
LIBRISPEECH
LIBRITTS
LJSPEECH
SPEECHCOMMANDS
TEDLIUM
VCTK
VCTK_092
YESNO

Models

ConvTasNet
Wav2Letter
WaveRNN
(what else? novel structures are very welcome!)

I/O Backend

{tuneR}

Code of Conduct

Please note that the torchaudio project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Version

0.2.2

License

MIT + file LICENSE

Maintainer

Sigrid Keydana

Last Published

January 23rd, 2023

Functions in torchaudio (0.2.2)

functional__generate_wave_table

Wave Table Generator (functional)

audiofile_loader

audiofile_loader

functional__combine_max

Combine Max (functional)

cmuarctic_dataset

CMU Arctic Dataset

extract_archive

Extract Archive

functional_bandreject_biquad

Band-reject Biquad Filter (functional)

functional_bandpass_biquad

Band-pass Biquad Filter (functional)

functional_amplitude_to_db

Amplitude to DB (functional)

functional_biquad

Biquad Filter (functional)

functional_allpass_biquad

All-pass Biquad Filter (functional)

functional_add_noise_shaping

Noise Shaping (functional)

functional_bass_biquad

Bass Tone-control Effect (functional)

functional_angle

Angle (functional)

functional_complex_norm

Complex Norm (functional)

functional_compute_deltas

Delta Coefficients (functional)

functional_band_biquad

Two-pole Band Filter (functional)

functional_apply_probability_distribution

Probability Distribution Apply (functional)

functional_create_fb_matrix

Frequency Bin Conversion Matrix (functional)

functional_db_to_amplitude

DB to Amplitude (functional)

functional_dcshift

DC Shift (functional)

functional_deemph_biquad

ISO 908 CD De-emphasis IIR Filter (functional)

functional_lowpass_biquad

Low-pass Biquad Filter (functional)

functional_magphase

Magnitude and Phase (functional)

functional_mask_along_axis_iid

Mask Along Axis IID (functional)

functional_mask_along_axis

Mask Along Axis (functional)

functional_contrast

Contrast Effect (functional)

functional_create_dct

DCT transformation matrix (functional)

functional_lfilter

An IIR Filter (functional)

functional_highpass_biquad

High-pass Biquad Filter (functional)

functional_equalizer_biquad

Biquad Peaking Equalizer Filter (functional)

functional_flanger

Flanger Effect (functional)

functional_mel_scale

Mel Scale (functional)

functional_mu_law_decoding

Mu Law Decoding (functional)

functional_phase_vocoder

functional_phaser

Phasing Effect (functional)

functional_detect_pitch_frequency

Detect Pitch Frequency (functional)

functional_treble_biquad

Treble Tone-control Effect (functional)

functional_dither

Dither (functional)

functional_spectrogram

Spectrogram (functional)

functional_griffinlim

Griffin-Lim Transformation (functional)

functional_overdrive

Overdrive Effect (functional)

kaldi__get_lr_indices_and_weights

Linear Resample Indices And Weights

kaldi__get_num_lr_output_samples

Linear Resample Output Samples

kaldi_resample_waveform

Kaldi's Resample Waveform

model_melresnet

functional_gain

Gain (functional)

internal__normalize_audio

Audio Normalization

functional_mu_law_encoding

Mu Law Encoding (functional)

linear_to_mel_frequency

Linear to mel frequency

mel_to_linear_frequency

Mel to linear frequency

model_stretch2d

model_upsample_network

UpsampleNetwork

torchaudio_load

Load Audio File

functional_riaa_biquad

RIAA Vinyl Playback Equalisation (functional)

functional_sliding_window_cmn

Sliding-window Cepstral Mean Normalization (functional)

transform__axismasking

torchaudio_loader

Load Audio File

Audio Information

Voice Activity Detector (functional)

set_audio_backend

Set the backend for I/O operation

transform_mel_scale

transform_mel_spectrogram

Mel Spectrogram

Mel-frequency Cepstrum Coefficients

speechcommand_dataset

Speech Commands Dataset

transform_inverse_mel_scale

Inverse Mel Scale

transform_to_tensor

Convert an audio object into a tensor

MP3 Information

transform_frequencymasking

Frequency-domain Masking

Voice Activity Detector

Add a volume to a waveform.

transform_amplitude_to_db

Amplitude to DB

transform_complex_norm

List recursively all files ending with a suffix at a given root

Wave Information

transform_time_stretch

transform_timemasking

Time-domain Masking

transform_compute_deltas

Delta Coefficients

transform_sliding_window_cmn

Sliding-window Cepstral Mean Normalization

transform_spectrogram

transform_mu_law_decoding

Mu Law Decoding

transform_mu_law_encoding

Mu Law Encoding

transform_resample

Signal Resample

backend_utils_list_audio_backends

List Available Audio Backends

functional__median_smoothing

Median Smoothing (functional)

functional__compute_nccf

Normalized Cross-Correlation Function (functional)

functional__find_max_per_frame

Find Max Per Frame (functional)
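
The index above lists linear_to_mel_frequency and mel_to_linear_frequency without showing what the conversion looks like. As a rough illustration, here is a base-R sketch of one widely used mel-scale formula (the variant with constants 2595 and 700). The helper names hz_to_mel and mel_to_hz are hypothetical, and this is not claimed to be the package's own implementation or its default constants.

```r
# Hypothetical helpers, written in base R; NOT the package's own functions
hz_to_mel <- function(hz) 2595 * log10(1 + hz / 700)
mel_to_hz <- function(mel) 700 * (10^(mel / 2595) - 1)

freqs <- c(0, 440, 1000, 8000)  # frequencies in Hz
mels  <- hz_to_mel(freqs)       # corresponding mel values

# Converting back should recover the original frequencies
all.equal(mel_to_hz(mels), freqs)
```

The scale is roughly linear at low frequencies and logarithmic at high frequencies, which is why mel-based features spend more resolution on the lower part of the spectrum.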