matchPars: Match soundgen pars (experimental)

Description

Attempts to find settings for soundgen that will reproduce an existing sound. The principle is to mutate control parameters, trying to improve fit to target. The currently implemented optimization algorithm is simple hill climbing. Disclaimer: this function is experimental and may or may not work for particular tasks. It is intended as a supplement to - not replacement of - manual optimization. See vignette('sound_generation', package = 'soundgen') for more information.

Usage

matchPars(target, samplingRate = NULL, pars = NULL, init = NULL,
  method = c("cor", "cosine", "pixel", "dtw"), probMutation = 0.25,
  stepVariance = 0.1, maxIter = 50, minExpectedDelta = 0.001,
  windowLength = 40, overlap = 50, step = NULL, verbose = TRUE,
  padWith = NA, penalizeLengthDif = TRUE, dynamicRange = 80,
  maxFreq = NULL)

Arguments

target

the sound we want to reproduce using soundgen: path to a .wav file or numeric vector

samplingRate

sampling rate of target (only needed if target is a numeric vector, rather than a .wav file)

pars

arguments to soundgen that we are attempting to optimize

init

a list of initial values for the optimized parameters pars and the values of other arguments to soundgen that are fixed at non-default values (if any)

method

method of comparing mel-transformed spectra of two sounds: "cor" = average Pearson's correlation of mel-transformed spectra of individual FFT frames; "cosine" = same as "cor" but with cosine similarity instead of Pearson's correlation; "pixel" = absolute difference between each point in the two spectra; "dtw" = discrete time warp with dtw

probMutation

the probability of a parameter mutating per iteration

stepVariance

scale factor for calculating the size of mutations

maxIter

maximum number of mutated sounds produced without improving the fit to target

minExpectedDelta

minimum improvement in fit to target required to accept the new sound candidate

windowLength

length of FFT window, ms

overlap

overlap between successive FFT frames, %

step

you can override overlap by specifying FFT step, ms

verbose

if TRUE, plays back the accepted candidate at each iteration and reports the outcome

padWith

compared spectra are padded with either silence (padWith = 0) or with NA's (padWith = NA) to have the same number of columns. When the sounds are of different duration, padding with zeros rather than NA's improves the fit to target measured by method = 'pixel' and 'dtw', but it has no effect on 'cor' and 'cosine'.

penalizeLengthDif

if TRUE, sounds of different length are considered to be less similar; if FALSE, only the overlapping parts of two sounds are compared

dynamicRange

parts of the spectra quieter than -dynamicRange dB are not compared

maxFreq

parts of the spectra above maxFreq Hz are not compared

Value

Returns a list of length 2: $history contains the tried parameter values together with their fit to target ($history$sim), and $pars contains a list of the final - hopefully the best - parameter settings.

Examples

Run this code

# NOT RUN {
playback = c(TRUE, FALSE)[2]  # set to TRUE to play back the audio from examples

target = soundgen(repeatBout = 3, sylLen = 120, pauseLen = 70,
  pitch = c(300, 200), rolloff = -5, play = playback)
# we hope to reproduce this sound

# }
# NOT RUN {
# Match pars based on acoustic analysis alone, without any optimization.
# This *MAY* match temporal structure, pitch, and stationary formants
m1 = matchPars(target = target,
               samplingRate = 16000,
               maxIter = 0,  # no optimization, only acoustic analysis
               verbose = playback)
cand1 = do.call(soundgen, c(m1$pars, list(play = playback, temperature = 0.001)))

# Try to improve the match by optimizing rolloff
# (this may take a few minutes to run, and the results may vary)
m2 = matchPars(target = target,
               samplingRate = 16000,
               pars = 'rolloff',
               maxIter = 100,
               verbose = playback)
# rolloff should be moving from default (-9) to target (-5):
sapply(m2$history, function(x) x$pars$rolloff)
cand2 = do.call(soundgen, c(m2$pars, list(play = playback, temperature = 0.001)))
# }

Run the code above in your browser using DataLab