compareSounds: Compare sounds (experimental)

Description

Computes the similarity between two sounds by comparing their mel-transformed (auditory) spectra. Called by matchPars.

Usage

compareSounds(target, targetSpec = NULL, cand, samplingRate = NULL,
  method = c("cor", "cosine", "pixel", "dtw")[1:4], windowLength = 40,
  overlap = 50, step = NULL, padWith = NA,
  penalizeLengthDif = TRUE, dynamicRange = 80, maxFreq = NULL,
  summary = TRUE)

Arguments

target

the sound we want to reproduce using soundgen: path to a .wav file or numeric vector

targetSpec

if already calculated, the target auditory spectrum can be provided to speed things up

cand

the sound to be compared to target

samplingRate

sampling rate of target (only needed if target is a numeric vector, rather than a .wav file)

method

method of comparing the mel-transformed spectra of two sounds: "cor" = average Pearson's correlation of mel-transformed spectra of individual FFT frames; "cosine" = same as "cor", but with cosine similarity instead of Pearson's correlation; "pixel" = absolute difference between each point in the two spectra; "dtw" = dynamic time warping with dtw

windowLength

length of FFT window, ms

overlap

overlap between successive FFT frames, %

step

you can override overlap by specifying FFT step, ms

padWith

the spectra being compared are padded with either silence (padWith = 0) or NAs (padWith = NA) so that they have the same number of columns. When the sounds differ in duration, padding with zeros rather than NAs improves the fit to target as measured by method = 'pixel' and 'dtw', but has no effect on 'cor' and 'cosine'.

penalizeLengthDif

if TRUE, sounds of different length are considered to be less similar; if FALSE, only the overlapping parts of two sounds are compared

dynamicRange

parts of the spectra quieter than -dynamicRange dB are not compared

maxFreq

parts of the spectra above maxFreq Hz are not compared

summary

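if TRUE, returns the mean of the similarity values calculated by all methods listed in method; if FALSE, returns one value per method (as in the example below, which stores them in one column per method)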
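
As a rough illustration of the frame-wise measures described under method, the sketch below computes "cor"-like, "cosine"-like, and "pixel"-like scores for two toy matrices in base R. It is not soundgen's implementation: the matrices, the helper cosineSim, and the variable names are hypothetical, and the way soundgen turns the "pixel" difference into a similarity score is not reproduced here.

```r
# Toy "auditory spectra": rows = frequency bins, columns = FFT frames.
# The values are made up for illustration only.
set.seed(1)
specA = matrix(runif(20), nrow = 5, ncol = 4)
specB = specA * 2   # same spectral shape, twice the amplitude

# Hypothetical helper: cosine similarity of two numeric vectors
cosineSim = function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

frames = seq_len(ncol(specA))

# "cor"-like score: mean Pearson's correlation across frames
simCor = mean(sapply(frames, function(i) cor(specA[, i], specB[, i])))

# "cosine"-like score: mean cosine similarity across frames
simCosine = mean(sapply(frames, function(i) cosineSim(specA[, i], specB[, i])))

# "pixel"-like measure: mean absolute point-by-point difference
difPixel = mean(abs(specA - specB))

print(c(cor = simCor, cosine = simCosine, pixelDif = difPixel))
```

Because specB is just a scaled copy of specA, the correlation and cosine scores are insensitive to the change, whereas the point-by-point difference is not.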

Examples

# NOT RUN {
target = soundgen(sylLen = 500, formants = 'a',
                  pitch = data.frame(time = c(0, 0.1, 0.9, 1),
                                     value = c(100, 150, 135, 100)),
                  temperature = 0.001)
targetSpec = soundgen:::getMelSpec(target, samplingRate = 16000)
parsToTry = list(
  list(formants = 'i',                                   # wrong
       pitch = data.frame(time = c(0, 1),                # wrong
                          value = c(200, 300))),
  list(formants = 'i',                                   # wrong
       pitch = data.frame(time = c(0, 0.1, 0.9, 1),      # right
                          value = c(100, 150, 135, 100))),
  list(formants = 'a',                                   # right
       pitch = data.frame(time = c(0, 1),                # wrong
                          value = c(200, 300))),
  list(formants = 'a',                                   # right
       pitch = data.frame(time = c(0, 0.1, 0.9, 1),      # right
                          value = c(100, 150, 135, 100)))
)

# synthesize one candidate sound per parameter set
sounds = list()
for (s in seq_along(parsToTry)) {
  sounds[[s]] = do.call(soundgen,
    c(parsToTry[[s]], list(temperature = 0.001, sylLen = 500)))
}

method = c('cor', 'cosine', 'pixel', 'dtw')
df = matrix(NA, nrow = length(parsToTry), ncol = length(method))
colnames(df) = method
df = as.data.frame(df)
for (i in seq_len(nrow(df))) {
  df[i, ] = compareSounds(
    target = NULL,            # can use target instead of targetSpec...
    targetSpec = targetSpec,  # ...but faster to calculate targetSpec once
    cand = sounds[[i]],
    samplingRate = 16000,
    padWith = NA,
    penalizeLengthDif = TRUE,
    method = method,
    summary = FALSE
  )
}
df$av = rowMeans(df, na.rm = TRUE)
# row 1 = wrong pitch & formants, ..., row 4 = right pitch & formants
df$formants = c('wrong', 'wrong', 'right', 'right')
df$pitch = c('wrong', 'right', 'wrong', 'right')
df
# }
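
To make the padWith argument more concrete, here is a minimal base R sketch of padding the shorter of two spectra so that both have the same number of columns. The helper padCols and the toy matrices are hypothetical and only illustrate the idea; soundgen's internal padding may differ in detail.

```r
# Hypothetical helper: append columns filled with padWith (0 or NA)
# until the spectrum has nColTarget columns (frames).
padCols = function(spec, nColTarget, padWith = NA) {
  nExtra = nColTarget - ncol(spec)
  if (nExtra <= 0) return(spec)   # already long enough
  cbind(spec, matrix(padWith, nrow = nrow(spec), ncol = nExtra))
}

# Toy spectra: 3 frequency bins, 2 vs. 4 frames
shortSpec = matrix(1, nrow = 3, ncol = 2)
longSpec = matrix(1, nrow = 3, ncol = 4)

paddedNA = padCols(shortSpec, ncol(longSpec), padWith = NA)
padded0 = padCols(shortSpec, ncol(longSpec), padWith = 0)

dim(paddedNA)          # now the same dimensions as longSpec
sum(is.na(paddedNA))   # number of cells filled with NA
sum(padded0 == 0)      # number of cells filled with silence
```

With NA padding, the extra frames carry no values to compare, whereas with zero padding they are compared as silence.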
