
Computes the similarity between two sounds by comparing their mel-transformed
(auditory) spectra. Called by matchPars.
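Here is a minimal base-R sketch of the core idea: averaging per-frame similarities between two spectrograms. It assumes the spectra are already stored as matrices (rows = frequency bins, columns = FFT frames) and skips the mel transform. frameSimilarity() is a hypothetical helper, not a soundgen function, and soundgen's own implementation may differ in details such as normalization and the handling of missing values.

```r
# Hypothetical helper (not part of soundgen): average per-frame similarity of
# two spectrograms given as matrices (rows = frequency bins, columns = frames).
frameSimilarity = function(spec1, spec2, method = c("cor", "cosine")) {
  method = match.arg(method)
  stopifnot(identical(dim(spec1), dim(spec2)))
  sims = vapply(seq_len(ncol(spec1)), function(j) {
    a = spec1[, j]
    b = spec2[, j]
    ok = !is.na(a) & !is.na(b)      # use only bins present in both frames
    a = a[ok]
    b = b[ok]
    if (length(a) < 2) return(NA_real_)
    if (method == "cor") {
      if (sd(a) == 0 || sd(b) == 0) return(NA_real_)  # undefined for flat frames
      cor(a, b)                     # Pearson: each frame is centered first
    } else {
      denom = sqrt(sum(a^2) * sum(b^2))
      if (denom == 0) return(NA_real_)
      sum(a * b) / denom            # cosine similarity: no centering
    }
  }, numeric(1))
  mean(sims, na.rm = TRUE)          # average over frames
}

set.seed(1)
s1 = matrix(runif(20 * 10), nrow = 20)   # 20 bins x 10 frames
frameSimilarity(s1, s1 + 5, "cor")       # essentially 1: offset is ignored
frameSimilarity(s1, s1 + 5, "cosine")    # below 1: offset changes the angle
```

Because Pearson's correlation centers each frame, it ignores a constant offset between two spectra, whereas cosine similarity does not. This is one practical difference between "cor" and "cosine".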
compareSounds(target, targetSpec = NULL, cand, samplingRate = NULL,
  method = c("cor", "cosine", "pixel", "dtw")[1:4], windowLength = 40,
  overlap = 50, step = NULL, padWith = NA,
  penalizeLengthDif = TRUE, dynamicRange = 80, maxFreq = NULL,
  summary = TRUE)
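With step = NULL, as in the defaults above, the spacing of FFT frames follows from windowLength and overlap. The sketch below assumes the common convention step = windowLength * (1 - overlap / 100), with all times in ms. frameStep() and nFrames() are hypothetical helpers, not soundgen functions, and nFrames() counts only complete frames, ignoring any padding soundgen might apply.

```r
# Hypothetical helpers (not part of soundgen), assuming
# step = windowLength * (1 - overlap / 100); all times in ms.
frameStep = function(windowLength = 40, overlap = 50, step = NULL) {
  if (is.null(step)) step = windowLength * (1 - overlap / 100)
  step  # an explicit step takes precedence over overlap
}

# Number of complete FFT frames in a sound lasting durMs ms (no padding)
nFrames = function(durMs, windowLength = 40, overlap = 50, step = NULL) {
  s = frameStep(windowLength, overlap, step)
  max(0, floor((durMs - windowLength) / s) + 1)
}

frameStep()            # 20: 40-ms window with 50% overlap
frameStep(step = 10)   # 10: step overrides overlap
nFrames(500)           # 24 frames in a 500-ms sound with the defaults
```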
target: the sound we want to reproduce using soundgen, given as a path to a .wav file or as a numeric vector
targetSpec: if the auditory spectrum of target has already been calculated, it can be provided here to speed things up
cand: the sound to be compared with target
samplingRate: sampling rate of target (only needed if target is a numeric vector rather than a .wav file)
method: method of comparing the mel-transformed spectra of two sounds:
"cor" = average Pearson's correlation of the mel-transformed spectra of
individual FFT frames; "cosine" = same as "cor" but with cosine similarity
instead of Pearson's correlation; "pixel" = absolute difference between
each point in the two spectra; "dtw" = dynamic time warping with dtw
windowLength: length of the FFT window, ms
overlap: overlap between successive FFT frames, %
step: FFT step, ms; if specified, it overrides overlap
padWith: compared spectra are padded with either silence (padWith = 0) or
NA's (padWith = NA) so that they have the same number of columns. When the
sounds differ in duration, padding with zeros rather than NA's improves the
fit to target as measured by method = 'pixel' and 'dtw', but it has no
effect on 'cor' and 'cosine'.
penalizeLengthDif: if TRUE, sounds of different length are considered less similar; if FALSE, only the overlapping parts of the two sounds are compared
dynamicRange: parts of the spectra quieter than -dynamicRange dB are not compared
maxFreq: parts of the spectra above maxFreq Hz are not compared
summary: if TRUE, returns the mean of the similarity values calculated by
all methods in method; if FALSE, returns a separate value for each method
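The padding behaviour described for padWith can be illustrated with a small base-R sketch. padSpec(), pixelDiff() and meanFrameCor() are hypothetical helpers written for this illustration (not soundgen functions); spectra are assumed to be matrices with frequency bins in rows and frames in columns, and 0 is assumed to mean silence. In this sketch, NA padding leaves the extra frames out of the pixel comparison, while zero padding makes them count against the shorter sound. Frame-wise correlation is unaffected because an all-zero frame, like an all-NA frame, has no defined correlation and is dropped.

```r
# Hypothetical helpers (not part of soundgen).
# Pad a spectrogram (rows = frequency bins, columns = frames) to nCol columns
padSpec = function(spec, nCol, padWith = NA) {
  extra = nCol - ncol(spec)
  if (extra <= 0) return(spec)
  cbind(spec, matrix(padWith, nrow = nrow(spec), ncol = extra))
}

# "pixel"-style dissimilarity: mean absolute difference over all bins
# present in both spectra (lower = more similar)
pixelDiff = function(s1, s2) mean(abs(s1 - s2), na.rm = TRUE)

# Frame-wise Pearson correlation averaged over frames; frames containing NA
# or with zero variance (e.g. all-silent padding) give NA and are skipped
meanFrameCor = function(s1, s2) {
  r = vapply(seq_len(ncol(s1)), function(j) {
    a = s1[, j]
    b = s2[, j]
    if (anyNA(a) || anyNA(b) || sd(a) == 0 || sd(b) == 0) return(NA_real_)
    cor(a, b)
  }, numeric(1))
  mean(r, na.rm = TRUE)
}

set.seed(1)
long = matrix(runif(5 * 8), nrow = 5)   # 5 bins x 8 frames
short = long[, 1:6] * 0.9               # shorter, slightly quieter copy
padNA = padSpec(short, ncol(long), NA)
padZero = padSpec(short, ncol(long), 0)
pixelDiff(long, padNA)      # padded frames are ignored
pixelDiff(long, padZero)    # padded frames add |long - 0|, so this is larger
meanFrameCor(long, padNA)   # identical to the zero-padded result below
meanFrameCor(long, padZero)
```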
library(soundgen)

target = soundgen(sylLen = 500, formants = 'a',
                  pitch = data.frame(time = c(0, 0.1, 0.9, 1),
                                     value = c(100, 150, 135, 100)),
                  temperature = 0.001)
targetSpec = soundgen:::getMelSpec(target, samplingRate = 16000)

parsToTry = list(
  list(formants = 'i',                                # wrong
       pitch = data.frame(time = c(0, 1),             # wrong
                          value = c(200, 300))),
  list(formants = 'i',                                # wrong
       pitch = data.frame(time = c(0, 0.1, 0.9, 1),   # right
                          value = c(100, 150, 135, 100))),
  list(formants = 'a',                                # right
       pitch = data.frame(time = c(0, 1),             # wrong
                          value = c(200, 300))),
  list(formants = 'a',                                # right
       pitch = data.frame(time = c(0, 0.1, 0.9, 1),   # right
                          value = c(100, 150, 135, 100)))
)

sounds = list()
for (s in seq_along(parsToTry)) {
  sounds[[s]] = do.call(soundgen,
                        c(parsToTry[[s]], list(temperature = 0.001, sylLen = 500)))
}

method = c('cor', 'cosine', 'pixel', 'dtw')
df = matrix(NA, nrow = length(parsToTry), ncol = length(method))
colnames(df) = method
df = as.data.frame(df)
for (i in seq_len(nrow(df))) {
  df[i, ] = compareSounds(
    target = NULL,            # can use target instead of targetSpec...
    targetSpec = targetSpec,  # ...but it is faster to calculate targetSpec once
    cand = sounds[[i]],
    samplingRate = 16000,
    padWith = NA,
    penalizeLengthDif = TRUE,
    method = method,
    summary = FALSE           # one similarity value per method
  )
}
df$av = rowMeans(df, na.rm = TRUE)
# row 1 = wrong pitch & formants, ..., row 4 = right pitch & formants
df$formants = c('wrong', 'wrong', 'right', 'right')
df$pitch = c('wrong', 'right', 'wrong', 'right')
df