modulationSpectrum: Modulation spectrum

Description

Produces a modulation spectrum of waveform(s) or audio file(s), with temporal modulation (Hz) along the X axis and spectral modulation (1/KHz) along the Y axis. A good visual analogy is decomposing the spectrogram into a sum of ripples of various frequencies and directions.

Algorithm: prepare a spectrogram, take its logarithm (if logSpec = TRUE), center, perform a 2D Fourier transform (see also spec.fft() in the "spectral" package), take the upper half of the resulting symmetric matrix, and raise it to power (e.g., power = 2 for a power spectrum). The result is returned as $original. Roughness is calculated as the proportion of the energy or amplitude (depending on power) of the modulation spectrum that falls within roughRange of temporal modulation frequencies.

By default, the modulation matrix is then smoothed with Gaussian blur (see gaussianSmooth2D) and log-warped (if logWarp is a positive number) prior to plotting. This processed modulation spectrum is returned as $processed.

For multiple inputs, such as a list of waveforms or a path to a folder with audio files, the ensemble of modulation spectra is interpolated to the same spectral and temporal resolution and averaged. This is different from the behavior of modulationSpectrumFolder, which produces a separate modulation spectrum per file, without averaging.
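The algorithm steps above can be sketched in base R. This is only an illustration under stated assumptions, not soundgen's internal code: the test signal, the Gaussian-like window shape, and all variable names are hypothetical, and the sketch does not reorder frequencies to center zero on the temporal axis, nor does it smooth or log-warp the result.

```r
# Illustrative sketch only: names and window shape are assumptions
samplingRate <- 16000
t <- (0:7999) / samplingRate  # 0.5 s of signal
# 500 Hz carrier with 50 Hz amplitude modulation
sound <- sin(2 * pi * 500 * t) * (1 + sin(2 * pi * 50 * t)) / 2

windowLength <- 25  # ms
overlap <- 80       # %
logSpec <- FALSE
power <- 1

win <- round(windowLength / 1000 * samplingRate)  # window length, points
hop <- round(win * (1 - overlap / 100))           # step, points
starts <- seq(1, length(sound) - win + 1, by = hop)
w <- exp(-0.5 * seq(-2.5, 2.5, length.out = win)^2)  # Gaussian-like window

# 1. Spectrogram: rows = frequency bins, columns = time frames
spec <- sapply(starts, function(i) {
  frame <- sound[i:(i + win - 1)] * w
  Mod(fft(frame))[seq_len(win %/% 2)]
})

# 2. Optional log-transform, then centering
if (logSpec) spec <- log(spec + 1e-10)
spec <- spec - mean(spec)

# 3. 2D Fourier transform (base R fft() is multidimensional for a matrix)
ms2d <- Mod(fft(spec))

# 4. Keep one half of the rows: for real input the other half is redundant
half <- ms2d[seq_len(nrow(ms2d) %/% 2), , drop = FALSE]

# 5. Raise to the requested power
ms <- half ^ power
```

The resulting matrix ms plays the role of $original in this sketch; its orientation (rows vs. columns) is a choice made here and need not match the package output.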

Usage

modulationSpectrum(x, samplingRate = NULL, maxDur = 5,
  logSpec = FALSE, windowLength = 25, step = NULL, overlap = 80,
  wn = "gaussian", zp = 0, power = 1, roughRange = c(30, 150),
  plot = TRUE, savePath = NA, logWarp = 2, quantiles = c(0.5, 0.8,
  0.9), kernelSize = 5, kernelSD = 0.5, colorTheme = c("bw",
  "seewave", "...")[1], xlab = "Hz", ylab = "1/KHz", main = NULL,
  width = 900, height = 500, units = "px", res = NA, ...)

Arguments

x

path to a folder, path to a wav/mp3 file, a numeric vector representing a waveform, or a list of numeric vectors

samplingRate

sampling rate of x (only needed if x is a numeric vector, rather than an audio file). For a list of sounds, give either one samplingRate (the same for all) or as many values as there are input files

maxDur

maximum allowed duration of a single sound, s (longer sounds are split)

logSpec

if TRUE, the spectrogram is log-transformed prior to taking 2D FFT

windowLength

length of FFT window, ms

step

you can override overlap by specifying FFT step, ms

overlap

overlap between successive FFT frames, %

wn

window type: gaussian, hanning, hamming, bartlett, rectangular, blackman, flattop

zp

window length after zero padding, points

power

raise the modulation spectrum to this power (e.g., power = 2 for ^2, or "power spectrum")

roughRange

the range of temporal modulation frequencies that constitute the "roughness" zone, Hz

plot

if TRUE, plots the modulation spectrum

savePath

if a valid path is specified, a plot is saved in this folder (defaults to NA)

logWarp

the base of the logarithm for warping the modulation spectrum (i.e., log2 if logWarp = 2); set to NULL or NA if you don't want to log-warp

quantiles

labeled contour values, % (e.g., "50" marks regions that contain 50% of the sum total of the entire modulation spectrum)

kernelSize

the size of Gaussian kernel used for smoothing (1 = no smoothing)

kernelSD

the SD of Gaussian kernel used for smoothing, relative to its size

colorTheme

black and white ('bw'), as in seewave package ('seewave'), or any palette from palette such as 'heat.colors', 'cm.colors', etc

xlab, ylab, main

graphical parameters

width, height, units, res

parameters passed to png if the plot is saved

...

other graphical parameters passed on to filled.contour.modif2 and contour
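The windowLength, step, and overlap arguments above are related in the usual way for short-time Fourier analysis. The following is a minimal sketch of that relation, assuming the standard definition of overlap as a percentage of the window length; the helper names are hypothetical and are not part of the package.

```r
# Hypothetical helpers illustrating how step and overlap relate
stepFromOverlap <- function(windowLength, overlap) {
  # windowLength in ms, overlap in %; returns step in ms
  windowLength * (1 - overlap / 100)
}
overlapFromStep <- function(windowLength, step) {
  # windowLength and step in ms; returns overlap in %
  100 * (1 - step / windowLength)
}

step <- stepFromOverlap(windowLength = 25, overlap = 80)    # about 5 ms
overlap <- overlapFromStep(windowLength = 25, step = step)  # about 80 %
```

With the defaults shown in Usage (windowLength = 25, overlap = 80), this corresponds to a step of about 5 ms between successive FFT frames.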

Value

Returns a list with three components:

$original modulation spectrum prior to blurring and log-warping, but after raising to power, a matrix of nonnegative values. Rownames are temporal modulation frequencies (Hz), and colnames are spectral modulation frequencies (cycles/KHz).
$processed modulation spectrum after blurring and log-warping
$roughness proportion of energy / amplitude of the modulation spectrum within roughRange of temporal modulation frequencies, %
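To make the roughness definition concrete, here is a small sketch that computes it from a mock matrix. It assumes the layout described above (rownames hold temporal modulation frequencies) and assumes that negative and positive temporal modulation frequencies are treated symmetrically; the mock data and variable names are hypothetical, and the package may compute roughness differently in detail.

```r
# Mock stand-in for ms$original: a flat modulation spectrum
tempFreq <- seq(-200, 200, by = 10)  # temporal modulation, Hz
specFreq <- seq(0, 4, by = 0.5)      # spectral modulation, cycles/KHz
original <- matrix(1, nrow = length(tempFreq), ncol = length(specFreq),
                   dimnames = list(tempFreq, specFreq))

roughRange <- c(30, 150)

# Select rows whose absolute temporal modulation frequency is in roughRange
tm <- abs(as.numeric(rownames(original)))
inRange <- tm >= roughRange[1] & tm <= roughRange[2]

# Roughness: share of the total that falls inside the range, %
roughness <- sum(original[inRange, ]) / sum(original) * 100
```

For this flat mock spectrum, 26 of the 41 temporal frequencies fall inside the range, so roughness is 26 / 41 of the total expressed in percent.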

References

Singh, N. C., & Theunissen, F. E. (2003). Modulation spectra of natural sounds and ethological theories of auditory processing. The Journal of the Acoustical Society of America, 114(6), 3394-3411.

Examples


# NOT RUN {
# white noise
ms = modulationSpectrum(runif(16000), samplingRate = 16000,
  logSpec = FALSE, power = 1, logWarp = NULL)

# harmonic sound
s = soundgen()
ms = modulationSpectrum(s, samplingRate = 16000,
  logSpec = FALSE, power = 1, logWarp = NULL)

# embellish
ms = modulationSpectrum(s, samplingRate = 16000,
  xlab = 'Temporal modulation, Hz', ylab = 'Spectral modulation, 1/KHz',
  colorTheme = 'seewave', main = 'Modulation spectrum', lty = 3)
# }
# NOT RUN {
# Input can also be a list of waveforms (numeric vectors)
ss = vector('list', 10)
for (i in seq_along(ss)) {
  ss[[i]] = soundgen(sylLen = runif(1, 100, 1000), temperature = .4,
    pitch = runif(3, 400, 600))
}
# lapply(ss, playme)
ms = modulationSpectrum(ss[[1]], samplingRate = 16000)  # the first sound
ms = modulationSpectrum(ss, samplingRate = 16000)  # all 10 sounds

# As with spectrograms, there is a tradeoff in time-frequency resolution
s = soundgen(pitch = 500, amFreq = 50, amDep = 100, samplingRate = 44100)
# playme(s, samplingRate = 44100)
ms = modulationSpectrum(s, samplingRate = 44100,
  windowLength = 50, overlap = 0)  # poor temporal resolution
ms = modulationSpectrum(s, samplingRate = 44100,
  windowLength = 5, overlap = 80)  # poor frequency resolution
ms = modulationSpectrum(s, samplingRate = 44100,
  windowLength = 15, overlap = 80)  # a reasonable compromise

# Input can be a wav/mp3 file
ms = modulationSpectrum('~/Downloads/temp/200_ut_fear-bungee_11.wav')
ms = modulationSpectrum('~/Downloads/temp/200_ut_fear-bungee_11.wav',
  kernelSize = 17,  # more smoothing
  xlim = c(-20, 20), ylim = c(0, 4),  # zoom in on the central region
  quantiles = c(.25, .5, .8),  # customize contour lines
  colorTheme = 'heat.colors',  # alternative palette
  logWarp = NULL,              # don't log-warp the modulation spectrum
  power = 2)  # ^2
# NB: xlim/ylim currently won't work properly with logWarp on

# Input can be path to folder with audio files (average modulation spectrum)
ms = modulationSpectrum('~/Downloads/temp/', kernelSize = 11)
# NB: longer files will be split into fragments <maxDur in length

# "power = 2" returns squared modulation spectrum - note that this affects
# the roughness measure!
# A sound with ~3 syllables per second and only downsweeps in F0 contour
s = soundgen(nSyl = 8, sylLen = 200, pauseLen = 100, pitch = c(300, 200))
# playme(s)
ms = modulationSpectrum(s, samplingRate = 16000, maxDur = .5,
  xlim = c(-25, 25), colorTheme = 'seewave', logWarp = NULL,
  power = 2)
# note the asymmetry b/c of downsweeps
ms$roughness
# compare:
modulationSpectrum(s, samplingRate = 16000, maxDur = .5,
  xlim = c(-25, 25), colorTheme = 'seewave', logWarp = NULL,
  power = 1)$roughness  # much higher roughness

# Plotting with or without log-warping the modulation spectrum:
ms = modulationSpectrum(soundgen(), samplingRate = 16000,
  logWarp = NA, plot = TRUE)
ms = modulationSpectrum(soundgen(), samplingRate = 16000,
  logWarp = 2, plot = TRUE)
ms = modulationSpectrum(soundgen(), samplingRate = 16000,
  logWarp = 4.5, plot = TRUE)

# logWarp and kernelSize have no effect on roughness
# because it is calculated before these transforms:
modulationSpectrum(s, samplingRate = 16000, logWarp = 5)$roughness
modulationSpectrum(s, samplingRate = 16000, logWarp = NA)$roughness
modulationSpectrum(s, samplingRate = 16000, kernelSize = 17)$roughness

# Log-transform the spectrogram prior to 2D FFT (affects roughness):
ms = modulationSpectrum(soundgen(), samplingRate = 16000, logSpec = FALSE)
ms = modulationSpectrum(soundgen(), samplingRate = 16000, logSpec = TRUE)
# }
