modulationSpectrum: Modulation spectrum

Description

Produces a modulation spectrum of waveform(s) or audio file(s), with temporal modulation along the X axis (Hz) and spectral modulation (1/KHz) along the Y axis. A good visual analogy is decomposing the spectrogram into a sum of ripples of various frequencies and directions. Algorithm: prepare a spectrogram, take its logarithm (if logSpec = TRUE), center, perform a 2D Fourier transform (see also spectral::spec.fft()), take the upper half of the resulting symmetric matrix, and raise it to power. The result is returned as $original. Roughness is calculated as the proportion of energy / amplitude of the modulation spectrum within roughRange of temporal modulation frequencies. For plotting purposes, the modulation matrix is smoothed with Gaussian blur (see gaussianSmooth2D) and log-warped (if logWarp is a positive number). This processed modulation spectrum is returned as $processed. If the audio is long enough, multiple windows are analyzed, resulting in a vector of roughness values. For multiple inputs, such as a list of waveforms or path to a folder with audio files, the ensemble of modulation spectra can be interpolated to the same spectral and temporal resolution and averaged (if averageMS).

Usage

modulationSpectrum(
  x,
  samplingRate = NULL,
  scale = NULL,
  from = NULL,
  to = NULL,
  amRes = 10,
  maxDur = 5,
  logSpec = FALSE,
  windowLength = 15,
  step = NULL,
  overlap = 80,
  wn = "hanning",
  zp = 0,
  power = 1,
  roughRange = c(30, 150),
  returnMS = TRUE,
  returnComplex = FALSE,
  summaryFun = c("mean", "median", "sd"),
  averageMS = FALSE,
  reportEvery = NULL,
  plot = TRUE,
  savePlots = NULL,
  logWarp = NA,
  quantiles = c(0.5, 0.8, 0.9),
  kernelSize = 5,
  kernelSD = 0.5,
  colorTheme = c("bw", "seewave", "heat.colors", "...")[1],
  main = NULL,
  xlab = "Hz",
  ylab = "1/KHz",
  xlim = NULL,
  ylim = NULL,
  width = 900,
  height = 500,
  units = "px",
  res = NA,
  ...
)

Arguments

path to a folder, one or more wav or mp3 files c('file1.wav', 'file2.mp3'), Wave object, numeric vector, or a list of Wave objects or numeric vectors

samplingRate

sampling rate of x (only needed if x is a numeric vector)

scale

maximum possible amplitude of input used for normalization of input vector (only needed if x is a numeric vector)

from

if NULL (default), analyzes the whole sound, otherwise from...to (s)

amRes

target resolution of amplitude modulation, Hz. If NULL, the entire sound is analyzed at once (or in chunks maxDur s long), resulting in a single roughness value. Otherwise, roughness is calculated per frame, each selected to contain enough STFT windows to calculate roughness with precision given by amRes

maxDur

sounds longer than maxDur s are split into fragments, and the modulation spectra of all fragments are averaged

logSpec

if TRUE, the spectrogram is log-transformed prior to taking 2D FFT

windowLength

length of FFT window, ms

step

you can override overlap by specifying FFT step, ms (NB: because digital audio is sampled at discrete time intervals of 1/samplingRate, the actual step and thus the time stamps of STFT frames may be slightly different, eg 24.98866 instead of 25.0 ms)

overlap

overlap between successive FFT frames, %

window type accepted by ftwindow, currently gaussian, hanning, hamming, bartlett, rectangular, blackman, flattop

window length after zero padding, points

power

raise modulation spectrum to this power (eg power = 2 for ^2, or "power spectrum")

roughRange

the range of temporal modulation frequencies that constitute the "roughness" zone, Hz

returnMS

if FALSE, only roughness is returned (much faster)

returnComplex

if TRUE, returns a complex modulation spectrum (without normalization and warping)

summaryFun

functions used to summarize each acoustic characteristic, eg "c('mean', 'sd')"; user-defined functions are fine (see examples); NAs are omitted automatically for mean/median/sd/min/max/range/sum, otherwise take care of NAs yourself

averageMS

if TRUE, the modulation spectra of all inputs are averaged into a single output; if FALSE, a separate MS is returned for each input

reportEvery

when processing multiple inputs, report estimated time left every ... iterations (NULL = default, NA = don't report)

plot

if TRUE, plots the modulation spectrum of each sound

savePlots

if a valid path is specified, a plot is saved in this folder (defaults to NA)

logWarp

the base of log for warping the modulation spectrum (ie log2 if logWarp = 2); set to NULL or NA if you don't want to log-warp

quantiles

labeled contour values, % (e.g., "50" marks regions that contain 50% of the sum total of the entire modulation spectrum)

kernelSize

the size of Gaussian kernel used for smoothing (1 = no smoothing)

kernelSD

the SD of Gaussian kernel used for smoothing, relative to its size

colorTheme

black and white ('bw'), as in seewave package ('seewave'), or any palette from palette such as 'heat.colors', 'cm.colors', etc

xlab, ylab, main, xlim, ylim

graphical parameters

width, height, units, res

parameters passed to png if the plot is saved

...

other graphical parameters passed on to filled.contour.mod and contour (see spectrogram)

Value

Returns a list with four components:

$original modulation spectrum prior to blurring and log-warping, but after squaring if power = TRUE, a matrix of nonnegative values. Rownames are spectral modulation frequencies (cycles/KHz), and colnames are temporal modulation frequencies (Hz).
$processed modulation spectrum after blurring and log-warping
$roughness proportion of energy / amplitude of the modulation spectrum within roughRange of temporal modulation frequencies, % - a vector if amRes is numeric and the sound is long enough, a single number otherwise
$complex untransformed complex modulation spectrum (returned only if returnComplex = TRUE)

References

Singh, N. C., & Theunissen, F. E. (2003). Modulation spectra of natural sounds and ethological theories of auditory processing. The Journal of the Acoustical Society of America, 114(6), 3394-3411.

Examples

Run this code

# NOT RUN {
# White noise
ms = modulationSpectrum(runif(16000), samplingRate = 16000,
  logSpec = FALSE, power = TRUE,
  amRes = NULL)  # analyze the entire sound, giving a single roughness value
str(ms)

# Harmonic sound
s = soundgen()
ms = modulationSpectrum(s, samplingRate = 16000, amRes = NULL)
ms$roughness  # a single value
ms1 = modulationSpectrum(s, samplingRate = 16000, amRes = 10)
ms1$roughness
# roughness over time (low values of amRes mean more precision, so shorter
# segments analyzed and fewer roughness values per sound).

# Embellish
ms = modulationSpectrum(s, samplingRate = 16000,
  xlab = 'Temporal modulation, Hz', ylab = 'Spectral modulation, 1/KHz',
  colorTheme = 'heat.colors', main = 'Modulation spectrum', lty = 3)

# }
# NOT RUN {
# Roughness contour of a long sound (starts tonal, then gets rough)
s_long = soundgen(sylLen = 4000, pitch = 400,
  amDep = c(0, 0, 50), jitterDep = c(0, 0, 2))
rough = modulationSpectrum(s_long, 16000, plot = FALSE)$roughness
spectrogram(s_long, 16000, ylim = c(0, 4),
            extraContour = list(rough / max(rough) * 4000, col = 'blue'))

# Input can also be a list of waveforms (numeric vectors)
ss = vector('list', 10)
for (i in 1:length(ss)) {
  ss[[i]] = soundgen(sylLen = runif(1, 100, 1000), temperature = .4,
    pitch = runif(3, 400, 600))
}
# lapply(ss, playme)
# MS of the first sound
ms1 = modulationSpectrum(ss[[1]], samplingRate = 16000, scale = 1)
# average MS of all 10 sounds
ms2 = modulationSpectrum(ss, samplingRate = 16000, scale = 1, averageMS = TRUE)

# As with spectrograms, there is a tradeoff in time-frequency resolution
s = soundgen(pitch = 500, amFreq = 50, amDep = 100, samplingRate = 44100)
# playme(s, samplingRate = 44100)
ms = modulationSpectrum(s, samplingRate = 44100,
  windowLength = 50, step = 50, amRes = NULL)  # poor temporal resolution
ms = modulationSpectrum(s, samplingRate = 44100,
  windowLength = 5, step = 1, amRes = NULL)  # poor frequency resolution
ms = modulationSpectrum(s, samplingRate = 44100,
  windowLength = 15, step = 3, amRes = NULL)  # a reasonable compromise

# customize the plot
ms = modulationSpectrum(s, samplingRate = 44100,
  windowLength = 15, overlap = 80, amRes = NULL,
  kernelSize = 17,  # more smoothing
  xlim = c(-70, 70), ylim = c(0, 4),  # zoom in on the central region
  quantiles = c(.25, .5, .8),  # customize contour lines
  colorTheme = 'heat.colors',  # alternative palette
  power = 2)                   # ^2
# Note the peaks at FM = 2/KHz (from "pitch = 500") and AM = 50 Hz (from
# "amFreq = 50")

# Input can be a wav/mp3 file
ms = modulationSpectrum('~/Downloads/temp/200_ut_fear-bungee_11.wav')

# Input can be path to folder with audio files. Each file is processed
# separately, and the output can contain an MS per file...
ms1 = modulationSpectrum('~/Downloads/temp', kernelSize = 11,
                         plot = FALSE, averageMS = FALSE)
ms1$summary
names(ms1$original)  # a separate MS per file
# ...or a single MS can be calculated:
ms2 = modulationSpectrum('~/Downloads/temp', kernelSize = 11,
                         plot = FALSE, averageMS = TRUE)
str(ms2$original)
ms2$summary

# A sound with ~3 syllables per second and only downsweeps in F0 contour
s = soundgen(nSyl = 8, sylLen = 200, pauseLen = 100, pitch = c(300, 200))
# playme(s)
ms = modulationSpectrum(s, samplingRate = 16000, maxDur = .5,
  xlim = c(-25, 25), colorTheme = 'seewave',
  power = 2)
# note the asymmetry b/c of downsweeps

# "power = 2" returns squared modulation spectrum - note that this affects
# the roughness measure!
ms$roughness
# compare:
modulationSpectrum(s, samplingRate = 16000, maxDur = .5,
  xlim = c(-25, 25), colorTheme = 'seewave', logWarp = NULL,
  power = 1)$roughness  # much higher roughness

# Plotting with or without log-warping the modulation spectrum:
ms = modulationSpectrum(soundgen(), samplingRate = 16000,
  logWarp = NA, plot = TRUE)
ms = modulationSpectrum(soundgen(), samplingRate = 16000,
  logWarp = 2, plot = TRUE)

# logWarp and kernelSize have no effect on roughness
# because it is calculated before these transforms:
modulationSpectrum(s, samplingRate = 16000, logWarp = 5)$roughness
modulationSpectrum(s, samplingRate = 16000, logWarp = NA)$roughness
modulationSpectrum(s, samplingRate = 16000, kernelSize = 17)$roughness

# Log-transform the spectrogram prior to 2D FFT (affects roughness):
ms = modulationSpectrum(soundgen(), samplingRate = 16000, logSpec = FALSE)
ms = modulationSpectrum(soundgen(), samplingRate = 16000, logSpec = TRUE)

# Complex modulation spectrum with phase preserved
ms = modulationSpectrum(soundgen(), samplingRate = 16000,
                        returnComplex = TRUE)
image(t(log(abs(ms$complex))))
# }