
Last chance! 50% off unlimited learning
Sale ends in
Produces a modulation spectrum of waveform(s) or audio file(s). It begins
with some spectrogram-like time-frequency representation and analyzes the
modulation of the envelope in each frequency band. if specSource =
'audSpec'
, the sound is passed through a bank of bandpass filters with
audSpectrogram
. If specSource = 'STFT'
, we begin with an
ordinary spectrogram produced with a Short-Time Fourier Transform. If
msType = '2D'
, the modulation spectrum is a 2D Fourier transform of
the spectrogram-like representation, with temporal modulation along the X
axis and spectral modulation along the Y axis. A good visual analogy is
decomposing the spectrogram into a sum of ripples of various frequencies and
directions. If msType = '1D'
, the modulation spectrum is a matrix
containing 1D Fourier transforms of each frequency band in the spectrogram,
so the result again has modulation frequencies along the X axis, but the Y
axis now shows the frequency of each analyzed band. Roughness is calculated
as the proportion of the modulation spectrum within roughRange
of
temporal modulation frequencies or some weighted version thereof. The
frequency of amplitude modulation (amMsFreq, Hz) is calculated as the highest
peak in the smoothed AM function, and its purity (amMsPurity, dB) as the
ratio of this peak to the median AM over amRange
. For relatively short
and steady sounds, set amRes = NULL
and analyze the entire sound. For
longer sounds and when roughness or AM vary over time, set amRes
to
get multiple measurements over time (see examples). For multiple inputs, such
as a list of waveforms or path to a folder with audio files, the ensemble of
modulation spectra can be interpolated to the same spectral and temporal
resolution and averaged (if averageMS = TRUE
).
modulationSpectrum(
x,
samplingRate = NULL,
scale = NULL,
from = NULL,
to = NULL,
msType = c("1D", "2D")[2],
specSource = c("STFT", "audSpec")[1],
windowLength = 15,
step = 1,
wn = "hanning",
zp = 0,
audSpec_pars = list(filterType = "butterworth", nFilters = 32, bandwidth = 1/24, yScale
= "bark", dynamicRange = 120),
amRes = 5,
maxDur = 5,
specMethod = c("spec", "meanspec")[2],
logSpec = FALSE,
logMPS = FALSE,
power = 1,
normalize = TRUE,
roughRange = c(30, 150),
roughMean = NULL,
roughSD = NULL,
roughMinFreq = 1,
amRange = c(10, 200),
returnMS = TRUE,
returnComplex = FALSE,
summaryFun = c("mean", "median", "sd"),
averageMS = FALSE,
reportEvery = NULL,
cores = 1,
plot = TRUE,
savePlots = NULL,
logWarpX = NULL,
logWarpY = NULL,
quantiles = c(0.5, 0.8, 0.9),
kernelSize = 5,
kernelSD = 0.5,
colorTheme = c("bw", "seewave", "heat.colors", "...")[1],
col = NULL,
main = NULL,
xlab = "Hz",
ylab = NULL,
xlim = NULL,
ylim = NULL,
width = 900,
height = 500,
units = "px",
res = NA,
...
)
Returns a list with the following components:
$original
modulation spectrum prior to blurring and log-warping,
but after squaring if power = TRUE
, a matrix of nonnegative values.
Colnames are temporal modulation frequencies (Hz). Rownames are spectral
modulation frequencies (cycles/kHz) if msType = '2D'
and frequencies
of filters or spectrograms bands (kHz) if msType = '1D'
.
$processed
modulation spectrum after blurring and log-warping
$complex
untransformed complex modulation spectrum (returned
only if returnComplex = TRUE)
$roughness
proportion of the modulation spectrum within
roughRange
of temporal modulation frequencies or a weighted average
thereof if roughMean
and roughSD
are defined, % - a vector if
amRes is numeric and the sound is long enough, otherwise a single number
$roughness_list
a list containing frequencies, amplitudes, and
roughness values for each analyzed frequency band (1D) or frequency
modulation band (2D)
$amMsFreq
frequency of the highest peak, within amRange
, of
the folded AM function (average AM across all FM bins for both negative and
positive AM frequencies), where a peak is a local maximum over amRes
Hz. Like roughness
, amMsFreq
and amMsPurity
can be single
numbers or vectors, depending on whether the sound is analyzed as a whole or
in chunks
$amMsPurity
ratio of the peak at amMsFreq to the median AM over
amRange
, dB
$summary
dataframe with summaries of roughness, amMsFreq, and
amMsPurity
path to a folder, one or more wav or mp3 files c('file1.wav', 'file2.mp3'), Wave object, numeric vector, or a list of Wave objects or numeric vectors
sampling rate of x
(only needed if x
is a
numeric vector)
maximum possible amplitude of input used for normalization of
input vector (only needed if x
is a numeric vector)
if NULL (default), analyzes the whole sound, otherwise from...to (s)
'2D' = two-dimensional Fourier transform of a spectrogram; '1D' = separately calculated spectrum of each frequency band
'STFT' = Short-Time Fourier Transform; 'audSpec' = a bank
of bandpass filters (see audSpectrogram
)
parameters for extracting a spectrogram if
specType = 'STFT'
. Window length and step are specified in ms (see
spectrogram
). If specType = 'audSpec'
, these settings
have no effect
parameters for extracting an auditory spectrogram if
specType = 'audSpec'
. If specType = 'STFT'
, these settings
have no effect
target resolution of amplitude modulation, Hz. If NULL
,
the entire sound is analyzed at once, resulting in a single roughness value
(unless it is longer than maxDur
, in which case it is analyzed in
chunks maxDur
s long). If amRes
is set, roughness is
calculated for windows ~1000/amRes
ms long (but at least 3 STFT
frames). amRes
also affects the amount of smoothing when calculating
amMsFreq
and amMsPurity
sounds longer than maxDur
s are split into fragments,
and the modulation spectra of all fragments are averaged
the function to call when calculating the spectrum of each
frequency band (only used when msType = '1D'
); 'meanspec' is faster
and less noisy, whereas 'spec' produces higher resolution
if TRUE, the spectrogram is log-transformed prior to taking 2D FFT
if TRUE, the modulation spectrum is log-transformed prior to calculating roughness
raise modulation spectrum to this power (eg power = 2 for ^2, or "power spectrum")
if TRUE, the modulation spectrum of each analyzed fragment
maxDur
in duration is separately normalized to have max = 1
the range of temporal modulation frequencies that constitute the "roughness" zone, Hz
the mean (Hz) and standard deviation (semitones) of
a lognormal distribution used to weight roughness estimates. If either is
null, roughness is calculated simply as the proportion of spectrum within
roughRange
. If both roughMean
and roughRange
are
defined, weights outside roughRange
are set to 0; a very large SD (a
flat weighting function) gives the same result as just roughRange
without any weighting (see examples)
frequencies below roughMinFreq (Hz) are ignored when calculating roughness (ie the estimated roughness increases if we disregard very low-frequency modulation, which is often strong)
the range of temporal modulation frequencies that we are interested in as "amplitude modulation" (AM), Hz
if FALSE, only roughness is returned (much faster). Careful with exporting the modulation spectra of a lot of sounds at once as this requires a lot of RAM
if TRUE, returns a complex modulation spectrum (without normalization and warping)
functions used to summarize each acoustic characteristic, eg "c('mean', 'sd')"; user-defined functions are fine (see examples); NAs are omitted automatically for mean/median/sd/min/max/range/sum, otherwise take care of NAs yourself
if TRUE, the modulation spectra of all inputs are averaged into a single output; if FALSE, a separate MS is returned for each input
when processing multiple inputs, report estimated time left every ... iterations (NULL = default, NA = don't report)
number of cores for parallel processing
if TRUE, plots the modulation spectrum of each sound (see
plotMS
)
if a valid path is specified, a plot is saved in this folder (defaults to NA)
numeric vector of length 2: c(sigma, base) of pseudolog-warping the modulation spectrum, as in function pseudo_log_trans() from the "scales" package
labeled contour values, % (e.g., "50" marks regions that contain 50% of the sum total of the entire modulation spectrum)
the size of Gaussian kernel used for smoothing (1 = no smoothing)
the SD of Gaussian kernel used for smoothing, relative to its size
black and white ('bw'), as in seewave package ('seewave'),
matlab-type palette ('matlab'), or any palette from
palette
such as 'heat.colors', 'cm.colors', etc
actual colors, eg rev(rainbow(100)) - see ?hcl.colors for colors in base R (overrides colorTheme)
graphical parameters
parameters passed to
png
if the plot is saved
other graphical parameters passed on to filled.contour.mod
and contour
(see spectrogram
)
Singh, N. C., & Theunissen, F. E. (2003). Modulation spectra of natural sounds and ethological theories of auditory processing. The Journal of the Acoustical Society of America, 114(6), 3394-3411.
plotMS
spectrogram
audSpectrogram
analyze
# White noise
ms = modulationSpectrum(rnorm(16000), samplingRate = 16000,
logSpec = FALSE, power = TRUE,
amRes = NULL) # analyze the entire sound, giving a single roughness value
str(ms)
# Harmonic sound
s = soundgen(pitch = 440, amFreq = 100, amDep = 50)
ms = modulationSpectrum(s, samplingRate = 16000, amRes = NULL)
ms[c('roughness', 'amMsFreq', 'amMsPurity')] # a single value for each
ms1 = modulationSpectrum(s, samplingRate = 16000, amRes = 5)
ms1[c('roughness', 'amMsFreq', 'amMsPurity')]
# measured over time (low values of amRes mean more precision, so we analyze
# longer segments and get fewer values per sound)
# Embellish
ms = modulationSpectrum(s, samplingRate = 16000, logMPS = TRUE,
xlab = 'Temporal modulation, Hz', ylab = 'Spectral modulation, 1/kHz',
colorTheme = 'matlab', main = 'Modulation spectrum', lty = 3)
# 1D instead of 2D
modulationSpectrum(s, 16000, msType = '1D')
if (FALSE) {
# A long sound with varying AM and a bit of chaos at the end
s_long = soundgen(sylLen = 3500, pitch = c(250, 320, 280),
amFreq = c(30, 55), amDep = c(20, 60, 40),
jitterDep = c(0, 0, 2))
playme(s_long)
ms = modulationSpectrum(s_long, 16000)
# plot AM over time
plot(x = seq(1, 1500, length.out = length(ms$amMsFreq)), y = ms$amMsFreq,
cex = 10^(ms$amMsPurity/20) * 10, xlab = 'Time, ms', ylab = 'AM frequency, Hz')
# plot roughness over time
spectrogram(s_long, 16000, ylim = c(0, 4),
extraContour = list(ms$roughness / max(ms$roughness) * 4000, col = 'blue'))
# As with spectrograms, there is a tradeoff in time-frequency resolution
s = soundgen(pitch = 500, amFreq = 50, amDep = 100, sylLen = 500,
samplingRate = 44100, plot = TRUE)
# playme(s, samplingRate = 44100)
ms = modulationSpectrum(s, samplingRate = 44100,
windowLength = 50, step = 50, amRes = NULL) # poor temporal resolution
ms = modulationSpectrum(s, samplingRate = 44100,
windowLength = 5, step = 1, amRes = NULL) # poor frequency resolution
ms = modulationSpectrum(s, samplingRate = 44100,
windowLength = 15, step = 3, amRes = NULL) # a reasonable compromise
# Start with an auditory spectrogram instead of STFT
modulationSpectrum(s, 44100, specSource = 'audSpec', xlim = c(-100, 100))
modulationSpectrum(s, 44100, specSource = 'audSpec',
logWarpX = c(10, 2), xlim = c(-500, 500),
audSpec_pars = list(nFilters = 32, filterType = 'gammatone', bandwidth = NULL))
# customize the plot
ms = modulationSpectrum(s, samplingRate = 44100,
windowLength = 15, overlap = 80, amRes = NULL,
kernelSize = 17, # more smoothing
xlim = c(-70, 70), ylim = c(0, 4), # zoom in on the central region
quantiles = c(.25, .5, .8), # customize contour lines
col = rev(rainbow(100)), # alternative palette
logWarpX = c(10, 2), # pseudo-log transform
power = 2) # ^2
# Note the peaks at FM = 2/kHz (from "pitch = 500") and AM = 50 Hz (from
# "amFreq = 50")
# Input can be a wav/mp3 file
ms = modulationSpectrum('~/Downloads/temp/16002_Faking_It_Large_clear.wav')
# Input can be path to folder with audio files. Each file is processed
# separately, and the output can contain an MS per file...
ms1 = modulationSpectrum('~/Downloads/temp', kernelSize = 11,
plot = FALSE, averageMS = FALSE)
ms1$summary
names(ms1$original) # a separate MS per file
# ...or a single MS can be calculated:
ms2 = modulationSpectrum('~/Downloads/temp', kernelSize = 11,
plot = FALSE, averageMS = TRUE)
plotMS(ms2$original)
ms2$summary
# Input can also be a list of waveforms (numeric vectors)
ss = vector('list', 10)
for (i in 1:length(ss)) {
ss[[i]] = soundgen(sylLen = runif(1, 100, 1000), temperature = .4,
pitch = runif(3, 400, 600))
}
# lapply(ss, playme)
# MS of the first sound
ms1 = modulationSpectrum(ss[[1]], samplingRate = 16000, scale = 1)
# average MS of all 10 sounds
ms2 = modulationSpectrum(ss, samplingRate = 16000, scale = 1, averageMS = TRUE, plot = FALSE)
plotMS(ms2$original)
# A sound with ~3 syllables per second and only downsweeps in F0 contour
s = soundgen(nSyl = 8, sylLen = 200, pauseLen = 100, pitch = c(300, 200))
# playme(s)
ms = modulationSpectrum(s, samplingRate = 16000, maxDur = .5,
xlim = c(-25, 25), colorTheme = 'seewave',
power = 2)
# note the asymmetry b/c of downsweeps
# "power = 2" returns squared modulation spectrum - note that this affects
# the roughness measure!
ms$roughness
# compare:
modulationSpectrum(s, samplingRate = 16000, maxDur = .5,
xlim = c(-25, 25), colorTheme = 'seewave',
power = 1)$roughness # much higher roughness
# Plotting with or without log-warping the modulation spectrum:
ms = modulationSpectrum(soundgen(), samplingRate = 16000, plot = TRUE)
ms = modulationSpectrum(soundgen(), samplingRate = 16000,
logWarpX = c(2, 2), plot = TRUE)
# logWarp and kernelSize have no effect on roughness
# because it is calculated before these transforms:
modulationSpectrum(s, samplingRate = 16000, logWarpX = c(1, 10))$roughness
modulationSpectrum(s, samplingRate = 16000, logWarpX = NA)$roughness
modulationSpectrum(s, samplingRate = 16000, kernelSize = 17)$roughness
# Log-transform the spectrogram prior to 2D FFT (affects roughness):
modulationSpectrum(s, samplingRate = 16000, logSpec = FALSE)$roughness
modulationSpectrum(s, samplingRate = 16000, logSpec = TRUE)$roughness
# Use a lognormal weighting function to calculate roughness
# (instead of just % in roughRange)
modulationSpectrum(s, 16000, roughRange = NULL,
roughMean = 75, roughSD = 3)$roughness
modulationSpectrum(s, 16000, roughRange = NULL,
roughMean = 100, roughSD = 12)$roughness
# truncate weights outside roughRange
modulationSpectrum(s, 16000, roughRange = c(30, 150),
roughMean = 100, roughSD = 1000)$roughness # very large SD
modulationSpectrum(s, 16000, roughRange = c(30, 150),
roughMean = NULL)$roughness # same as above b/c SD --> Inf
# Complex modulation spectrum with phase preserved
ms = modulationSpectrum(soundgen(), samplingRate = 16000,
returnComplex = TRUE)
plotMS(abs(ms$complex)) # note the symmetry
# compare:
plotMS(ms$original)
}
Run the code above in your browser using DataLab