Acoustic analysis of a single sound file: pitch tracking, basic spectral
characteristics, and estimated loudness (see getLoudness). The
default values of arguments are optimized for human non-linguistic
vocalizations. See vignette('acoustic_analysis', package = 'soundgen') for
details. The defaults and reasonable ranges of all arguments can be found in
defaults_analyze.
analyze(
x,
samplingRate = NULL,
dynamicRange = 80,
silence = 0.04,
scale = NULL,
SPL_measured = 70,
Pref = 2e-05,
windowLength = 50,
step = NULL,
overlap = 50,
wn = "gaussian",
zp = 0,
cutFreq = NULL,
formants = list(verify = FALSE),
nFormants = 3,
pitchMethods = c("dom", "autocor"),
pitchManual = NULL,
entropyThres = 0.6,
pitchFloor = 75,
pitchCeiling = 3500,
priorMean = 300,
priorSD = 6,
nCands = 1,
minVoicedCands = NULL,
pitchDom = list(),
pitchAutocor = list(),
pitchCep = list(),
pitchSpec = list(),
pitchHps = list(),
harmHeight = list(type = "n"),
shortestSyl = 20,
shortestPause = 60,
interpolWin = 75,
interpolTol = 0.3,
interpolCert = 0.3,
pathfinding = c("none", "fast", "slow")[2],
annealPars = list(maxit = 5000, temp = 1000),
certWeight = 0.5,
snakeStep = 0.05,
snakePlot = FALSE,
smooth = 1,
smoothVars = c("pitch", "dom"),
summary = FALSE,
summaryFun = c("mean", "median", "sd"),
invalidArgAction = c("adjust", "abort", "ignore")[1],
plot = TRUE,
showLegend = TRUE,
savePath = NA,
osc = TRUE,
osc_dB = FALSE,
pitchPlot = list(col = rgb(0, 0, 1, 0.75), lwd = 3, showPrior = TRUE),
ylim = NULL,
xlab = "Time, ms",
ylab = "kHz",
main = NULL,
width = 900,
height = 500,
units = "px",
res = NA,
...
)
x
path to a .wav or .mp3 file or a vector of amplitudes with specified samplingRate
samplingRate
sampling rate of x (only needed if x is a numeric vector, rather than an audio file)
dynamicRange
dynamic range, dB. All values more than one dynamicRange under maximum are treated as zero
silence
(0 to 1) frames with RMS amplitude below silence threshold are not analyzed at all. NB: this number is dynamically updated: the actual silence threshold may be higher depending on the quietest frame, but it will never be lower than this specified number.
scale
maximum possible amplitude of input used for normalization of input vector (not needed if input is an audio file)
SPL_measured
sound pressure level at which the sound is presented, dB (set to 0 to skip analyzing subjective loudness)
Pref
reference pressure, Pa
windowLength
length of FFT window, ms
step
you can override overlap by specifying FFT step, ms
overlap
overlap between successive FFT frames, %
wn
window type: gaussian, hanning, hamming, bartlett, rectangular, blackman, flattop
zp
window length after zero padding, points
cutFreq
if specified, spectral descriptives (peakFreq, specCentroid, specSlope, and quartiles) are calculated under cutFreq. Recommended when analyzing recordings with varying sampling rates: set to half the lowest sampling rate to make the spectra more comparable. Note that entropyThres applies only to this frequency range, which also affects which frames will not be analyzed with pitchAutocor.
formants
a list of arguments passed to findformants, an external function called to perform LPC analysis
nFormants
the number of formants to extract per STFT frame (0 = no formant analysis)
pitchMethods
methods of pitch estimation to consider for determining pitch contour: 'autocor' = autocorrelation (~PRAAT), 'cep' = cepstral, 'spec' = spectral (~BaNa), 'hps' = harmonic product spectrum, 'dom' = lowest dominant frequency band ('' or NULL = no pitch analysis)
pitchManual
manually corrected pitch contour: a numeric vector of any length, but ideally as returned by pitch_app with the same windowLength and step as in the current call to analyze
entropyThres
pitch tracking is not performed for frames with Wiener entropy above entropyThres, but other spectral descriptives are still calculated
pitchFloor, pitchCeiling
absolute bounds for pitch candidates (Hz)
priorMean, priorSD
specifies the mean (Hz) and standard deviation (semitones) of the gamma distribution describing our prior knowledge about the most likely pitch values for this file. For example, priorMean = 300, priorSD = 6 gives a prior with mean = 300 Hz and SD = 6 semitones (half an octave)
nCands
maximum number of pitch candidates per method (except for dom, which returns at most one candidate per frame), normally 1...4
minVoicedCands
minimum number of pitch candidates that have to be defined to consider a frame voiced (if NULL, defaults to 2 if dom is among other candidates and 1 otherwise)
pitchDom
a list of control parameters for pitch tracking using the lowest dominant frequency band or "dom" method; see details and ?soundgen:::getDom
pitchAutocor
a list of control parameters for pitch tracking using the autocorrelation or "autocor" method; see details and ?soundgen:::getPitchAutocor
pitchCep
a list of control parameters for pitch tracking using the cepstrum or "cep" method; see details and ?soundgen:::getPitchCep
pitchSpec
a list of control parameters for pitch tracking using the BaNa or "spec" method; see details and ?soundgen:::getPitchSpec
pitchHps
a list of control parameters for pitch tracking using the harmonic product spectrum ("hps") method; see details and ?soundgen:::getPitchHps
harmHeight
a list of control parameters for estimating how high harmonics reach in the spectrum; see details and ?soundgen:::harmHeight
shortestSyl
the smallest length of a voiced segment (ms) that constitutes a voiced syllable (shorter segments will be replaced by NA, as if unvoiced)
shortestPause
the smallest gap between voiced syllables (ms) that means they shouldn't be merged into one voiced syllable
interpolWin, interpolTol, interpolCert
control the behavior of the interpolation algorithm when postprocessing pitch candidates. To turn off interpolation, set interpolWin = 0. See soundgen:::pathfinder for details.
pathfinding
method of finding the optimal path through pitch candidates: 'none' = best candidate per frame, 'fast' = simple heuristic, 'slow' = annealing. See soundgen:::pathfinder
annealPars
a list of control parameters for postprocessing of pitch contour with the SANN algorithm of optim. This is only relevant if pathfinding = 'slow'
certWeight
(0 to 1) in pitch postprocessing, specifies how much we prioritize the certainty of pitch candidates vs. pitch jumps / the internal tension of the resulting pitch curve
snakeStep
optimized path through pitch candidates is further processed to minimize the elastic force acting on pitch contour. To disable, set snakeStep = 0
snakePlot
if TRUE, plots the snake
smooth, smoothVars
if smooth is a positive number, outliers of the variables in smoothVars are adjusted with median smoothing. smooth of 1 corresponds to a window of ~100 ms and tolerated deviation of ~4 semitones. To disable, set smooth = 0
summary
if TRUE, returns only a summary of the measured acoustic variables (mean, median and SD). If FALSE, returns a list containing frame-by-frame values
summaryFun
a vector of names of functions used to summarize each acoustic characteristic
invalidArgAction
what to do if an argument is invalid or outside the range in defaults_analyze: 'adjust' = reset to default value, 'abort' = stop execution, 'ignore' = throw a warning and continue (may crash)
plot
if TRUE, produces a spectrogram with pitch contour overlaid
showLegend
if TRUE, adds a legend with pitch tracking methods
savePath
if a valid path is specified, a plot is saved in this folder (defaults to NA)
osc, osc_dB
should an oscillogram be shown under the spectrogram? TRUE/FALSE. If osc_dB is TRUE, the oscillogram is displayed on a dB scale. See osc_dB for details
pitchPlot
a list of graphical parameters for displaying the final pitch contour. Set to list(type = 'n') to suppress
ylim
frequency range to plot, kHz (defaults to 0 to Nyquist frequency)
xlab, ylab, main
plotting parameters
width, height, units, res
parameters passed to png if the plot is saved
...
other graphical parameters passed to spectrogram
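Two of the unit conventions in the argument list are easy to misread: overlap is a percentage of windowLength, and priorSD is given in semitones rather than Hz. The base-R sketch below works through both with the default values. The helper names frameStep and semitoneBounds are hypothetical (they are not soundgen functions), and the step formula is the usual STFT relation, assumed here to match how analyze derives step when it is left as NULL.

```r
# Hypothetical helpers for illustration only; not part of soundgen.

# FFT step (ms) implied by a window length (ms) and an overlap (%)
frameStep = function(windowLength, overlap) {
  windowLength * (1 - overlap / 100)
}

# Frequencies (Hz) lying nSemitones below and above a reference frequency
semitoneBounds = function(freq, nSemitones) {
  freq * 2 ^ (c(-1, 1) * nSemitones / 12)
}

frameStep(windowLength = 50, overlap = 50)   # 25 ms between frame onsets
semitoneBounds(freq = 300, nSemitones = 6)   # about 212 Hz and 424 Hz
```

With the defaults, one SD of the prior therefore spans roughly 212 to 424 Hz around priorMean = 300, which is why wide-ranging vocalizations may call for a larger priorSD.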
If summary = TRUE, returns a dataframe with one row and three
columns per acoustic variable (mean / median / SD). If summary = FALSE,
returns a dataframe with one row per STFT frame and one column per
acoustic variable. The best guess at the pitch contour considering all
available information is stored in the variable called "pitch". In
addition, the output contains pitch estimates by separate algorithms
included in pitchMethods and a number of other acoustic descriptors:
duration from the beginning of the first non-silent STFT frame to the end of the last non-silent STFT frame, s (NB: depends strongly on windowLength and silence settings)
time of the middle of each frame (ms)
root mean square of amplitude per frame, calculated as sqrt(mean(frame ^ 2))
the same as ampl for voiced frames and NA for unvoiced frames
lowest dominant frequency band (Hz) (see "Pitch tracking methods / Dominant frequency" in the vignette)
Wiener entropy of the spectrum of the current frame. Close to 0: pure tone or tonal sound with nearly all energy in harmonics; close to 1: white noise
the frequency and bandwidth of the first nFormants formants per STFT frame, as calculated by phonTools::findformants
the amount of energy in upper harmonics, namely the ratio of total spectral mass above 1.25 x F0 to the total spectral mass below 1.25 x F0 (dB)
how high harmonics reach in the spectrum, based on the best guess at pitch (or the manually provided pitch values)
harmonics-to-noise ratio (dB), a measure of harmonicity returned by soundgen:::getPitchAutocor (see "Pitch tracking methods / Autocorrelation"). If HNR = 0 dB, there is as much energy in harmonics as in noise
subjective loudness, in sone, corresponding to the chosen SPL_measured - see getLoudness
the frequency with maximum spectral power (Hz)
post-processed pitch contour based on all F0 estimates
autocorrelation estimate of F0
cepstral estimate of F0
BaNa estimate of F0
the 25th, 50th, and 75th quantiles of the spectrum of voiced frames (Hz)
the center of gravity of the frame's spectrum, first spectral moment (Hz)
the slope of linear regression fit to the spectrum below cutFreq
is the current STFT frame voiced? TRUE / FALSE
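Two of the descriptors above have compact definitions that can be checked by hand: RMS amplitude, sqrt(mean(frame ^ 2)), and Wiener entropy, taken here as the ratio of the geometric to the arithmetic mean of the spectrum. The base-R sketch below applies both to a single unwindowed frame. The helper names frameRMS and wienerEntropy are hypothetical, and because analyze windows each frame and may use a different spectrum type, the values need not match its entropy column exactly.

```r
# Hypothetical helpers for illustration only; not part of soundgen.

# RMS amplitude of one frame, as defined in the text
frameRMS = function(frame) sqrt(mean(frame ^ 2))

# Wiener entropy (spectral flatness): geometric mean / arithmetic mean
# of the power spectrum, ignoring the DC bin
wienerEntropy = function(frame) {
  n = length(frame)
  spec = Mod(fft(frame))[2:(n %/% 2)] ^ 2
  exp(mean(log(spec))) / mean(spec)
}

samplingRate = 16000
n = 800                                    # one 50-ms frame
tone = sin(2 * pi * 1000 * (1:n) / samplingRate)
set.seed(1)
noise = rnorm(n)

frameRMS(tone)        # close to 1 / sqrt(2) for a full-scale sine
wienerEntropy(tone)   # close to 0: energy concentrated in one bin
wienerEntropy(noise)  # much higher: energy spread across the spectrum
```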
Each pitch tracker is controlled by its own list of settings, as follows:
pitchAutocor (autocorrelation)
autocorThres
voicing threshold (unitless, ~0 to 1)
autocorSmooth
the width of smoothing interval (in bins) for finding peaks in the autocorrelation function. Defaults to 7 for sampling rate 44100 and smaller odd numbers for lower values of sampling rate
autocorUpsample
upsamples acf to this resolution (Hz) to improve accuracy in high frequencies
autocorBestPeak
amplitude of the lowest best candidate relative to the absolute max of the acf
pitchCep (cepstrum)
cepSmooth
the width of smoothing interval (Hz) for finding peaks in the cepstrum
cepZp
zero-padding of the spectrum used for cepstral pitch detection (final length of spectrum after zero-padding in points, e.g. 2 ^ 13)
pitchSpec (ratio of harmonics - BaNa algorithm)
specThres
voicing threshold (unitless, ~0 to 1)
specPeak, specHNRslope
when looking for putative harmonics in the spectrum, the threshold for peak detection is calculated as specPeak * (1 - HNR * specHNRslope)
specSmooth
the width of window for detecting peaks in the spectrum, Hz
specMerge
pitch candidates within specMerge semitones are merged with boosted certainty
specSinglePeakCert
(0 to 1) if F0 is calculated based on a single harmonic ratio (as opposed to several ratios converging on the same candidate), its certainty is taken to be specSinglePeakCert
pitchHps (harmonic product spectrum)
hpsThres
voicing threshold (unitless, ~0 to 1)
hpsNorm
the amount of inflation of hps pitch certainty (0 = none)
hpsPenalty
the amount of penalizing hps candidates in low frequencies (0 = none)
Each of these lists also accepts graphical parameters that affect how pitch candidates are plotted, e.g. pitchDom = list(domThres = .5, col = 'yellow'). Other arguments that are lists of subroutine-specific settings include:
harmHeight (finding how high harmonics reach in the spectrum)
harmPerSel
the number of harmonics per sliding selection
harmTol
maximum tolerated deviation of peak frequency from multiples of f0, proportion of f0
+ plotting pars, notably set type = 'l' to plot the harmHeight contour
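The harmTol setting is expressed as a proportion of f0, which is easiest to see with numbers. The base-R sketch below is one plausible reading of that definition, not soundgen's implementation: the helper isHarmonic is hypothetical, and the peak frequencies are made up for illustration.

```r
# Hypothetical helper for illustration only; not soundgen's implementation.
# A spectral peak counts as a harmonic if it lies within harmTol * f0
# of the nearest integer multiple of f0.
isHarmonic = function(peakFreq, f0, harmTol) {
  nearestMultiple = pmax(1, round(peakFreq / f0)) * f0
  abs(peakFreq - nearestMultiple) <= harmTol * f0
}

f0 = 200
peaks = c(205, 390, 650, 800, 1100)
isHarmonic(peaks, f0, harmTol = 0.1)
# TRUE TRUE FALSE TRUE FALSE: 650 and 1100 Hz are more than 20 Hz away
# from the nearest multiples of 200 Hz
```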
analyzeFolder
pitch_app
getLoudness
segment
getRMS
modulationSpectrum
ssm
# NOT RUN {
sound = soundgen(sylLen = 300, pitch = c(500, 400, 600),
noise = list(time = c(0, 300), value = c(-40, 0)),
temperature = 0.001,
addSilence = 50) # NB: always have some silence before and after!!!
# playme(sound, 16000)
a = analyze(sound, samplingRate = 16000)
# }
# NOT RUN {
# For maximum processing speed (just basic spectral descriptives):
a = analyze(sound, samplingRate = 16000,
plot = FALSE, # no plotting
pitchMethods = NULL, # no pitch tracking
SPL_measured = 0, # no loudness analysis
nFormants = 0 # no formant analysis
)
sound1 = soundgen(sylLen = 900, pitch = list(
time = c(0, .3, .8, 1), value = c(300, 900, 400, 2300)),
noise = list(time = c(0, 300), value = c(-40, 0)),
temperature = 0.001, samplingRate = 44100)
# improve the quality of postprocessing:
a1 = analyze(sound1, samplingRate = 44100, priorSD = 24,
plot = TRUE, pathfinding = 'slow', ylim = c(0, 5))
median(a1$pitch, na.rm = TRUE)
# (can vary, since postprocessing is stochastic)
# compare to the true value:
median(getSmoothContour(anchors = list(time = c(0, .3, .8, 1),
value = c(300, 900, 400, 2300)), len = 1000))
# the same pitch contour, but harder to analyze b/c of
# subharmonics and jitter
sound2 = soundgen(sylLen = 900, pitch = list(
time = c(0, .3, .8, 1), value = c(300, 900, 400, 2300)),
noise = list(time = c(0, 900), value = c(-40, -20)),
subDep = 10, jitterDep = 0.5,
temperature = 0.001, samplingRate = 44100)
# playme(sound2, 44100)
a2 = analyze(sound2, samplingRate = 44100, priorSD = 24,
pathfinding = 'slow', ylim = c(0, 5))
# Fancy plotting options:
a = analyze(sound1, samplingRate = 44100,
xlab = 'Time, ms', colorTheme = 'seewave',
contrast = .5, ylim = c(0, 4), main = 'My plot',
pitchMethods = c('dom', 'autocor', 'spec', 'hps', 'cep'),
priorMean = NA, # no prior info at all
pitchDom = list(col = 'red', domThres = .25),
pitchPlot = list(col = 'black', lty = 3, lwd = 3),
osc_dB = TRUE, heights = c(2, 1))
# Different formatting options for output
a = analyze(sound1, 44100, summary = FALSE) # frame-by-frame
a = analyze(sound1, 44100, summary = TRUE,
summaryFun = c('mean', 'range')) # one row per sound
# ...with custom summaryFun
difRan = function(x) diff(range(x))
a = analyze(sound2, samplingRate = 44100, summary = TRUE,
summaryFun = c('mean', 'difRan'))
# Save the plot
a = analyze(sound1, 44100, ylim = c(0, 5),
savePath = '~/Downloads/',
width = 20, height = 15, units = 'cm', res = 300)
## Amplitude and loudness: analyze() should give the same results as
## dedicated functions getRMS() / getLoudness()
# Create 1 kHz tone
samplingRate = 16000; dur_ms = 50
sound3 = sin(2*pi*1000/samplingRate*(1:(dur_ms/1000*samplingRate)))
a1 = analyze(sound3, samplingRate = samplingRate, windowLength = 25,
overlap = 50, SPL_measured = 40, scale = 1,
pitchMethods = NULL, plot = FALSE)
a1$loudness # loudness per STFT frame (1 sone by definition)
getLoudness(sound3, samplingRate = samplingRate, windowLength = 25,
overlap = 50, SPL_measured = 40, scale = 1)$loudness
a1$ampl # RMS amplitude per STFT frame
getRMS(sound3, samplingRate = samplingRate, windowLength = 25,
overlap = 50, scale = 1)
# or even simply: sqrt(mean(sound3 ^ 2))
# The same sound as above, but with half the amplitude
a_half = analyze(sound3 / 2, samplingRate = samplingRate, windowLength = 25,
overlap = 50, SPL_measured = 40, scale = 1,
pitchMethods = NULL, plot = FALSE)
a1$ampl / a_half$ampl # rms amplitude halved
a1$loudness/ a_half$loudness # loudness is not a linear function of amplitude
# Amplitude & loudness of an existing audio file
sound4 = '~/Downloads/temp/cry_451_soundgen.wav'
a2 = analyze(sound4, windowLength = 25, overlap = 50, SPL_measured = 40)
apply(a2[, c('loudness', 'ampl')], 2, median, na.rm = TRUE)
median(getLoudness(sound4, windowLength = 25, overlap = 50,
SPL_measured = 40)$loudness)
# NB: not identical b/c analyze() doesn't consider very quiet frames
median(getRMS(sound4, windowLength = 25, overlap = 50, scale = 1))
# Analyzing ultrasounds (slow but possible, just adjust pitchCeiling)
s = soundgen(sylLen = 200, addSilence = 10,
pitch = c(25000, 35000, 30000),
formants = NA, rolloff = -12, rolloffKHz = 0,
pitchSamplingRate = 350000, samplingRate = 350000, windowLength = 5,
pitchCeiling = 45000, invalidArgAction = 'ignore')
# s is a bat-like ultrasound inaudible to humans
spectrogram(s, 350000, windowLength = 5)
a = analyze(s, 350000, pitchCeiling = 45000, priorMean = NA,
windowLength = 5, overlap = 0,
nFormants = 0, SPL_measured = 0)
# NB: ignore formants and loudness estimates for such non-human sounds
# }
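The amplitude-halving example above shows that loudness does not scale linearly with amplitude. As a rough cross-check, the base-R sketch below applies the textbook rule of thumb that 40 phon corresponds to 1 sone and every additional 10 phon doubles loudness; for a 1 kHz tone, phon equals dB SPL. This is an assumption for illustration, not the model used by getLoudness, and the rule only holds approximately above 40 phon. The helper sonesFromPhons is hypothetical.

```r
# Hypothetical helper for illustration only; not part of soundgen.
# Rule of thumb: 40 phon = 1 sone, and every +10 phon doubles loudness.
sonesFromPhons = function(phon) 2 ^ ((phon - 40) / 10)

# Halving the amplitude lowers the level by 20 * log10(2) dB
dropDB = 20 * log10(2)                   # about 6.02 dB

# 1 kHz tone presented at 60 dB SPL (60 phon), then at half the amplitude
loudFull = sonesFromPhons(60)            # 4 sone
loudHalf = sonesFromPhons(60 - dropDB)
loudFull / loudHalf                      # about 1.52, not 2
```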