Acoustic analysis of a single sound file: pitch tracking, basic spectral
characteristics, and estimated loudness (see getLoudness). The
default values of arguments are optimized for human non-linguistic
vocalizations. See vignette('acoustic_analysis', package = 'soundgen') for
details.
analyze(x, samplingRate = NULL, dynamicRange = 80, silence = 0.04,
SPL_measured = 70, Pref = 20, windowLength = 50, step = NULL,
overlap = 50, wn = "gaussian", zp = 0, cutFreq = 6000,
nFormants = 3, pitchMethods = c("autocor", "spec", "dom"),
entropyThres = 0.6, pitchFloor = 75, pitchCeiling = 3500,
priorMean = HzToSemitones(300), priorSD = 6, priorPlot = FALSE,
nCands = 1, minVoicedCands = "autom", domThres = 0.1,
domSmooth = 220, autocorThres = 0.7, autocorSmooth = NULL,
cepThres = 0.3, cepSmooth = NULL, cepZp = 0, specThres = 0.3,
specPeak = 0.35, specSinglePeakCert = 0.4, specHNRslope = 0.8,
specSmooth = 150, specMerge = 1, shortestSyl = 20,
shortestPause = 60, interpolWin = 3, interpolTol = 0.3,
interpolCert = 0.3, pathfinding = c("none", "fast", "slow")[2],
annealPars = list(maxit = 5000, temp = 1000), certWeight = 0.5,
snakeStep = 0.05, snakePlot = FALSE, smooth = 1,
smoothVars = c("pitch", "dom"), summary = FALSE,
summaryStats = c("mean", "median", "sd"), plot = TRUE,
showLegend = TRUE, savePath = NA, plotSpec = TRUE,
pitchPlot = list(col = rgb(0, 0, 1, 0.75), lwd = 3),
candPlot = list(), ylim = NULL, xlab = "Time, ms", ylab = "kHz",
main = NULL, width = 900, height = 500, units = "px", res = NA,
...)
path to a .wav or .mp3 file or a vector of amplitudes with specified samplingRate
sampling rate of x (only needed if x is a numeric vector, rather than an audio file)
dynamic range, dB. All values more than one dynamicRange under maximum are treated as zero
(0 to 1) frames with RMS amplitude below silence threshold are not analyzed at all. NB: this number is dynamically updated: the actual silence threshold may be higher depending on the quietest frame, but it will never be lower than this specified number.
sound pressure level at which the sound is presented, dB
reference pressure, Pa
length of FFT window, ms
you can override overlap by specifying FFT step, ms
overlap between successive FFT frames, %
window type: gaussian, hanning, hamming, bartlett, rectangular, blackman, flattop
window length after zero padding, points
(>0 to Nyquist, Hz) repeat the calculation of spectral descriptives after discarding all info above cutFreq. Recommended if the original sampling rate varies across different analyzed audio files
the number of formants to extract per FFT frame. Calls findformants with default settings
methods of pitch estimation to consider for determining pitch contour: 'autocor' = autocorrelation (~PRAAT), 'cep' = cepstral, 'spec' = spectral (~BaNa), 'dom' = lowest dominant frequency band
pitch tracking is not performed for frames with Wiener entropy above entropyThres, but other spectral descriptives are still calculated
absolute bounds for pitch candidates (Hz)
specifies the mean and SD of the gamma distribution describing our prior knowledge about the most likely pitch values for this file. Specified in semitones: priorMean = HzToSemitones(300), priorSD = 6 gives a prior with mean = 300 Hz and SD of 6 semitones (half an octave)
if TRUE, produces a separate plot of the prior
maximum number of pitch candidates per method (except for dom, which returns at most one candidate per frame), normally 1...4
minimum number of pitch candidates that have to be defined to consider a frame voiced (defaults to 2 if dom is among other candidates and 1 otherwise)
(0 to 1) to find the lowest dominant frequency band, we do short-term FFT and take the lowest frequency with amplitude at least domThres
the width of smoothing interval (Hz) for finding dom
(0 to 1) separate voicing thresholds for detecting pitch candidates with three different methods: autocorrelation, cepstrum, and BaNa algorithm (see Details). Note that HNR is calculated even for unvoiced frames.
the width of smoothing interval (in bins) for finding peaks in the autocorrelation function. Defaults to 7 for sampling rate 44100 and smaller odd numbers for lower values of sampling rate
the width of smoothing interval (in bins) for finding peaks in the cepstrum. Defaults to 31 for sampling rate 44100 and smaller odd numbers for lower values of sampling rate
zero-padding of the spectrum used for cepstral pitch detection (final length of spectrum after zero-padding in points, e.g. 2 ^ 13)
when looking for putative harmonics in
the spectrum, the threshold for peak detection is calculated as
specPeak * (1 - HNR * specHNRslope)
(0 to 1) if F0 is calculated based on a single
harmonic ratio (as opposed to several ratios converging on the same
candidate), its certainty is taken to be specSinglePeakCert
the width of window for detecting peaks in the spectrum, Hz
pitch candidates within specMerge semitones are merged with boosted certainty
the smallest length of a voiced segment (ms) that constitutes a voiced syllable (shorter segments will be replaced by NA, as if unvoiced)
the smallest gap between voiced syllables (ms) that means they shouldn't be merged into one voiced syllable
control the behavior of the interpolation algorithm when postprocessing pitch candidates. To turn off interpolation, set interpolWin to NULL. See soundgen:::pathfinder for details.
method of finding the optimal path through pitch
candidates: 'none' = best candidate per frame, 'fast' = simple heuristic,
'slow' = annealing. See soundgen:::pathfinder
a list of control parameters for postprocessing of pitch contour with the SANN algorithm of optim. This is only relevant if pathfinding = 'slow'
(0 to 1) in pitch postprocessing, specifies how much we prioritize the certainty of pitch candidates vs. pitch jumps / the internal tension of the resulting pitch curve
optimized path through pitch candidates is further processed to minimize the elastic force acting on pitch contour. To disable, set snakeStep to NULL
if TRUE, plots the snake
if smooth is a positive number, outliers of the variables in smoothVars are adjusted with median smoothing. smooth of 1 corresponds to a window of ~100 ms and tolerated deviation of ~4 semitones. To disable, set smooth to NULL
if TRUE, returns only a summary of the measured acoustic variables (by default mean, median and SD; see summaryStats). If FALSE, returns a dataframe of frame-by-frame values
a vector of names of functions used to summarize each acoustic characteristic
if TRUE, produces a spectrogram with pitch contour overlaid
if TRUE, adds a legend with pitch tracking methods
if a valid path is specified, a plot is saved in this folder (defaults to NA)
if FALSE, the spectrogram will not be plotted
a list of graphical parameters for displaying the final pitch contour. Set to NULL or NA to suppress
a list of graphical parameters for displaying individual pitch candidates. Set to NULL or NA to suppress
frequency range to plot, kHz (defaults to 0 to Nyquist frequency)
plotting parameters
parameters passed to png if the plot is saved
other graphical parameters passed to spectrogram
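Two of the arguments above are easier to read as concrete numbers. The sketch below is plain arithmetic in base R, not soundgen code. It assumes the usual definition of overlap (step = windowLength * (1 - overlap / 100)) and uses a 16000 Hz sampling rate purely for illustration. The Hz range for the prior only shows what an SD of 6 semitones means around 300 Hz; it does not reproduce the gamma distribution itself.

```r
# Defaults copied from the usage above
windowLength = 50    # ms
overlap = 50         # %
samplingRate = 16000 # Hz, illustrative value only

# FFT step implied by the overlap (assumed standard definition)
step = windowLength * (1 - overlap / 100)            # 25 ms
# Window length in points at this sampling rate
windowLength_points = windowLength / 1000 * samplingRate  # 800

# Prior: mean 300 Hz, SD 6 semitones (12 semitones = one octave)
priorMeanHz = 300
priorSD = 6
# One SD below and above the mean, converted back to Hz
priorRangeHz = priorMeanHz * 2 ^ (c(-1, 1) * priorSD / 12)
round(priorRangeHz, 1)  # 212.1 424.3
```

With the defaults, successive frames therefore start 25 ms apart, and one SD of the prior spans roughly 212 to 424 Hz.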
If summary = TRUE, returns a dataframe with one row and three
columns per acoustic variable (mean / median / SD). If summary = FALSE,
returns a dataframe with one row per FFT frame and one column per
acoustic variable. The best guess at the pitch contour considering all
available information is stored in the variable called "pitch". In
addition, the output contains pitch estimates by separate algorithms
included in pitchMethods and a number of other acoustic descriptors:
root mean square of amplitude per frame, calculated as sqrt(mean(frame ^ 2))
the same as ampl for voiced frames and NA for unvoiced frames
lowest dominant frequency band (Hz) (see "Pitch tracking methods / Dominant frequency" in the vignette)
Wiener entropy of the spectrum of the current frame. Close to 0: pure tone or tonal sound with nearly all energy in harmonics; close to 1: white noise
the frequency and bandwidth of the first nFormants formants per FFT frame, as calculated by phonTools::findformants with default settings
the amount of energy in upper harmonics, namely the ratio of total spectral mass above 1.25 x F0 to the total spectral mass below 1.25 x F0 (dB)
harmonics-to-noise ratio (dB), a measure of harmonicity returned by soundgen:::getPitchAutocor (see "Pitch tracking methods / Autocorrelation"). If HNR = 0 dB, there is as much energy in harmonics as in noise
subjective loudness, in sone, corresponding to
the chosen SPL_measured - see getLoudness
50th percentile (median) of the frame's spectrum
the frequency with maximum spectral power (Hz)
the frequency with maximum spectral power below cutFreq (Hz)
post-processed pitch contour based on all F0 estimates
autocorrelation estimate of F0
cepstral estimate of F0
BaNa estimate of F0
the 25th, 50th, and 75th percentiles of the spectrum below cutFreq (Hz)
the center of gravity of the frame's spectrum, first spectral moment (Hz)
the center of gravity of the frame's spectrum below cutFreq
the slope of linear regression fit to the spectrum below cutFreq
is the current FFT frame voiced? TRUE / FALSE
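Several of the descriptors above can be illustrated on a single frame in base R. This is a minimal sketch, not soundgen's implementation: describeFrame and its output names are hypothetical, no windowing or normalization is applied, and Wiener entropy is computed by its textbook definition (geometric mean of the power spectrum divided by its arithmetic mean). Only the RMS formula is taken directly from the description above.

```r
# Hypothetical helper (not part of soundgen): descriptors of one frame
describeFrame = function(frame, samplingRate) {
  n = length(frame)
  # RMS amplitude, as defined in the list above
  rms = sqrt(mean(frame ^ 2))
  # Power spectrum from the first bin above DC up to Nyquist
  nBins = floor(n / 2)
  power = Mod(fft(frame))[2:(nBins + 1)] ^ 2
  freqs = (1:nBins) * samplingRate / n
  # Small offset (an assumption of this sketch) to avoid log(0)
  power = power + 1e-12
  # Wiener entropy: geometric mean / arithmetic mean of the power spectrum
  entropy = exp(mean(log(power))) / mean(power)
  # Frequency with maximum spectral power
  peak = freqs[which.max(power)]
  # Center of gravity of the spectrum (first spectral moment)
  centroid = sum(freqs * power) / sum(power)
  list(rms = rms, entropy = entropy, peak = peak, centroid = centroid)
}

samplingRate = 16000
time_s = (0:799) / samplingRate        # one 50-ms frame
tone = sin(2 * pi * 500 * time_s)      # pure tone at 500 Hz
set.seed(1)
noise = rnorm(800)                     # white noise

dTone = describeFrame(tone, samplingRate)
dNoise = describeFrame(noise, samplingRate)
dTone$peak      # 500
dTone$entropy   # very close to 0
dNoise$entropy  # much higher than for the tone
```

For a single unsmoothed noise frame, the raw estimate stays well below 1, so the "close to 1" behavior described above should be read as a tendency rather than an exact value for this sketch.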
# NOT RUN {
sound = soundgen(sylLen = 300, pitch = c(900, 400, 2300),
noise = list(time = c(0, 300), value = c(-40, 0)),
temperature = 0.001, addSilence = 0)
# playme(sound, 16000)
a = analyze(sound, samplingRate = 16000, plot = TRUE)
# }
# NOT RUN {
sound1 = soundgen(sylLen = 900, pitch = list(
time = c(0, .3, .8, 1), value = c(300, 900, 400, 2300)),  # same anchors as the "true value" below
noise = list(time = c(0, 300), value = c(-40, 0)),
temperature = 0.001, addSilence = 0)
# improve the quality of postprocessing:
a1 = analyze(sound1, samplingRate = 16000, plot = TRUE, pathfinding = 'slow')
median(a1$pitch, na.rm = TRUE)
# (can vary, since postprocessing is stochastic)
# compare to the true value:
median(getSmoothContour(anchors = list(time = c(0, .3, .8, 1),
value = c(300, 900, 400, 2300)), len = 1000))
# the same pitch contour, but harder b/c of subharmonics and jitter
sound2 = soundgen(sylLen = 900, pitch = list(
time = c(0, .3, .8, 1), value = c(300, 900, 400, 2300)),
noise = list(time = c(0, 900), value = c(-40, 20)),
subDep = 100, jitterDep = 0.5, nonlinBalance = 100, temperature = 0.001)
# playme(sound2, 16000)
a2 = analyze(sound2, samplingRate = 16000, plot = TRUE, pathfinding = 'slow')
# many candidates are off, but the overall contour should be mostly accurate
# Fancy plotting options:
a = analyze(sound2, samplingRate = 16000, plot = TRUE,
xlab = 'Time, ms', colorTheme = 'seewave',
contrast = .5, ylim = c(0, 4),
pitchMethods = c('dom', 'autocor', 'spec'),
candPlot = list(
col = c('gray70', 'yellow', 'purple'), # same order as pitchMethods
pch = c(1, 3, 5),
cex = 3),
pitchPlot = list(col = 'black', lty = 3, lwd = 3))
# Plot pitch candidates w/o a spectrogram
a = analyze(sound2, samplingRate = 16000, plot = TRUE, plotSpec = FALSE)
# Different formatting options for output
a = analyze(sound2, samplingRate = 16000, summary = FALSE) # frame-by-frame
a = analyze(sound2, samplingRate = 16000, summary = TRUE,
summaryStats = c('mean', 'range')) # one row per sound
# Save the plot
a = analyze(sound, samplingRate = 16000,
savePath = '~/Downloads/',
width = 20, height = 15, units = 'cm', res = 300)
# }
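The summary = TRUE output used above can be pictured as applying each function named in summaryStats to every column of the frame-by-frame dataframe. The base R sketch below uses made-up numbers rather than real analyze() results. summarizeFrames and the column naming are hypothetical, not soundgen's internals, and the sketch assumes every function returns a single number (so 'range' would not fit here).

```r
# Mock frame-by-frame output (invented values, NA = unvoiced frame)
frames = data.frame(
  pitch = c(NA, 310, 305, 298, NA),
  ampl = c(0.01, 0.20, 0.25, 0.22, 0.02)
)
summaryStats = c("mean", "median", "sd")

# Hypothetical helper: one row, one column per variable-statistic pair
summarizeFrames = function(df, stats) {
  out = list()
  for (v in colnames(df)) {
    for (s in stats) {
      # Call the function by name, ignoring missing values
      out[[paste(v, s, sep = "_")]] = do.call(s, list(df[[v]], na.rm = TRUE))
    }
  }
  as.data.frame(out)
}

summ = summarizeFrames(frames, summaryStats)
dim(summ)          # 1 6
summ$pitch_median  # 305
```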