Acoustic analysis of all wav/mp3 files in a folder. See analyze and vignette('acoustic_analysis', package = 'soundgen') for further details.
See pitch_app for a more realistic workflow: extract manually corrected pitch contours with pitch_app(), then run analyzeFolder() with these manual contours.
analyzeFolder(
myfolder,
htmlPlots = TRUE,
verbose = TRUE,
samplingRate = NULL,
dynamicRange = 80,
silence = 0.04,
SPL_measured = 70,
Pref = 2e-05,
windowLength = 50,
step = NULL,
overlap = 50,
wn = "gaussian",
zp = 0,
cutFreq = NULL,
formants = list(verify = FALSE),
nFormants = 3,
pitchMethods = c("dom", "autocor"),
pitchManual = NULL,
entropyThres = 0.6,
pitchFloor = 75,
pitchCeiling = 3500,
priorMean = 300,
priorSD = 6,
nCands = 1,
minVoicedCands = NULL,
pitchDom = list(),
pitchAutocor = list(),
pitchCep = list(),
pitchSpec = list(),
pitchHps = list(),
harmHeight = list(type = "n"),
shortestSyl = 20,
shortestPause = 60,
interpolWin = 75,
interpolTol = 0.3,
interpolCert = 0.3,
pathfinding = c("none", "fast", "slow")[2],
annealPars = list(maxit = 5000, temp = 1000),
certWeight = 0.5,
snakeStep = 0.05,
snakePlot = FALSE,
smooth = 1,
smoothVars = c("pitch", "dom"),
summary = TRUE,
summaryFun = c("mean", "median", "sd"),
plot = FALSE,
showLegend = TRUE,
savePlots = FALSE,
pitchPlot = list(col = rgb(0, 0, 1, 0.75), lwd = 3, showPrior = TRUE),
ylim = NULL,
xlab = "Time, ms",
ylab = "kHz",
main = NULL,
width = 900,
height = 500,
units = "px",
res = NA,
...
)
myfolder: full path to target folder
htmlPlots: if TRUE, saves an html file with clickable plots
verbose: if TRUE, reports progress and estimated time left
samplingRate: sampling rate of x (only needed if x is a numeric vector, rather than an audio file)
dynamicRange: dynamic range, dB. All values more than one dynamicRange under maximum are treated as zero
silence: (0 to 1) frames with RMS amplitude below silence threshold are not analyzed at all. NB: this number is dynamically updated: the actual silence threshold may be higher depending on the quietest frame, but it will never be lower than this specified number.
SPL_measured: sound pressure level at which the sound is presented, dB (set to 0 to skip analyzing subjective loudness)
Pref: reference pressure, Pa
windowLength: length of FFT window, ms
step: you can override overlap by specifying FFT step, ms
overlap: overlap between successive FFT frames, %
wn: window type: gaussian, hanning, hamming, bartlett, rectangular, blackman, flattop
zp: window length after zero padding, points
cutFreq: if specified, spectral descriptives (peakFreq, specCentroid, specSlope, and quartiles) are calculated under cutFreq. Recommended when analyzing recordings with varying sampling rates: set to half the lowest sampling rate to make the spectra more comparable. Note that "entropyThres" applies only to this frequency range, which also affects which frames will not be analyzed with pitchAutocor.
formants: a list of arguments passed to findformants, an external function called to perform LPC analysis
nFormants: the number of formants to extract per STFT frame (0 = no formant analysis)
pitchMethods: methods of pitch estimation to consider for determining pitch contour: 'autocor' = autocorrelation (~PRAAT), 'cep' = cepstral, 'spec' = spectral (~BaNa), 'dom' = lowest dominant frequency band ('' or NULL = no pitch analysis)
pitchManual: normally the output of pitch_app containing a manually corrected pitch contour, ideally with the same windowLength and step as the current call to analyzeFolder; a dataframe with at least two columns: "file" (w/o path) and "pitch" (character like "NA, 150, 175, NA")
entropyThres: pitch tracking is not performed for frames with Wiener entropy above entropyThres, but other spectral descriptives are still calculated
pitchFloor, pitchCeiling: absolute bounds for pitch candidates (Hz)
priorMean, priorSD: specifies the mean (Hz) and standard deviation (semitones) of gamma distribution describing our prior knowledge about the most likely pitch values for this file. For example, priorMean = 300, priorSD = 6 gives a prior with mean = 300 Hz and SD = 6 semitones (half an octave)
nCands: maximum number of pitch candidates per method (except for dom, which returns at most one candidate per frame), normally 1...4
minVoicedCands: minimum number of pitch candidates that have to be defined to consider a frame voiced (if NULL, defaults to 2 if dom is among other candidates and 1 otherwise)
pitchDom: a list of control parameters for pitch tracking using the lowest dominant frequency band or "dom" method; see details and ?soundgen:::getDom
pitchAutocor: a list of control parameters for pitch tracking using the autocorrelation or "autocor" method; see details and ?soundgen:::getPitchAutocor
pitchCep: a list of control parameters for pitch tracking using the cepstrum or "cep" method; see details and ?soundgen:::getPitchCep
pitchSpec: a list of control parameters for pitch tracking using the BaNa or "spec" method; see details and ?soundgen:::getPitchSpec
pitchHps: a list of control parameters for pitch tracking using the harmonic product spectrum ("hps") method; see details and ?soundgen:::getPitchHps
harmHeight: a list of control parameters for estimating how high harmonics reach in the spectrum; see details and ?soundgen:::harmHeight
shortestSyl: the smallest length of a voiced segment (ms) that constitutes a voiced syllable (shorter segments will be replaced by NA, as if unvoiced)
shortestPause: the smallest gap between voiced syllables (ms) that means they shouldn't be merged into one voiced syllable
interpolWin, interpolTol, interpolCert: control the behavior of interpolation algorithm when postprocessing pitch candidates. To turn off interpolation, set interpolWin = 0. See soundgen:::pathfinder for details.
pathfinding: method of finding the optimal path through pitch candidates: 'none' = best candidate per frame, 'fast' = simple heuristic, 'slow' = annealing. See soundgen:::pathfinder
annealPars: a list of control parameters for postprocessing of pitch contour with SANN algorithm of optim. This is only relevant if pathfinding = 'slow'
certWeight: (0 to 1) in pitch postprocessing, specifies how much we prioritize the certainty of pitch candidates vs. pitch jumps / the internal tension of the resulting pitch curve
snakeStep: optimized path through pitch candidates is further processed to minimize the elastic force acting on pitch contour. To disable, set snakeStep = 0
snakePlot: if TRUE, plots the snake
smooth, smoothVars: if smooth is a positive number, outliers of the variables in smoothVars are adjusted with median smoothing. smooth of 1 corresponds to a window of ~100 ms and tolerated deviation of ~4 semitones. To disable, set smooth = 0
summary: if TRUE, returns only a summary of the measured acoustic variables (mean, median and SD). If FALSE, returns a list containing frame-by-frame values
summaryFun: a vector of names of functions used to summarize each acoustic characteristic
plot: if TRUE, produces a spectrogram with pitch contour overlaid
showLegend: if TRUE, adds a legend with pitch tracking methods
savePlots: if TRUE, saves plots as .png files
pitchPlot: a list of graphical parameters for displaying the final pitch contour. Set to list(type = 'n') to suppress
ylim: frequency range to plot, kHz (defaults to 0 to Nyquist frequency)
xlab, ylab, main: plotting parameters
width, height, units, res: parameters passed to png if the plot is saved
...: other graphical parameters passed to spectrogram
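Several of the numeric defaults in the usage above can be checked with plain arithmetic. The sketch below is base R only and does not call soundgen. It assumes the conventional relation between window length, overlap and step, and it reads the prior's SD as a symmetric distance in semitones around the mean, which only approximates the shape of the actual prior.

```r
# Frame step implied by the default window length and overlap
# (assumed relation: step = windowLength * (1 - overlap / 100))
windowLength <- 50   # ms
overlap <- 50        # %
step <- windowLength * (1 - overlap / 100)   # 25 ms

# Pitch values one SD (6 semitones = half an octave) below and above
# a prior mean of 300 Hz; 12 semitones make one octave
priorMean <- 300     # Hz
priorSD <- 6         # semitones
prior_range <- priorMean * 2 ^ (c(-1, 1) * priorSD / 12)   # about 212 and 424 Hz

# Sound pressure corresponding to SPL_measured = 70 dB re Pref = 2e-05 Pa
SPL_measured <- 70   # dB
Pref <- 2e-05        # Pa
pressure <- Pref * 10 ^ (SPL_measured / 20)   # about 0.063 Pa

print(step)
print(round(prior_range, 1))
print(signif(pressure, 3))
```

Passing step explicitly (for example, step = 10) would then take precedence over overlap, as the description of step states.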
If summary is TRUE, returns a dataframe with one row per audio file. If summary is FALSE, returns a list of detailed descriptives.
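To illustrate how frame-by-frame values might collapse into a single summary row, here is a minimal base R sketch. The data and the helper summarize_frames() are hypothetical and not part of soundgen. The variable_function naming (as in pitch_median) is inferred from the column names used in the examples, and dropping NAs before summarizing is an assumption.

```r
# Hypothetical frame-by-frame values for one file (NA = frame not voiced)
frames <- data.frame(
  pitch = c(NA, 150, 175, NA, 160),
  dom   = c(300, 310, NA, 290, 305)
)
summaryFun <- c("mean", "median", "sd")

# Apply every summary function (given by name) to every variable,
# ignoring missing values, and return a one-row data frame
summarize_frames <- function(frames, summaryFun) {
  out <- list()
  for (v in names(frames)) {
    for (f in summaryFun) {
      out[[paste0(v, "_", f)]] <- do.call(f, list(frames[[v]], na.rm = TRUE))
    }
  }
  as.data.frame(out)
}

s_row <- summarize_frames(frames, summaryFun)
print(names(s_row))
print(s_row$pitch_median)
```

Any function that accepts a numeric vector and an na.rm argument could be added to summaryFun in this sketch.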
# NOT RUN {
# download 260 sounds from Anikin & Persson (2017)
# http://cogsci.se/publications/anikin-persson_2017_nonlinguistic-vocs/260sounds_wav.zip
# unzip them into a folder, say '~/Downloads/temp'
myfolder = '~/Downloads/temp' # 260 .wav files live here
s = analyzeFolder(myfolder, verbose = TRUE) # ~ 10-20 minutes!
# write.csv(s, paste0(myfolder, '/temp.csv'))  # save a backup
# Check accuracy: import manually verified pitch values (our "key")
# pitchManual # "ground truth" of mean pitch per sound
# pitchContour # "ground truth" of complete pitch contours per sound
files_manual = paste0(names(pitchManual), '.wav')
idx = match(s$file, files_manual) # in case the order is wrong
s$key = pitchManual[idx]
# Compare manually verified mean pitch with the output of analyzeFolder:
cor(s$key, s$pitch_median, use = 'pairwise.complete.obs')
plot(s$key, s$pitch_median, log = 'xy')
abline(a=0, b=1, col='red')
# Re-running analyzeFolder with manually corrected contours gives correct
# pitch-related descriptives like amplVoiced and harmonics (NB: you get it "for
# free" when running pitch_app)
s1 = analyzeFolder(myfolder, verbose = TRUE, pitchManual = pitchContour)
plot(s$harmonics_median, s1$harmonics_median)
abline(a=0, b=1, col='red')
# Save spectrograms with pitch contours plus an html file for easy access
s2 = analyzeFolder('~/Downloads/temp', savePlots = TRUE,
                   showLegend = TRUE, pitchManual = pitchContour,
                   width = 20, height = 12,
                   units = 'cm', res = 300, ylim = c(0, 5))
# }
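The description of pitchManual mentions a dataframe with a "file" column and a "pitch" column holding contours as character strings. A minimal base R sketch of that layout follows. The file names, contour values and the helper parse_pitch() are hypothetical, and the data frame is built by hand rather than exported from pitch_app().

```r
# Hypothetical manual contours, one numeric vector per file (NA = unvoiced)
contours <- list(
  "sound1.wav" = c(NA, 150, 175, NA),
  "sound2.wav" = c(220, 230, NA, 240)
)

# Store each contour as a single comma-separated string
pitch_df <- data.frame(
  file  = names(contours),
  pitch = unname(vapply(contours,
                        function(p) paste(p, collapse = ", "),
                        character(1))),
  stringsAsFactors = FALSE
)

# Convert a stored string back into a numeric vector
parse_pitch <- function(s) {
  suppressWarnings(as.numeric(trimws(strsplit(s, ",")[[1]])))
}

print(pitch_df)
print(parse_pitch(pitch_df$pitch[1]))
```

A data frame of this shape has the two columns the pitchManual description asks for. Whether analyzeFolder() accepts it without further columns has not been verified here.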