(Experimental) A function for automatically detecting and annotating
nonlinear vocal phenomena (NLP). Algorithm: analyze the audio with
analyze() and phasegram(), then use the extracted frame-by-frame
descriptives to classify each frame as containing no NLP ("none"),
subharmonics ("sh"), sidebands / amplitude modulation ("sb"), or
deterministic chaos ("chaos"). The classification is performed by a
naiveBayes() algorithm adapted to autocorrelated time series and
pretrained on a manually annotated corpus of vocalizations. Whenever
possible, check and correct the pitch tracks prior to running the
algorithm. See naiveBayes() for tips on using adaptive priors and
"clumpering" to account for the fact that NLP typically occur in
continuous segments spanning multiple frames.
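A minimal sketch of the intended workflow (the folder and csv paths are
hypothetical): correct the pitch tracks interactively first, then pass them
to the classifier.

# pitch_app()  # check and correct pitch, save as e.g. 'myAudio/pitch.csv'
# nlp = detectNLP('myAudio/', pitchManual = 'myAudio/pitch.csv')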
detectNLP(
x,
samplingRate = NULL,
predictors = c("nPeaks", "d2", "subDep", "amEnvDep", "entropy", "HNR", "CPP",
"roughness"),
thresProb = 0.4,
unvoicedToNone = FALSE,
train = soundgen::detectNLP_training_nonv,
scale = NULL,
from = NULL,
to = NULL,
pitchManual = NULL,
pars_analyze = list(windowLength = 50, roughness = list(windowLength = 15, step = 3)),
pars_phasegram = list(nonlinStats = "d2"),
pars_naiveBayes = list(prior = "static", wlClumper = 3),
jumpThres = 14,
jumpWindow = 100,
reportEvery = NULL,
cores = 1,
plot = FALSE,
savePlots = NULL,
main = NULL,
xlab = NULL,
ylab = NULL,
ylim = NULL,
width = 900,
height = 500,
units = "px",
res = NA,
...
)
For a single input, returns a dataframe with frame-by-frame acoustic
descriptives (as returned by analyze() and phasegram()), the posterior
probability of each NLP type per frame, and the tentative classification of
NLP per frame (the type with the highest posterior probability, possibly
corrected by clumpering). The time step is equal to the larger of the steps
passed to analyze() and phasegram(). For multiple inputs, returns a list of
such dataframes, one per input file.
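A quick sketch of inspecting the output for a single sound; the column names
below are taken from the examples at the end of this page.

# assuming 'target' is the synthetic sound created in the examples below
nlp = detectNLP(target, samplingRate = 16000)
head(nlp[, c('time', 'pr')])  # frame time stamps and tentative NLP labels
table(nlp$pr)                 # number of frames per NLP type
# per-frame posterior probabilities: nlp$none, nlp$sb, nlp$sh, nlp$chaos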
x: path to a folder, one or more wav or mp3 files c('file1.wav', 'file2.mp3'), Wave object, numeric vector, or a list of Wave objects or numeric vectors

samplingRate: sampling rate of x (only needed if x is a numeric vector)

predictors: variables to include in NLP classification. The default is to include all 7 variables in the training corpus. NA values are fine (they do not cause the entire frame to be dropped as long as at least one variable is measured).

thresProb: minimum probability of NLP for the frame to be classified as non-"none", which is useful for reducing false alarms (a value below 1/nClasses means simply going with the highest posterior probability)

unvoicedToNone: if TRUE, frames treated as unvoiced are set to "none" (mostly makes sense with manual pitch tracking)
train: training corpus, namely the result of running naiveBayes_train() on audio with known NLP episodes. Currently implemented: soundgen::detectNLP_training_nonv = manually annotated human nonverbal vocalizations, soundgen::detectNLP_training_synth = synthetic, soundgen()-generated sounds with various NLP. To train your own, run detectNLP() on a collection of recordings, provide ground truth classification of NLP per frame (normally this would be converted from NLP annotations), and run naiveBayes_train() (see the sketch after this argument list)
scale: maximum possible amplitude of input used for normalization of input vector (only needed if x is a numeric vector)

from, to: if NULL (default), analyzes the whole sound, otherwise from...to (s)
pitchManual: manually corrected pitch contour. For a single sound, provide a numeric vector of any length. For multiple sounds, provide a dataframe with columns "file" and "pitch" (or path to a csv file) as returned by pitch_app(), ideally with the same windowLength and step as in the current call to analyze(). A named list with pitch vectors per file is also OK (e.g. as returned by pitch_app)
pars_analyze: arguments passed to analyze(). NB: drop everything unnecessary to speed up the process, e.g. nFormants = 0, loudness = NULL, etc. If you have manual pitch contours, pass them as pitchManual = ... . Make sure the "silence" threshold is appropriate, and ideally normalize the audio (silent frames are automatically assigned to "none")
pars_phasegram: arguments passed to phasegram(). NB: only d2 and nPeaks are used for NLP detection because they proved effective in the training corpus; other nonlinear statistics are not calculated to save time
pars_naiveBayes: arguments passed to naiveBayes(). It is strongly recommended to use some clumpering, with wlClumper given in frames (multiply by step to get the corresponding minimum duration of an NLP segment in ms), and/or dynamic priors (see the sketch after this argument list)
jumpThres: frames in which pitch changes by jumpThres octaves/s more than in the surrounding frames are classified as containing "pitch jumps". Note that this is the rate of frequency change PER SECOND, not from one frame to the next

jumpWindow: the window for calculating the median pitch slope around the analyzed frame, ms

reportEvery: when processing multiple inputs, report estimated time left every ... iterations (NULL = default, NA = don't report)

cores: number of cores for parallel processing

plot: if TRUE, produces a spectrogram with annotated NLP regimes

savePlots: full path to the folder in which to save the plots (NULL = don't save, '' = same folder as audio)
main, xlab, ylab: graphical parameters passed to spectrogram()

ylim: frequency range to plot, kHz (defaults to 0 to Nyquist frequency). NB: still in kHz, even if yScale = bark, mel, or ERB

width, height, units, res: parameters passed to png() if the plot is saved
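A hedged sketch combining the customizations described above for pitchManual,
pars_analyze, and pars_naiveBayes (the folder and csv paths are hypothetical):

nlp = detectNLP(
  'myAudio/',
  pitchManual = 'myAudio/pitch.csv',  # corrected pitch saved from pitch_app
  pars_analyze = list(nFormants = 0, loudness = NULL),  # skip unneeded measures
  pars_naiveBayes = list(prior = 'dynamic', wlClumper = 5)  # dynamic priors +
  # clumpering: e.g. with a 10 ms step, wlClumper = 5 enforces a minimum NLP
  # segment duration of about 50 ms
)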
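To train a custom classifier, the workflow described under the train argument
might look like this (the object names and annotations are hypothetical; see
?naiveBayes_train for the exact call):

# nlp_list = detectNLP('annotatedAudio/')  # analyze annotated recordings
# ...add a ground-truth NLP label per frame to each dataframe, converted
# from manual annotations of NLP episodes...
# myTrain = naiveBayes_train(...)          # train on the labeled frames
# detectNLP('newAudio/', train = myTrain)  # use the custom classifier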
if (FALSE) {
# synthesize a sound with episodes of subharmonics, amplitude modulation,
# and jitter
target = soundgen(sylLen = 1600, addSilence = 0, temperature = 1e-6,
  pitch = c(380, 550, 500, 220), subDep = c(0, 0, 40, 0, 0, 0, 0, 0),
  amDep = c(0, 0, 0, 0, 80, 0, 0, 0), amFreq = 80,
  noise = c(-10, rep(-40, 5)),
  jitterDep = c(0, 0, 0, 0, 0, 3))

# classifier trained on manually annotated recordings of human nonverbal
# vocalizations
nlp = detectNLP(target, 16000, plot = TRUE, ylim = c(0, 4))

# classifier trained on synthetic, soundgen()-generated sounds
nlp = detectNLP(target, 16000, train = soundgen::detectNLP_training_synth,
  plot = TRUE, ylim = c(0, 4))
head(nlp[, c('time', 'pr')])  # frame time stamps and NLP classification
table(nlp$pr)  # number of frames per NLP type

# acoustic descriptives per frame
plot(nlp$amEnvDep, type = 'l')
plot(nlp$subDep, type = 'l')
plot(nlp$entropy, type = 'l')

# posterior probabilities of each NLP type per frame
plot(nlp$none, type = 'l')
points(nlp$sb, type = 'l', col = 'blue')
points(nlp$sh, type = 'l', col = 'green')
points(nlp$chaos, type = 'l', col = 'red')

# detection of pitch jumps
s1 = soundgen(sylLen = 1200, temperature = .001, pitch = list(
  time = c(0, 350, 351, 890, 891, 1200),
  value = c(140, 230, 460, 330, 220, 200)))
playme(s1, 16000)
detectNLP(s1, 16000, plot = TRUE, ylim = c(0, 3))

# process all files in a folder
nlp = detectNLP('/home/allgoodguys/Downloads/temp260/',
  pitchManual = soundgen::pitchContour, cores = 4, plot = TRUE,
  savePlots = '', ylim = c(0, 3))
}