soundgen (version 1.0.0)

analyzeFolder: Analyze sound

Description

Acoustic analysis of all .wav files in a folder.

Usage

analyzeFolder(myfolder, verbose = TRUE, samplingRate = NULL,
  silence = 0.04, windowLength = 50, step = NULL, overlap = 50,
  wn = "gaussian", zp = 0, cutFreq = 6000, nFormants = 3,
  pitchMethods = c("autocor", "spec", "dom"), entropyThres = 0.6,
  pitchFloor = 75, pitchCeiling = 3500, priorMean = HzToSemitones(300),
  priorSD = 6, priorPlot = FALSE, nCands = 1, minVoicedCands = "autom",
  domThres = 0.1, domSmooth = 220, autocorThres = 0.7,
  autocorSmooth = NULL, cepThres = 0.3, cepSmooth = NULL, cepZp = 0,
  specThres = 0.3, specPeak = 0.35, specSinglePeakCert = 0.4,
  specHNRslope = 0.8, specSmooth = 150, specMerge = 1, shortestSyl = 20,
  shortestPause = 60, interpolWin = 3, interpolTol = 0.3,
  interpolCert = 0.3, pathfinding = c("none", "fast", "slow")[2],
  annealPars = list(maxit = 5000, temp = 1000), certWeight = 0.5,
  snakeStep = 0.05, snakePlot = FALSE, smooth = 1,
  smoothVars = c("pitch", "dom"), summary = TRUE, plot = FALSE,
  savePath = NA, specPlot = list(contrast = 0.2, brightness = 0, ylim = c(0,
  5)), pitchPlot = list(col = rgb(0, 0, 1, 0.75), lwd = 3),
  candPlot = list(levels = c("autocor", "spec", "dom", "cep"), col =
  c("green", "red", "orange", "violet"), pch = c(16, 2, 3, 7), cex = 2))

Arguments

myfolder

full path to target folder

verbose

if TRUE, reports progress and estimated time left

samplingRate

sampling rate of the audio (only needed for input supplied as a numeric vector rather than a .wav file, so it can normally be left as NULL when analyzing a folder of .wav files)

silence

(0 to 1) frames with mean abs amplitude below silence threshold are not analyzed at all. NB: this number is dynamically updated: the actual silence threshold may be higher depending on the quietest frame, but it will never be lower than this specified number.
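
As a rough illustration of the thresholding described above, the following base-R sketch splits a waveform scaled to [-1, 1] into frames and flags those whose mean absolute amplitude falls below silence. The helper frame_is_silent and the non-overlapping framing are assumptions made for illustration, not soundgen's internal code, and the dynamic update of the threshold is not reproduced.

```r
# Hypothetical helper (not part of soundgen): flag frames whose mean
# absolute amplitude is below a fixed silence threshold.
frame_is_silent <- function(x, frame_len, silence = 0.04) {
  n_frames <- floor(length(x) / frame_len)
  # one column per non-overlapping frame
  frames <- matrix(x[seq_len(n_frames * frame_len)], nrow = frame_len)
  colMeans(abs(frames)) < silence
}

sr <- 16000                              # assumed sampling rate, Hz
t <- (0:1599) / sr                       # 100 ms of samples
tone <- 0.5 * sin(2 * pi * 440 * t)      # clearly audible segment
quiet <- rep(0.001, length(t))           # near-silent segment

# Four 50-ms frames: two from the tone, two from the quiet segment
silent <- frame_is_silent(c(tone, quiet), frame_len = 800)
silent
```

Only the frames taken from the quiet segment are flagged, so only they would be skipped.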

windowLength

length of FFT window, ms

step

you can override overlap by specifying FFT step, ms

overlap

overlap between successive FFT frames, %
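
The link between step and overlap can be made explicit with a little arithmetic. The sketch below assumes the usual short-time Fourier relationship step = windowLength * (1 - overlap / 100). The two helper functions are hypothetical and not part of soundgen.

```r
# Assumed relationship between FFT step (ms) and overlap (%) for a
# given windowLength (ms). Both helpers are illustrative only.
step_from_overlap <- function(windowLength, overlap) {
  windowLength * (1 - overlap / 100)
}
overlap_from_step <- function(windowLength, step) {
  100 * (1 - step / windowLength)
}

step_default <- step_from_overlap(50, 50)      # defaults: 25 ms step
overlap_for_10ms <- overlap_from_step(50, 10)  # 10 ms step: 80% overlap
```

Under this assumption, specifying step = 10 with the default 50 ms window is equivalent to requesting 80% overlap.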

wn

window type: gaussian, hanning, hamming, bartlett, rectangular, blackman, flattop

zp

window length after zero padding, points

cutFreq

(>0 to Nyquist, Hz) repeat the calculation of spectral descriptives after discarding all info above cutFreq. Recommended if the original sampling rate varies across different analyzed audio files

nFormants

the number of formants to extract per FFT frame. Calls findformants with default settings

pitchMethods

methods of pitch estimation to consider for determining pitch contour: 'autocor' = autocorrelation (~PRAAT), 'cep' = cepstral, 'spec' = spectral (~BaNa), 'dom' = lowest dominant frequency band

entropyThres

pitch tracking is not performed for frames with Wiener entropy above entropyThres, but other spectral descriptives are still calculated
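
For intuition about this threshold, Wiener entropy (spectral flatness) is the ratio of the geometric mean to the arithmetic mean of a spectrum. The sketch below is a minimal base-R version. The helper wiener_entropy is hypothetical, and whether soundgen applies the measure to the amplitude or the power spectrum is not stated here.

```r
# Wiener entropy (spectral flatness): geometric mean / arithmetic mean.
# Hypothetical helper for illustration, not soundgen's implementation.
wiener_entropy <- function(spec) {
  spec <- spec[spec > 0]               # drop zeros to avoid log(0)
  exp(mean(log(spec))) / mean(spec)
}

flat <- rep(1, 100)                    # noise-like, perfectly flat spectrum
peaked <- c(rep(1e-6, 99), 1)          # tone-like, one dominant component

entropy_flat <- wiener_entropy(flat)      # equals 1, the maximum
entropy_peaked <- wiener_entropy(peaked)  # close to 0
```

With entropyThres = 0.6, a frame resembling the flat spectrum would be skipped by the pitch tracker, while one resembling the peaked spectrum would be tracked.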

pitchFloor

lower bound for pitch candidates (Hz); candidates below pitchFloor are not considered

pitchCeiling

upper bound for pitch candidates (Hz); candidates above pitchCeiling are not considered

priorMean

specifies the mean and sd of gamma distribution describing our prior knowledge about the most likely pitch values for this file. Specified in semitones: priorMean = HzToSemitones(300), priorSD = 6 gives a prior with mean = 300 Hz and SD of 6 semitones (half an octave)

priorSD

specifies the mean and sd of gamma distribution describing our prior knowledge about the most likely pitch values for this file. Specified in semitones: priorMean = HzToSemitones(300), priorSD = 6 gives a prior with mean = 300 Hz and SD of 6 semitones (half an octave)
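
To see what an SD of 6 semitones means in Hz, the following sketch redoes the semitone arithmetic in base R. The reference frequency ref_hz and both helpers are assumptions made for illustration. soundgen's own HzToSemitones may use a different reference, which would shift the semitone values but not the resulting Hz range.

```r
# Semitone arithmetic behind the prior. ref_hz is an assumed reference
# frequency, and the helpers are hypothetical stand-ins.
ref_hz <- 16.3516
hz_to_semitones <- function(hz) 12 * log2(hz / ref_hz)
semitones_to_hz <- function(st) ref_hz * 2 ^ (st / 12)

prior_mean_st <- hz_to_semitones(300)
prior_sd_st <- 6                       # half an octave

# Pitch one SD below and one SD above the mean, back on the Hz scale
one_sd_hz <- semitones_to_hz(prior_mean_st + c(-1, 1) * prior_sd_st)
round(one_sd_hz)   # 212 424
```

Because 6 semitones is a frequency ratio of sqrt(2), one SD around 300 Hz spans roughly 212 to 424 Hz.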

priorPlot

if TRUE, produces a separate plot of the prior

nCands

maximum number of pitch candidates per method (except for dom, which returns at most one candidate per frame), normally 1...4

minVoicedCands

minimum number of pitch candidates that have to be defined to consider a frame voiced (defaults to 2 if dom is among other candidates and 1 otherwise)

domThres

(0 to 1) to find the lowest dominant frequency band, we do short-term FFT and take the lowest frequency with amplitude at least domThres

domSmooth

the width of smoothing interval (Hz) for finding dom

autocorThres

(0 to 1) separate voicing thresholds for detecting pitch candidates with three different methods: autocorrelation, cepstrum, and BaNa algorithm (see Details). Note that HNR is calculated even for unvoiced frames.

autocorSmooth

the width of smoothing interval (in bins) for finding peaks in the autocorrelation function. Defaults to 7 for sampling rate 44100 and smaller odd numbers for lower values of sampling rate

cepThres

(0 to 1) separate voicing thresholds for detecting pitch candidates with three different methods: autocorrelation, cepstrum, and BaNa algorithm (see Details). Note that HNR is calculated even for unvoiced frames.

cepSmooth

the width of smoothing interval (in bins) for finding peaks in the cepstrum. Defaults to 31 for sampling rate 44100 and smaller odd numbers for lower values of sampling rate

cepZp

zero-padding of the spectrum used for cepstral pitch detection (final length of spectrum after zero-padding in points, e.g. 2 ^ 13)

specThres

(0 to 1) separate voicing thresholds for detecting pitch candidates with three different methods: autocorrelation, cepstrum, and BaNa algorithm (see Details). Note that HNR is calculated even for unvoiced frames.

specPeak

when looking for putative harmonics in the spectrum, the threshold for peak detection is calculated as specPeak * (1 - HNR * specHNRslope)

specSinglePeakCert

(0 to 1) if F0 is calculated based on a single harmonic ratio (as opposed to several ratios converging on the same candidate), its certainty is taken to be specSinglePeakCert

specHNRslope

when looking for putative harmonics in the spectrum, the threshold for peak detection is calculated as specPeak * (1 - HNR * specHNRslope)
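
The peak-detection threshold formula can be evaluated directly. This sketch uses the default specPeak and specHNRslope and assumes HNR is expressed as a proportion between 0 and 1. The helper spec_peak_threshold is hypothetical.

```r
# Threshold for spectral peak detection as given in the formula:
# specPeak * (1 - HNR * specHNRslope). Illustrative helper only.
spec_peak_threshold <- function(HNR, specPeak = 0.35, specHNRslope = 0.8) {
  specPeak * (1 - HNR * specHNRslope)
}

thresholds <- spec_peak_threshold(c(0, 0.5, 1))
thresholds   # 0.35 0.21 0.07
```

The threshold drops as HNR rises, so weaker spectral peaks are accepted as putative harmonics in frames that are more clearly harmonic.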

specSmooth

the width of window for detecting peaks in the spectrum, Hz

specMerge

pitch candidates within specMerge semitones are merged with boosted certainty

shortestSyl

the smallest length of a voiced segment (ms) that constitutes a voiced syllable (shorter segments will be replaced by NA, as if unvoiced)
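
The shortestSyl rule can be sketched with run-length encoding. The helper drop_short_syllables is hypothetical, and it assumes that the duration of a voiced segment is its number of frames multiplied by the step in ms. soundgen's own postprocessing may differ in detail.

```r
# Hypothetical helper: replace voiced runs shorter than shortestSyl (ms)
# with NA, assuming duration = number of frames * step.
drop_short_syllables <- function(pitch, step, shortestSyl = 20) {
  runs <- rle(!is.na(pitch))           # consecutive voiced / unvoiced runs
  too_short <- runs$values & (runs$lengths * step < shortestSyl)
  pitch[rep(too_short, runs$lengths)] <- NA
  pitch
}

pitch <- c(NA, 200, NA, NA, 210, 215, 220, NA)   # one value per frame, Hz
cleaned <- drop_short_syllables(pitch, step = 15)
cleaned
```

With a 15 ms step, the isolated voiced frame (15 ms) is treated as unvoiced, while the three-frame segment (45 ms) is kept.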

shortestPause

the smallest gap between voiced syllables (ms) that means they shouldn't be merged into one voiced syllable

interpolWin

control the behavior of interpolation algorithm when postprocessing pitch candidates. To turn off interpolation, set interpolWin to NULL. See soundgen:::pathfinder for details.

interpolTol

control the behavior of interpolation algorithm when postprocessing pitch candidates. To turn off interpolation, set interpolWin to NULL. See soundgen:::pathfinder for details.

interpolCert

control the behavior of interpolation algorithm when postprocessing pitch candidates. To turn off interpolation, set interpolWin to NULL. See soundgen:::pathfinder for details.

pathfinding

method of finding the optimal path through pitch candidates: 'none' = best candidate per frame, 'fast' = simple heuristic, 'slow' = annealing. See soundgen:::pathfinder

annealPars

a list of control parameters for postprocessing of pitch contour with SANN algorithm of optim. This is only relevant if pathfinding = 'slow'

certWeight

(0 to 1) in pitch postprocessing, specifies how much we prioritize the certainty of pitch candidates vs. pitch jumps / the internal tension of the resulting pitch curve

snakeStep

optimized path through pitch candidates is further processed to minimize the elastic force acting on pitch contour. To disable, set snakeStep to NULL

snakePlot

if TRUE, plots the snake

smooth

if smooth is a positive number, outliers of the variables in smoothVars are adjusted with median smoothing. smooth of 1 corresponds to a window of ~100 ms and tolerated deviation of ~4 semitones. To disable, set smooth to NULL

smoothVars

if smooth is a positive number, outliers of the variables in smoothVars are adjusted with median smoothing. smooth of 1 corresponds to a window of ~100 ms and tolerated deviation of ~4 semitones. To disable, set smooth to NULL
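
The general idea of adjusting outliers with median smoothing can be sketched as follows. This is not soundgen's implementation: the helper smooth_outliers, the window of 5 frames, and the use of stats::runmed are assumptions chosen for illustration, and the example contour contains no missing values.

```r
# Hypothetical helper: replace values that deviate from the running
# median by more than tol_semitones with that median.
smooth_outliers <- function(x, win = 5, tol_semitones = 4) {
  med <- stats::runmed(x, k = win, endrule = "median")
  dev <- abs(12 * log2(x / med))       # deviation from median, semitones
  is_outlier <- dev > tol_semitones
  x[is_outlier] <- med[is_outlier]
  x
}

pitch <- c(200, 205, 410, 210, 208, 212, 215)    # octave jump in frame 3
smoothed <- smooth_outliers(pitch)
smoothed
```

Only the octave jump in the third frame exceeds the tolerance, so it alone is pulled back to the local median while the remaining values are left unchanged.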

summary

if TRUE, returns only a summary of the measured acoustic variables (mean, median and SD). If FALSE, returns a list containing frame-by-frame values

plot

if TRUE, produces a spectrogram with pitch contour overlaid

savePath

if a valid path is specified, a plot is saved in this folder (defaults to NA)

specPlot

a list of graphical parameters passed to spectrogram. Set to NULL to suppress plotting just the spectrogram

pitchPlot

a list of graphical parameters for displaying the final pitch contour. Set to NULL or NA to suppress

candPlot

a list of graphical parameters for displaying individual pitch candidates. Set to NULL or NA to suppress

Value

If summary is TRUE, returns a data frame with one row per audio file. If summary is FALSE, returns a list of detailed descriptives.

Examples

# NOT RUN {
# download 260 sounds from Anikin & Persson (2017)
# http://cogsci.se/personal/results/
# 01_anikin-persson_2016_naturalistics-non-linguistic-vocalizations/260sounds_wav.zip
# unzip them into a folder, say '~/Downloads/temp'
myfolder = '~/Downloads/temp'  # 260 .wav files live here
s = analyzeFolder(myfolder, verbose = TRUE)  # ~ 15-30 minutes!

# Check accuracy: import manually verified pitch values (our "key")
key = pitchManual  # a vector of 260 floats
trial = s$pitch_median
cor(key, trial, use = 'pairwise.complete.obs')
plot(log(key), log(trial))
abline(a=0, b=1, col='red')
# }