soundgen (version 1.5.0)

analyze: Analyze sound


Acoustic analysis of a single sound file: pitch tracking, basic spectral characteristics, and estimated loudness (see getLoudness). The default values of arguments are optimized for human non-linguistic vocalizations. See vignette('acoustic_analysis', package = 'soundgen') for details.


analyze(x, samplingRate = NULL, dynamicRange = 80, silence = 0.04,
  scale = NULL, SPL_measured = 70, Pref = 2e-05, windowLength = 50,
  step = NULL, overlap = 50, wn = "gaussian", zp = 0,
  cutFreq = 6000, nFormants = 3, pitchMethods = c("autocor", "spec",
  "dom"), entropyThres = 0.6, pitchFloor = 75, pitchCeiling = 3500,
  priorMean = 300, priorSD = 6, priorPlot = FALSE, nCands = 1,
  minVoicedCands = NULL, domThres = 0.1, domSmooth = 220,
  autocorThres = 0.7, autocorSmooth = NULL, cepThres = 0.3,
  cepSmooth = 400, cepZp = 0, specThres = 0.3, specPeak = 0.35,
  specSinglePeakCert = 0.4, specHNRslope = 0.8, specSmooth = 150,
  specMerge = 1, shortestSyl = 20, shortestPause = 60,
  interpolWin = 75, interpolTol = 0.3, interpolCert = 0.3,
  pathfinding = c("none", "fast", "slow")[2], annealPars = list(maxit =
  5000, temp = 1000), certWeight = 0.5, snakeStep = 0.05,
  snakePlot = FALSE, smooth = 1, smoothVars = c("pitch", "dom"),
  summary = FALSE, summaryFun = c("mean", "median", "sd"),
  plot = TRUE, showLegend = TRUE, savePath = NA, plotSpec = TRUE,
  pitchPlot = list(col = rgb(0, 0, 1, 0.75), lwd = 3),
  candPlot = list(), ylim = NULL, xlab = "Time, ms", ylab = "kHz",
  main = NULL, width = 900, height = 500, units = "px", res = NA, ...)



x

path to a .wav or .mp3 file, or a vector of amplitudes with the sampling rate given in samplingRate


samplingRate

sampling rate of x (only needed if x is a numeric vector rather than an audio file)


dynamicRange

dynamic range, dB. All values more than dynamicRange dB below the maximum are treated as zero


silence

(0 to 1) frames with RMS amplitude below the silence threshold are not analyzed at all. NB: this threshold is updated dynamically: the actual silence threshold may be higher depending on the quietest frame, but it will never be lower than the specified value.


scale

maximum possible amplitude of the input, used to normalize an input vector (not needed if the input is an audio file)


SPL_measured

sound pressure level at which the sound is presented, dB (set to 0 to skip the analysis of subjective loudness)


Pref

reference pressure, Pa


windowLength

length of the FFT window, ms


step

step between successive FFT frames, ms (if specified, overrides overlap)


overlap

overlap between successive FFT frames, %


wn

window type: gaussian, hanning, hamming, bartlett, rectangular, blackman, or flattop


zp

window length after zero padding, points


cutFreq

(>0 to Nyquist, Hz) repeat the calculation of spectral descriptives after discarding all information above cutFreq. Recommended if the original sampling rate varies across the analyzed audio files


nFormants

the number of formants to extract per STFT frame (0 = no formant analysis). Calls phonTools::findformants with default settings


pitchMethods

methods of pitch estimation to consider when determining the pitch contour: 'autocor' = autocorrelation (~PRAAT), 'cep' = cepstral, 'spec' = spectral (~BaNa), 'dom' = lowest dominant frequency band ('' or NULL = no pitch analysis)


entropyThres

pitch tracking is not performed for frames with Wiener entropy above entropyThres, but other spectral descriptives are still calculated

pitchFloor, pitchCeiling

absolute bounds for pitch candidates (Hz)

priorMean, priorSD

specify the mean (Hz) and standard deviation (semitones) of a gamma distribution describing our prior knowledge about the most likely pitch values for this file. For example, priorMean = 300, priorSD = 6 gives a prior with mean = 300 Hz and SD = 6 semitones (half an octave)


priorPlot

if TRUE, produces a separate plot of the prior


nCands

maximum number of pitch candidates per method (except for dom, which returns at most one candidate per frame), normally 1 to 4


minVoicedCands

minimum number of pitch candidates that have to be defined to consider a frame voiced (if NULL, defaults to 2 if dom is among the pitch methods and 1 otherwise)


domThres

(0 to 1) to find the lowest dominant frequency band, we do a short-term FFT and take the lowest frequency with amplitude of at least domThres


domSmooth

the width of the smoothing interval (Hz) for finding dom

autocorThres, cepThres, specThres

(0 to 1) separate voicing thresholds for detecting pitch candidates with three different methods: autocorrelation, cepstrum, and BaNa algorithm (see Details). Note that HNR is calculated even for unvoiced frames.


autocorSmooth

the width of the smoothing interval (in bins) for finding peaks in the autocorrelation function. Defaults to 7 for a sampling rate of 44100 Hz and to smaller odd numbers for lower sampling rates


cepSmooth

the width of the smoothing interval (Hz) for finding peaks in the cepstrum


cepZp

zero-padding of the spectrum used for cepstral pitch detection (final length of the spectrum after zero-padding in points, e.g. 2 ^ 13)

specPeak, specHNRslope

when looking for putative harmonics in the spectrum, the threshold for peak detection is calculated as specPeak * (1 - HNR * specHNRslope)


specSinglePeakCert

(0 to 1) if F0 is calculated based on a single harmonic ratio (as opposed to several ratios converging on the same candidate), its certainty is taken to be specSinglePeakCert


specSmooth

the width of the window for detecting peaks in the spectrum, Hz


specMerge

pitch candidates within specMerge semitones of each other are merged, with boosted certainty


shortestSyl

the minimum length (ms) of a voiced segment that constitutes a voiced syllable (shorter segments are replaced by NA, as if unvoiced)


shortestPause

the minimum gap (ms) between voiced syllables that keeps them from being merged into one voiced syllable

interpolWin, interpolTol, interpolCert

control the behavior of the interpolation algorithm when postprocessing pitch candidates. To turn off interpolation, set interpolWin = 0. See soundgen:::pathfinder for details.


pathfinding

method of finding the optimal path through pitch candidates: 'none' = best candidate per frame, 'fast' = simple heuristic, 'slow' = annealing. See soundgen:::pathfinder


annealPars

a list of control parameters for postprocessing of the pitch contour with the SANN algorithm of optim. Only relevant if pathfinding = 'slow'


certWeight

(0 to 1) in pitch postprocessing, specifies how much we prioritize the certainty of pitch candidates vs. pitch jumps / the internal tension of the resulting pitch curve


snakeStep

the optimized path through pitch candidates is further processed to minimize the elastic force acting on the pitch contour. To disable, set snakeStep = 0


snakePlot

if TRUE, plots the snake

smooth, smoothVars

if smooth is a positive number, outliers of the variables in smoothVars are adjusted with median smoothing. smooth = 1 corresponds to a window of ~100 ms and a tolerated deviation of ~4 semitones. To disable, set smooth = 0


summary

if TRUE, returns only a summary of the measured acoustic variables (by default mean, median, and SD; see summaryFun). If FALSE, returns a dataframe with frame-by-frame values


summaryFun

a vector of names of functions used to summarize each acoustic characteristic


plot

if TRUE, produces a spectrogram with the pitch contour overlaid


showLegend

if TRUE, adds a legend with pitch tracking methods


savePath

if a valid path is specified, the plot is saved in this folder (defaults to NA)


plotSpec

if FALSE, the spectrogram will not be plotted


pitchPlot

a list of graphical parameters for displaying the final pitch contour. Set to NULL or NA to suppress


candPlot

a list of graphical parameters for displaying individual pitch candidates. Set to NULL or NA to suppress


ylim

frequency range to plot, kHz (defaults to 0 to Nyquist frequency)

xlab, ylab, main

plotting parameters

width, height, units, res

parameters passed to png if the plot is saved


...

other graphical parameters passed to spectrogram
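The frame geometry implied by windowLength, overlap and step can be illustrated with a small base-R sketch. It assumes that, when step is NULL, step = windowLength * (1 - overlap / 100), and that only full windows are analyzed; the helper frameGeometry() is hypothetical, and soundgen's own framing (padding, rounding at the edges) may differ.

```r
# Hypothetical helper (not part of soundgen): STFT frame geometry implied by
# windowLength (ms), overlap (%) and step (ms). Assumes step defaults to
# windowLength * (1 - overlap / 100) and that only full windows are used.
frameGeometry = function(nSamples, samplingRate, windowLength = 50,
                         overlap = 50, step = NULL) {
  if (is.null(step)) step = windowLength * (1 - overlap / 100)  # ms
  windowLength_points = round(windowLength / 1000 * samplingRate)
  step_points = round(step / 1000 * samplingRate)
  # index of the first sample of every full window that fits into the sound
  starts = seq(1, nSamples - windowLength_points + 1, by = step_points)
  list(
    step_ms = step,
    windowLength_points = windowLength_points,
    step_points = step_points,
    nFrames = length(starts),
    # time of the middle of each frame, ms
    time_ms = (starts - 1 + windowLength_points / 2) / samplingRate * 1000
  )
}

# 1 s of sound at 16 kHz with the default windowLength = 50 and overlap = 50
g = frameGeometry(nSamples = 16000, samplingRate = 16000)
g$step_ms              # 25
g$windowLength_points  # 800
g$nFrames              # 39
```

Raising overlap to 75% halves the step (12.5 ms) and roughly doubles the number of frames, and with it the processing time.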

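The priorMean / priorSD arguments describe a gamma prior on a semitone scale. The sketch below makes that concrete under stated assumptions: semitones are counted from a reference of 16.35 Hz (C0), the gamma is parametrized by the method of moments (mean = shape / rate, variance = shape / rate ^ 2), and the helper names hzToSemitones() and pitchPrior() are hypothetical, not necessarily what soundgen does internally.

```r
# Assumed semitone scale: 12 semitones per octave above a 16.35 Hz reference
hzToSemitones = function(f, ref = 16.35) 12 * log2(f / ref)

# Hypothetical gamma prior whose mean is priorMean (converted to semitones)
# and whose SD is priorSD semitones (method-of-moments parametrization)
pitchPrior = function(freqs, priorMean = 300, priorSD = 6) {
  mu = hzToSemitones(priorMean)
  shape = mu ^ 2 / priorSD ^ 2   # mean = shape / rate = mu
  rate = mu / priorSD ^ 2        # variance = shape / rate ^ 2 = priorSD ^ 2
  d = dgamma(hzToSemitones(freqs), shape = shape, rate = rate)
  d / max(d)                     # rescale so that the most likely pitch gets weight 1
}

# log-spaced grid from pitchFloor = 75 Hz to pitchCeiling = 3500 Hz
freqs = 2 ^ seq(log2(75), log2(3500), length.out = 200)
prior = pitchPrior(freqs)
freqs[which.max(prior)]  # mode: a little below 300 Hz, because the gamma is right-skewed
```

Because the gamma mode equals mean - 1 / rate, the peak of the prior sits slightly below priorMean; a larger priorSD widens the prior and moves the mode further down.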

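The domThres / domSmooth description of the lowest dominant frequency band can be sketched in base R for a single frame. This is an illustration under assumptions (box smoothing of the magnitude spectrum over about domSmooth Hz, normalization to a maximum of 1); the helper lowestDominant() is hypothetical and is not soundgen's actual implementation.

```r
# Hypothetical sketch of "dom": smooth the magnitude spectrum of one frame
# over ~domSmooth Hz, normalize it to a maximum of 1, and return the lowest
# frequency whose amplitude reaches domThres
lowestDominant = function(frame, samplingRate, domThres = 0.1, domSmooth = 220) {
  n = length(frame)
  spec = Mod(fft(frame))[1:(n %/% 2)]                # one-sided magnitude spectrum
  freqs = (seq_along(spec) - 1) * samplingRate / n   # Hz
  k = max(1, round(domSmooth / (samplingRate / n)))  # smoothing window, bins
  smoothed = as.numeric(stats::filter(spec, rep(1 / k, k), sides = 2))
  smoothed[is.na(smoothed)] = spec[is.na(smoothed)]  # edges stay unsmoothed
  smoothed = smoothed / max(smoothed)
  freqs[which(smoothed >= domThres)[1]]
}

samplingRate = 16000
t = (0:799) / samplingRate                  # one 50-ms frame
frame = 0.05 * sin(2 * pi * 100 * t) +      # weak low component, below domThres
  sin(2 * pi * 500 * t) + 0.5 * sin(2 * pi * 1000 * t)
lowestDominant(frame, samplingRate)                  # 400: lower edge of the smoothed 500-Hz band
lowestDominant(frame, samplingRate, domSmooth = 20)  # 500: effectively no smoothing
```

The weak 100 Hz component is ignored because it stays below 10% of the strongest band; the smoothing width shifts the returned value toward the lower edge of the dominant band.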
If summary = TRUE, returns a dataframe with one row and one column per combination of acoustic variable and summary function (by default mean / median / SD, i.e. three columns per variable). If summary = FALSE, returns a dataframe with one row per STFT frame and one column per acoustic variable. The best guess at the pitch contour, considering all available information, is stored in the variable called "pitch". In addition, the output contains pitch estimates from the separate algorithms included in pitchMethods and a number of other acoustic descriptors:


total duration, s


duration from the beginning of the first non-silent STFT frame to the end of the last non-silent STFT frame, s (NB: depends strongly on windowLength and silence settings)


time of the middle of each frame (ms)


root mean square of amplitude per frame, calculated as sqrt(mean(frame ^ 2))


the same as ampl for voiced frames and NA for unvoiced frames


lowest dominant frequency band (Hz) (see "Pitch tracking methods / Dominant frequency" in the vignette)


Wiener entropy of the spectrum of the current frame. Close to 0: pure tone or tonal sound with nearly all energy in harmonics; close to 1: white noise

f1_freq, f1_width, ...

the frequency and bandwidth of the first nFormants formants per STFT frame, as calculated by phonTools::findformants with default settings


the amount of energy in upper harmonics, namely the ratio of total spectral mass above 1.25 x F0 to the total spectral mass below 1.25 x F0 (dB)


harmonics-to-noise ratio (dB), a measure of harmonicity returned by soundgen:::getPitchAutocor (see "Pitch tracking methods / Autocorrelation"). If HNR = 0 dB, there is as much energy in harmonics as in noise


subjective loudness, in sone, corresponding to the chosen SPL_measured - see getLoudness


the 50th percentile (median) of the frame's spectrum (Hz)


the frequency with maximum spectral power (Hz)


the frequency with maximum spectral power below cutFreq (Hz)


post-processed pitch contour based on all F0 estimates


autocorrelation estimate of F0


cepstral estimate of F0


BaNa estimate of F0

quartile25, quartile50, quartile75

the 25th, 50th, and 75th percentiles of the spectrum below cutFreq (Hz)


the center of gravity of the frame's spectrum, first spectral moment (Hz)


the center of gravity of the frame's spectrum below cutFreq (Hz)


the slope of linear regression fit to the spectrum below cutFreq


is the current STFT frame voiced? TRUE / FALSE
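How shortestSyl and shortestPause could act on a frame-by-frame voicing vector is sketched below with run-length encoding. The order of the two steps (first fill short internal pauses, then drop short voiced runs), the conversion of run lengths to ms via a fixed step, and the helper name cleanVoicing() are all assumptions for illustration.

```r
# Hypothetical sketch: clean up a logical voicing vector so that
# (1) internal unvoiced gaps shorter than shortestPause ms are filled, and
# (2) voiced runs shorter than shortestSyl ms are set to unvoiced
cleanVoicing = function(voiced, step_ms, shortestSyl = 20, shortestPause = 60) {
  r = rle(voiced)
  n = length(r$lengths)
  for (i in seq_len(n)) {
    # only gaps between two voiced runs count as pauses (not leading/trailing ones)
    if (!r$values[i] && i > 1 && i < n &&
        r$lengths[i] * step_ms < shortestPause) {
      r$values[i] = TRUE
    }
  }
  voiced = inverse.rle(r)
  r = rle(voiced)
  r$values[r$values & r$lengths * step_ms < shortestSyl] = FALSE
  inverse.rle(r)
}

voiced = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE,
           FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)
cleanVoicing(voiced, step_ms = 25, shortestSyl = 50)
# the 50-ms gap is filled, the 100-ms gap is kept, the lone 25-ms voiced frame is dropped
```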

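Several of the spectral descriptors listed above (peak frequency, spectral quartiles, center of gravity, Wiener entropy) can be illustrated on a single frame. This sketch assumes that "spectral mass" means power (squared magnitude), that a quartile is the first frequency at which cumulative power reaches 25/50/75%, and that Wiener entropy is the ratio of the geometric to the arithmetic mean of the power spectrum; the helper spectralDescriptives() and its list element names are hypothetical.

```r
# Hypothetical per-frame spectral descriptors computed from a power spectrum
spectralDescriptives = function(frame, samplingRate) {
  n = length(frame)
  nBins = n %/% 2
  pw = Mod(fft(frame))[2:nBins] ^ 2            # power spectrum without the DC bin
  freqs = (1:(nBins - 1)) * samplingRate / n  # Hz
  cum = cumsum(pw) / sum(pw)                   # cumulative share of spectral mass
  list(
    peakFreq = freqs[which.max(pw)],
    quartile25 = freqs[which(cum >= 0.25)[1]],
    quartile50 = freqs[which(cum >= 0.50)[1]],
    quartile75 = freqs[which(cum >= 0.75)[1]],
    centroid = sum(freqs * pw) / sum(pw),      # center of gravity, first spectral moment
    entropy = exp(mean(log(pw))) / mean(pw)    # geometric mean / arithmetic mean
  )
}

samplingRate = 16000
t = (0:799) / samplingRate
tone = spectralDescriptives(sin(2 * pi * 1000 * t), samplingRate)  # pure tone
click = spectralDescriptives(c(1, rep(0, 799)), samplingRate)      # perfectly flat spectrum
tone$peakFreq   # 1000
tone$entropy    # close to 0
click$entropy   # 1
click$centroid  # 4000 (middle of the 20 to 7980 Hz range)
```

Note that real white noise has a noisy, not perfectly flat, power spectrum, so its entropy by this definition is high but below 1; the impulse is used here because its spectrum is exactly flat.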

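The harmonic-energy descriptor (ratio of spectral mass above 1.25 x F0 to spectral mass below 1.25 x F0, in dB) reduces to a few lines once F0 is known. The sketch assumes a power spectrum and 10 * log10 for the conversion to dB; the helper name harmEnergy() and these choices are assumptions, not soundgen's code.

```r
# Hypothetical sketch: power above 1.25 * f0 relative to power at or below it, dB
harmEnergy = function(frame, samplingRate, f0) {
  n = length(frame)
  pw = Mod(fft(frame))[2:(n %/% 2)] ^ 2          # power spectrum without DC
  freqs = (1:(n %/% 2 - 1)) * samplingRate / n  # Hz
  10 * log10(sum(pw[freqs > 1.25 * f0]) / sum(pw[freqs <= 1.25 * f0]))
}

samplingRate = 16000
t = (0:799) / samplingRate
# F0 = 200 Hz plus two harmonics of equal amplitude
frame = sin(2 * pi * 200 * t) + sin(2 * pi * 400 * t) + sin(2 * pi * 600 * t)
harmEnergy(frame, samplingRate, f0 = 200)  # ~3.01 dB: twice as much power above 250 Hz as below
```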
sound = soundgen(sylLen = 300, pitch = c(900, 400, 2300),
  noise = list(time = c(0, 300), value = c(-40, 0)),
  temperature = 0.001, addSilence = 0)
# playme(sound, 16000)
a = analyze(sound, samplingRate = 16000, plot = TRUE)

# For maximum processing speed (just basic spectral descriptives):
a = analyze(sound, samplingRate = 16000,
  plot = FALSE,         # no plotting
  pitchMethods = NULL,  # no pitch tracking
  SPL_measured = NULL,  # no loudness analysis
  nFormants = 0)        # no formant analysis

sound1 = soundgen(sylLen = 900, pitch = list(
  time = c(0, .3, .8, 1), value = c(300, 900, 400, 2300)),
  noise = list(time = c(0, 300), value = c(-40, 0)),
  temperature = 0.001, addSilence = 0)
# improve the quality of postprocessing:
a1 = analyze(sound1, samplingRate = 16000, plot = TRUE, pathfinding = 'slow')
median(a1$pitch, na.rm = TRUE)
# (can vary, since postprocessing is stochastic)
# compare to the true value:
median(getSmoothContour(anchors = list(time = c(0, .3, .8, 1),
  value = c(300, 900, 400, 2300)), len = 1000))

# the same pitch contour, but harder b/c of subharmonics and jitter
sound2 = soundgen(sylLen = 900, pitch = list(
  time = c(0, .3, .8, 1), value = c(300, 900, 400, 2300)),
  noise = list(time = c(0, 900), value = c(-40, 0)),
  subDep = 100, jitterDep = 0.5, nonlinBalance = 100, temperature = 0.001)
# playme(sound2, 16000)
a2 = analyze(sound2, samplingRate = 16000, plot = TRUE, pathfinding = 'slow')
# many candidates are off, but the overall contour should be mostly accurate

# Fancy plotting options:
a = analyze(sound2, samplingRate = 16000, plot = TRUE,
  xlab = 'Time, ms', colorTheme = 'seewave',
  contrast = .5, ylim = c(0, 4),
  pitchMethods = c('dom', 'autocor', 'spec'),
  candPlot = list(
    col = c('gray70', 'yellow', 'purple'),  # same order as pitchMethods
    pch = c(1, 3, 5),
    cex = 3),
  pitchPlot = list(col = 'black', lty = 3, lwd = 3))

# Plot pitch candidates w/o a spectrogram
a = analyze(sound2, samplingRate = 16000, plot = TRUE, plotSpec = FALSE)

# Different formatting options for output
a = analyze(sound2, samplingRate = 16000, summary = FALSE)  # frame-by-frame
a = analyze(sound2, samplingRate = 16000, summary = TRUE,
            summaryFun = c('mean', 'range'))  # one row per sound
# ...with custom summaryFun
difRan = function(x) diff(range(x))
a = analyze(sound2, samplingRate = 16000, summary = TRUE,
            summaryFun = c('mean', 'difRan'))

# Save the plot
a = analyze(sound, samplingRate = 16000,
            savePath = '~/Downloads/',
            width = 20, height = 15, units = 'cm', res = 300)

## Amplitude and loudness: analyze() should give the same results as
## the dedicated functions getRMS() / getLoudness()
# Create 1 kHz tone
samplingRate = 16000; dur_ms = 50
sound1 = sin(2*pi*1000/samplingRate*(1:(dur_ms/1000*samplingRate)))
a1 = analyze(sound1, samplingRate = samplingRate, windowLength = 25,
        overlap = 50, SPL_measured = 40, scale = 1,
        pitchMethods = NULL, plot = FALSE)
a1$loudness  # loudness per STFT frame (1 sone by definition)
getLoudness(sound1, samplingRate = samplingRate, windowLength = 25,
            overlap = 50, SPL_measured = 40, scale = 1)$loudness
a1$ampl  # RMS amplitude per STFT frame
getRMS(sound1, samplingRate = samplingRate, windowLength = 25,
       overlap = 50, scale = 1)
# or even simply: sqrt(mean(sound1 ^ 2))

# The same sound as above, but with half the amplitude
a_half = analyze(sound1/2, samplingRate = samplingRate, windowLength = 25,
        overlap = 50, SPL_measured = 40, scale = 1,
        pitchMethods = NULL, plot = FALSE)
a1$ampl / a_half$ampl  # RMS amplitude is halved (ratio = 2)
a1$loudness / a_half$loudness  # loudness is not a linear function of amplitude

# Amplitude & loudness of an existing audio file
sound2 = '~/Downloads/temp/032_ut_anger_30-m-roar-curse.wav'  # replace with the path to an audio file on your computer
a2 = analyze(sound2, windowLength = 25, overlap = 50, SPL_measured = 40,
        pitchMethods = NULL, plot = FALSE)
apply(a2[, c('loudness', 'ampl')], 2, median, na.rm = TRUE)
median(getLoudness(sound2, windowLength = 25, overlap = 50,
                   SPL_measured = 40)$loudness)
median(getRMS(sound2, windowLength = 25, overlap = 50, scale = 1))

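The arithmetic behind the amplitude examples can be checked in base R without soundgen: the RMS of a full-scale sine is 1 / sqrt(2), and halving the waveform halves its RMS, which corresponds to a level drop of about 6 dB. Loudness in sone is a separate, non-linear mapping of level, which is why the loudness ratio in the examples is not 2.

```r
# The same 1 kHz tone as in the examples (50 ms at 16 kHz = exactly 50 periods)
samplingRate = 16000; dur_ms = 50
sound1 = sin(2 * pi * 1000 / samplingRate * (1:(dur_ms / 1000 * samplingRate)))
rms = sqrt(mean(sound1 ^ 2))             # ~0.7071 = 1 / sqrt(2)
rms_half = sqrt(mean((sound1 / 2) ^ 2))  # RMS of the half-amplitude version
rms / rms_half                           # 2: RMS scales linearly with amplitude
20 * log10(rms / rms_half)               # ~6.02 dB difference in level
```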

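The smooth / smoothVars arguments replace outliers using median smoothing. A hedged sketch of that idea: a value is replaced by the running median when it deviates from it by more than a tolerance in semitones. The window of ~100 ms and tolerance of ~4 semitones mimic the documented behavior of smooth = 1, but the exact rule, the helper name medianSmooth(), and the handling of NAs are assumptions.

```r
# Hypothetical sketch of median-based outlier smoothing of a pitch contour
medianSmooth = function(pitch, step_ms, window_ms = 100, tolSemitones = 4) {
  k = max(3, round(window_ms / step_ms))
  if (k %% 2 == 0) k = k + 1                # runmed() needs an odd window
  ok = !is.na(pitch)                        # smooth only the defined (voiced) values
  med = pitch
  med[ok] = stats::runmed(pitch[ok], k = k, endrule = "median")
  dev = abs(12 * log2(pitch / med))         # deviation from the running median, semitones
  out = pitch
  replace = ok & dev > tolSemitones
  out[replace] = med[replace]
  out
}

pitch = c(300, 310, 305, 900, 315, 320, 318, NA, 322)  # one octave-jump outlier
medianSmooth(pitch, step_ms = 25)
# 900 Hz is replaced by the running median (315 Hz); the NA and other values are kept
```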