segment: Segment a sound

Description

Finds syllables and bursts. Syllables are defined as continuous segments with amplitude above threshold. Bursts are defined as local maxima in the amplitude envelope that are high enough both in absolute terms (relative to the global maximum) and with respect to the surrounding region (relative to local minima). See vignette('acoustic_analysis', package = 'soundgen') for details.
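The syllable definition above can be illustrated with a minimal base-R sketch. It is not soundgen's implementation: find_runs_above is a hypothetical helper, the envelope is a toy vector rather than a smoothed amplitude envelope, and the threshold is set to 0.9 times the mean envelope amplitude only by analogy with the sylThres argument.

```r
# Hypothetical helper (not part of soundgen): locate continuous runs of an
# amplitude envelope that stay above a threshold.
find_runs_above <- function(envelope, threshold) {
  r <- rle(envelope > threshold)   # run-length encode above/below threshold
  ends <- cumsum(r$lengths)        # last index of each run
  starts <- ends - r$lengths + 1   # first index of each run
  # keep only the runs that are above the threshold
  data.frame(start = starts[r$values], end = ends[r$values])
}

# Toy envelope (assumed values, in arbitrary amplitude units)
env <- c(0.1, 0.2, 0.9, 1.0, 0.8, 0.1, 0.1, 0.7, 0.9, 0.2)

# Threshold as a proportion of the mean envelope amplitude
thres <- 0.9 * mean(env)

syl <- find_runs_above(env, thres)
print(syl)
```

Each row gives the first and last sample of one candidate syllable. The real function additionally drops syllables shorter than shortestSyl and merges those separated by less than shortestPause.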

Usage

segment(
  x,
  samplingRate = NULL,
  windowLength = 40,
  overlap = 80,
  shortestSyl = 40,
  shortestPause = 40,
  sylThres = 0.9,
  interburst = NULL,
  interburstMult = 1,
  burstThres = 0.075,
  peakToTrough = 3,
  troughLeft = TRUE,
  troughRight = FALSE,
  summary = NULL,
  summaryFun = NULL,
  plot = FALSE,
  savePath = NA,
  col = "green",
  xlab = "",
  ylab = "Amplitude",
  main = NULL,
  width = 900,
  height = 500,
  units = "px",
  res = NA,
  sylPlot = list(lty = 1, lwd = 2, col = "blue"),
  burstPlot = list(pch = 8, cex = 3, col = "red"),
  ...
)

Arguments

x

path to a .wav or .mp3 file or a vector of amplitudes with specified samplingRate

samplingRate

sampling rate of x (only needed if x is a numeric vector, rather than an audio file)

windowLength, overlap

length (ms) and overlap (%) of the window used to produce the amplitude envelope, see env

shortestSyl

minimum acceptable length of syllables, ms

shortestPause

minimum acceptable break between syllables, ms. Syllables separated by less time are merged. To avoid merging, specify shortestPause = NA

sylThres

amplitude threshold for syllable detection (as a proportion of global mean amplitude of smoothed envelope)

interburst

minimum time between two consecutive bursts (ms). If specified, it overrides interburstMult

interburstMult

multiplier of the default minimum interburst interval (median syllable length or, if no syllables are detected, the same number as shortestSyl). Only used if interburst is not specified. Larger values improve detection of unusually broad shallow peaks, while smaller values improve the detection of sharp narrow peaks

burstThres

to qualify as a burst, a local maximum has to be at least burstThres times the height of the global maximum of amplitude envelope

peakToTrough

to qualify as a burst, a local maximum has to be at least peakToTrough times the local minimum on the LEFT over analysis window (which is controlled by interburst or interburstMult)

troughLeft, troughRight

should local maxima be compared to the trough on the left and/or right of it? The defaults are TRUE and FALSE, respectively

summary

if TRUE, returns only a summary of the number and spacing of syllables and vocal bursts. If FALSE, returns a list containing full stats on each syllable and burst (location, duration, amplitude, ...)

summaryFun

functions used to summarize each acoustic characteristic, e.g. "c('mean', 'sd')"; user-defined functions are fine (see examples); NAs are omitted automatically for mean/median/sd/min/max/range/sum, otherwise take care of NAs yourself; if summaryFun = NULL, analyze() returns a list containing frame-by-frame values

plot

if TRUE, produces a segmentation plot

savePath

full path to the folder in which to save the plot. Defaults to NA

col, xlab, ylab, main

main plotting parameters

width, height, units, res

parameters passed to png if the plot is saved

sylPlot

a list of graphical parameters for displaying the syllables

burstPlot

a list of graphical parameters for displaying the bursts

...

other graphical parameters passed to graphics::plot
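The two burst criteria described under burstThres and peakToTrough can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's own code: find_bursts is a hypothetical helper, the window win is given in samples as a stand-in for the interburst interval (which segment() takes in ms), and only the left trough is checked, matching the default troughLeft = TRUE, troughRight = FALSE.

```r
# Hypothetical helper (not soundgen's implementation): flag local maxima of an
# envelope that pass both the absolute and the relative height criteria.
find_bursts <- function(envelope, burstThres = 0.075, peakToTrough = 3, win = 2) {
  n <- length(envelope)
  global_max <- max(envelope)
  bursts <- integer(0)
  for (i in seq_len(n)) {
    left <- max(1, i - win)
    right <- min(n, i + win)
    # local maximum within +/- win samples
    is_peak <- envelope[i] == max(envelope[left:right])
    # high enough relative to the global maximum
    high_enough <- envelope[i] >= burstThres * global_max
    # high enough relative to the lowest point on the left within the window
    trough_left <- min(envelope[left:i])
    prominent <- envelope[i] >= peakToTrough * trough_left
    if (is_peak && high_enough && prominent) bursts <- c(bursts, i)
  }
  bursts
}

# Toy envelope (assumed values): two clear peaks and a tiny bump in between
env <- c(0.1, 0.5, 1.0, 0.4, 0.1, 0.05, 0.06, 0.05, 0.2, 0.8, 0.3)

find_bursts(env)                     # indices of both clear peaks
find_bursts(env, peakToTrough = 12)  # stricter ratio keeps only the second peak
```

Raising peakToTrough removes the first peak because its left trough (0.1) is not low enough relative to the peak, while the second peak rises from a much quieter region.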

Value

If summary = TRUE, returns only a summary of the number and spacing of syllables and vocal bursts. If summary = FALSE, returns a list containing full stats on each syllable and burst (location, duration, amplitude, ...).

Details

The algorithm is very flexible, but the parameters may be hard to optimize by hand. If you have an annotated sample of the sort of audio you are planning to analyze, with syllables and/or bursts counted manually, you can use it for automatic optimization of control parameters (see optimizePars). The defaults are the results of just such optimization against 260 human vocalizations in Anikin, A. & Persson, T. (2017). Non-linguistic vocalizations from online amateur videos for emotion research: a validated corpus. Behavior Research Methods, 49(2): 758-771.

Examples

# NOT RUN {
sound = soundgen(nSyl = 4, sylLen = 50, pauseLen = 70,
  pitch = c(368, 284), temperature = 0.1,
  noise = list(time = c(0, 67, 86, 186), value = c(-45, -47, -89, -120)),
  rolloff_noise = -8, amplGlobal = c(0, -20),
  dynamicRange = 120)
spectrogram(sound, samplingRate = 16000, osc = TRUE)
 # playme(sound, samplingRate = 16000)

s = segment(sound, samplingRate = 16000, plot = TRUE)
# accept quicker and quieter syllables
s = segment(sound, samplingRate = 16000, plot = TRUE,
  shortestSyl = 25, shortestPause = 25, sylThres = .2, burstThres = .05)

# just a summary (see examples in ?analyze for custom summaryFun)
segment(sound, samplingRate = 16000, summaryFun = c('mean', 'sd'))
# Note that syllables are slightly longer and pauses shorter than they should
# be (b/c of the smoothing of amplitude envelope), while interburst intervals
# are right on target (~120 ms)

# customizing the plot
s = segment(sound, samplingRate = 16000, plot = TRUE,
            shortestSyl = 25, shortestPause = 25,
            sylThres = .2, burstThres = .05,
            col = 'black', lwd = .5,
            sylPlot = list(lty = 2, col = 'gray20'),
            burstPlot = list(pch = 16, col = 'gray80'),
            xlab = 'Some custom label', cex.lab = 1.2, main = 'My awesome plot')

# }
# NOT RUN {
# customize the resolution of saved plot
s = segment(sound, samplingRate = 16000, savePath = '~/Downloads/',
            width = 1920, height = 1080, units = 'px')
# }
