soundgen (version 1.5.0)

ssm: Self-similarity matrix

Description

Calculates the self-similarity matrix and novelty vector of a sound.

Usage

ssm(x, samplingRate = NULL, windowLength = 40, overlap = 75,
  step = NULL, ssmWin = 40, maxFreq = NULL, nBands = NULL,
  MFCC = 2:13, input = c("mfcc", "audiogram", "spectrum")[1],
  norm = FALSE, simil = c("cosine", "cor")[1], returnSSM = TRUE,
  kernelLen = 200, kernelSD = 0.2, padWith = 0, plot = TRUE,
  heights = c(2, 1), specPars = list(levels = seq(0, 1, length = 30),
  color.palette = seewave::spectro.colors, xlab = "Time, s", ylab = "kHz",
  ylim = c(0, maxFreq/1000)), ssmPars = list(levels = seq(0, 1, length =
  30), color.palette = seewave::spectro.colors, xlab = "Time, s", ylab =
  "Time, s", main = "Self-similarity matrix"), noveltyPars = list(type =
  "b", pch = 16, col = "black", lwd = 3))

Arguments

x

path to a .wav file or a vector of amplitudes with specified samplingRate

samplingRate

sampling rate of x (only needed if x is a numeric vector, rather than a .wav file)

windowLength

length of FFT window, ms

overlap

overlap between successive FFT frames, %

step

you can override overlap by specifying FFT step, ms

ssmWin

window for averaging SSM, ms

maxFreq

highest band edge of mel filters, Hz. Defaults to samplingRate / 2. See tuneR::melfcc

nBands

number of warped spectral bands to use. Defaults to 100 * windowLength / 20 (200 bands for the default 40 ms window). See tuneR::melfcc

MFCC

which mel-frequency cepstral coefficients to use; defaults to 2:13

input

representation of each frame used to build the SSM: MFCCs ("mfcc"), the mel-filtered spectrum ("audiogram"), or the spectrum ("spectrum"), as listed in Usage

norm

if TRUE, the spectrum of each STFT frame is normalized

simil

method for comparing frames: "cosine" = cosine similarity, "cor" = Pearson's correlation

returnSSM

if TRUE, returns the SSM

kernelLen

length of checkerboard kernel for calculating novelty, ms (larger values favor global vs. local novelty)

kernelSD

SD of checkerboard kernel for calculating novelty

padWith

how to treat edges when calculating novelty: NA = treat sound before and after the recording as unknown, 0 = treat it as silence

plot

if TRUE, plots the SSM

heights

relative sizes of the SSM and spectrogram/novelty plot

specPars

graphical parameters passed to seewave::filled.contour.modif2 and affecting the spectrogram

ssmPars

graphical parameters passed to seewave::filled.contour.modif2 and affecting the plot of SSM

noveltyPars

graphical parameters passed to lines and affecting the novelty contour
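
The two simil options can be compared on toy data in base R. The sketch below is not soundgen's internal code: the feature matrix is made up, and the helpers cosine_sim and toy_ssm are hypothetical names introduced only for this illustration. It also computes the step implied by the default windowLength and overlap, assuming the usual relation step = windowLength * (1 - overlap / 100).

```r
# Step implied by the defaults, ASSUMING step = windowLength * (1 - overlap / 100)
windowLength <- 40   # ms
overlap <- 75        # %
step <- windowLength * (1 - overlap / 100)   # ms

# Cosine similarity between two numeric vectors (hypothetical helper)
cosine_sim <- function(a, b) {
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

# Toy self-similarity matrix; 'features' has one column per frame
# (hypothetical helper, not part of soundgen)
toy_ssm <- function(features, simil = c("cosine", "cor")) {
  simil <- match.arg(simil)
  n <- ncol(features)
  out <- matrix(NA_real_, nrow = n, ncol = n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      out[i, j] <- if (simil == "cosine") {
        cosine_sim(features[, i], features[, j])
      } else {
        cor(features[, i], features[, j])
      }
    }
  }
  out
}

# Made-up data: frames 1-2 share one spectral shape, frames 3-4 another
features <- cbind(c(1, 2, 3), c(2, 4, 6), c(3, 1, 0.5), c(6, 2, 1))
s_cos <- toy_ssm(features, "cosine")
s_cor <- toy_ssm(features, "cor")
```

Both measures ignore the overall scale of a frame, so frames 1 and 2 are maximally similar under either. Pearson's correlation additionally removes each frame's mean, which is why frames 1 and 3 come out as positively similar under "cosine" but negatively correlated under "cor".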

Value

If returnSSM is TRUE, returns a list of two components: $ssm contains the self-similarity matrix, and $novelty contains the novelty vector. If returnSSM is FALSE, only produces a plot.
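
To give an intuition for the novelty vector, the following base-R sketch slides a Gaussian-tapered checkerboard kernel along the diagonal of a toy SSM, in the spirit of Foote (2000). It is an illustration under stated assumptions, not the code used by ssm: checkerboard and toy_novelty are hypothetical helpers, the kernel size is given in frames rather than ms, sd is expressed relative to half the kernel width, the kernel is normalized by the sum of its absolute values, and the matrix is simply padded with zeros at the edges.

```r
# Gaussian-tapered checkerboard kernel with an even number of rows/columns
# (hypothetical helper; 'sd' is relative to half the kernel width, an assumption)
checkerboard <- function(size, sd = 0.2) {
  stopifnot(size %% 2 == 0)
  half <- size / 2
  # coordinates symmetric around the kernel centre, scaled to (-1, 1)
  x <- (seq_len(size) - half - 0.5) / half
  taper <- outer(dnorm(x, sd = sd), dnorm(x, sd = sd))
  signs <- outer(sign(x), sign(x))  # +1 in the two diagonal quadrants, -1 elsewhere
  k <- taper * signs
  k / sum(abs(k))                   # normalization chosen for this sketch
}

# Novelty of each frame: kernel-weighted sum of the SSM around the diagonal
# (hypothetical helper; edges are padded with zeros)
toy_novelty <- function(s, size, sd = 0.2) {
  k <- checkerboard(size, sd)
  half <- size / 2
  n <- nrow(s)
  padded <- matrix(0, nrow = n + size, ncol = n + size)
  idx <- (half + 1):(half + n)
  padded[idx, idx] <- s
  vapply(seq_len(n), function(i) {
    win <- i:(i + size - 1)  # frames i - half ... i + half - 1 of the original SSM
    sum(padded[win, win] * k)
  }, numeric(1))
}

# Toy SSM with two homogeneous segments: frames 1-4 and frames 5-8
s <- matrix(0, nrow = 8, ncol = 8)
s[1:4, 1:4] <- 1
s[5:8, 5:8] <- 1
nov <- toy_novelty(s, size = 4, sd = 0.5)
boundary <- which.max(nov)  # first frame of the second segment
```

In this toy case novelty is near zero inside a homogeneous segment and peaks where the second segment begins. A larger kernel averages over more frames on each side, which is why larger kernelLen values favor global over local novelty.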

References

  • El Badawy, D., Marmaroli, P., & Lissek, H. (2013). Audio Novelty-Based Segmentation of Music Concerts. In Acoustics 2013 (No. EPFL-CONF-190844).

  • Foote, J. (1999, October). Visualizing music and audio using self-similarity. In Proceedings of the seventh ACM international conference on Multimedia (Part 1) (pp. 77-80). ACM.

  • Foote, J. (2000). Automatic audio segmentation using a measure of audio novelty. In Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on (Vol. 1, pp. 452-455). IEEE.

Examples

# NOT RUN {
library(soundgen)
# two concatenated sounds: a default vocalization, then four short syllables
sound = c(soundgen(), soundgen(nSyl = 4, sylLen = 50, pauseLen = 70,
          formants = NA, pitch = c(500, 330)))
# playme(sound)
m1 = ssm(sound, samplingRate = 16000,
         input = 'audiogram', simil = 'cor', norm = FALSE,
         ssmWin = 10, kernelLen = 150)  # detailed, local features
# }
# NOT RUN {
m2 = ssm(sound, samplingRate = 16000,
         input = 'mfcc', simil = 'cosine', norm = TRUE,
         ssmWin = 50, kernelLen = 600)  # more global
# plot(m2$novelty, type='b')  # use for peak detection, etc
# }
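
The peak detection mentioned in the last comment can be sketched in base R. This is only one simple possibility: find_peaks is a hypothetical helper, the threshold is arbitrary, and the novelty values below are made up so that the block runs without soundgen. With a real result you would pass m2$novelty instead.

```r
# Indices of local maxima that exceed a threshold (hypothetical helper)
find_peaks <- function(x, threshold = 0) {
  n <- length(x)
  if (n < 3) return(integer(0))
  i <- 2:(n - 1)  # the first and last points cannot be local maxima here
  i[x[i] > x[i - 1] & x[i] >= x[i + 1] & x[i] > threshold]
}

# Made-up novelty values standing in for m2$novelty
novelty <- c(0.1, 0.2, 0.8, 0.3, 0.1, 0.15, 0.6, 0.2)
peaks <- find_peaks(novelty, threshold = 0.5)
```

Each returned index is a frame at which novelty is locally highest, that is, a candidate boundary between acoustically different segments.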
