soundgen (version 2.9.0)

ssm: Self-similarity matrix

Description

Calculates the self-similarity matrix and novelty vector of a sound. The self-similarity matrix is produced by cross-correlating different segments of the input sound. Novelty is calculated by convolving the self-similarity matrix with a tapered checkerboard kernel. The positive lobes of the kernel represent coherence (self-similarity within the regions on either side of the center point) and the negative lobes anti-coherence (cross-similarity between these two regions). Since novelty is the dot product of the checkerboard kernel with the SSM, it is high when the two regions are self-similar (internally consistent) but different from each other.
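
The novelty calculation described above can be sketched in base R. This is a minimal illustration of the checkerboard-kernel idea, not soundgen's internal code: the helper names gaussCheckerboard and noveltySketch are hypothetical, the taper SD is assumed to be expressed relative to the kernel half-width, and edges are zero-padded (sound outside the recording is treated as silence).

```r
# Hypothetical helper: checkerboard kernel of even size n with a Gaussian taper.
# sdRel is ASSUMED to be the SD of the taper relative to the kernel half-width.
gaussCheckerboard = function(n, sdRel = 0.5) {
  stopifnot(n %% 2 == 0)
  half = n / 2
  pos = c(-half:-1, 1:half) / half         # positions on either side of the center
  signs = outer(sign(pos), sign(pos))      # +1 within a side, -1 across sides
  taper = outer(dnorm(pos, sd = sdRel), dnorm(pos, sd = sdRel))
  k = signs * taper
  k / sum(abs(k))                          # absolute weights sum to 1
}

# Hypothetical helper: slide the kernel along the main diagonal of a square SSM
# and take the dot product at each frame (zero-padding at the edges).
noveltySketch = function(ssm, n, sdRel = 0.5) {
  k = gaussCheckerboard(n, sdRel)
  half = n / 2
  N = nrow(ssm)
  padded = matrix(0, N + n, N + n)
  idx = (half + 1):(half + N)
  padded[idx, idx] = ssm
  sapply(seq_len(N), function(i) {
    win = i:(i + n - 1)                    # frames (i - half) to (i + half - 1)
    sum(padded[win, win] * k)
  })
}

# Toy SSM: two internally consistent segments of 20 frames each,
# with zero similarity between the segments
seg = rep(1:2, each = 20)
ssmToy = outer(seg, seg, "==") * 1
nov = noveltySketch(ssmToy, n = 10)
which.max(nov)  # 21: the first frame of the second segment
```

Inside either segment the positive and negative lobes cancel out, so novelty is zero there. It peaks where the kernel is centered on the boundary, because only the positive (within-segment) lobes overlap with high similarity values.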

Usage

ssm(
  x,
  samplingRate = NULL,
  from = NULL,
  to = NULL,
  sparse = FALSE,
  input = c("melspec", "mfcc", "spec", "audSpec")[1],
  melfcc_pars = list(windowLength = 125, step = 25, nbands = 50),
  MFCC = 2:13,
  audSpec_pars = list(nFilters = 16, step = 10),
  takeLog = FALSE,
  norm = FALSE,
  simil = c("cosine", "cor")[1],
  kernelLen = 1000,
  kernelSD = 0.5,
  padWith = 0,
  ssmWin = NULL,
  summaryFun = c("mean", "sd"),
  output = c("ssm", "novelty", "summary"),
  reportEvery = NULL,
  cores = 1,
  plot = TRUE,
  savePlots = NULL,
  main = NULL,
  heights = c(2, 1),
  width = 900,
  height = 500,
  units = "px",
  res = NA,
  specPars = list(colorTheme = c("bw", "seewave", "heat.colors", "...")[2], xlab =
    "Time, s"),
  ssmPars = list(colorTheme = c("bw", "seewave", "heat.colors", "...")[2], xlab =
    "Time, s", ylab = "Time, s"),
  noveltyPars = list(type = "b", pch = 16, col = "black", lwd = 3)
)

Value

Returns a list whose components depend on the output argument: $ssm contains the self-similarity matrix, $novelty contains the novelty vector, and $summary contains the summary statistics calculated with summaryFun.

Arguments

x

path to a folder, one or more wav or mp3 files c('file1.wav', 'file2.mp3'), Wave object, numeric vector, or a list of Wave objects or numeric vectors

samplingRate

sampling rate of x (only needed if x is a numeric vector)

from, to

if NULL (default), analyzes the whole sound, otherwise from...to (s)

sparse

if TRUE, only the central region of the SSM needed to extract the novelty contour is calculated instead of the entire matrix (speeds up the processing)

input

the spectral representation used to calculate the SSM: "audSpec" = auditory spectrogram returned by audSpectrogram, "mfcc" = mel-frequency cepstral coefficients, "melspec" = mel-transformed STFT spectrogram, "spec" = STFT power spectrogram (the last three are returned by melfcc). Any custom spectrogram-like matrix of features is also accepted, with features in rows and time in columns whose names give the time stamps in s (see examples)

melfcc_pars

a list of parameters passed to melfcc

MFCC

which mel-frequency cepstral coefficients to use; defaults to 2:13

audSpec_pars

a list of parameters passed to audSpectrogram (if input = 'audSpec')

takeLog

if TRUE, the input is log-transformed prior to calculating self-similarity

norm

if TRUE, the spectrum of each STFT frame is normalized

simil

method for comparing frames: "cosine" = cosine similarity, "cor" = Pearson's correlation
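
The difference between the two methods can be illustrated in base R. This is a minimal sketch with made-up feature frames: cosineSim is a hypothetical helper, and the matrix code only shows how a frame-by-frame similarity matrix could be built, not how soundgen does it internally.

```r
# Two made-up feature frames (e.g., two columns of a spectrogram)
a = c(1, 2, 3, 4)
b = c(2, 4, 6, 9)

# Hypothetical helper: cosine of the angle between two vectors
cosineSim = function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

cosineSim(a, b)  # cosine similarity of the raw frames
cor(a, b)        # Pearson's correlation = cosine similarity after centering

# A constant offset changes cosine similarity, but not the correlation
cosineSim(a + 10, b + 10)
cor(a + 10, b + 10)

# Frame-by-frame similarity matrices (features in rows, frames in columns)
feat = cbind(a, b, a + 10)
unit = sweep(feat, 2, sqrt(colSums(feat^2)), "/")  # scale each frame to unit length
ssmCos = crossprod(unit)  # cosine similarity between every pair of frames
ssmCor = cor(feat)        # Pearson's correlation between every pair of frames
```

Because correlation centers each frame first, it ignores the overall level of the frame, whereas cosine similarity is sensitive to it.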

kernelLen

length of checkerboard kernel for calculating novelty, ms (larger values favor global, slow vs. local, fast novelty)

kernelSD

SD of checkerboard kernel for calculating novelty

padWith

how to treat edges when calculating novelty: NA = treat sound before and after the recording as unknown, 0 = treat it as silence

ssmWin

window for averaging SSM, frames (has a smoothing effect and speeds up the processing)

summaryFun

functions used to summarize each acoustic characteristic, e.g., c('mean', 'sd'); user-defined functions are fine (see examples); NAs are omitted automatically for mean/median/sd/min/max/range/sum, otherwise take care of NAs yourself
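
A user-defined summary function has to handle NAs itself, as noted above. The following base-R sketch shows one way to write such a function: peakToMean is a hypothetical name, the input vector is made up, and the commented-out ssm() call is an assumption based on the c('mean', 'sd') pattern given for this argument.

```r
# Made-up novelty contour with missing values at the edges
nov = c(NA, 0.1, 0.4, 0.2, 0.1, NA)

# Hypothetical user-defined summary: ratio of the peak to the mean,
# dropping NAs explicitly because they are not removed automatically
peakToMean = function(x) {
  x = x[!is.na(x)]
  max(x) / mean(x)
}

peakToMean(nov)       # approximately 2 (0.4 / 0.2)
max(nov) / mean(nov)  # NA: this is what happens without NA handling

# Assumed usage, passing function names as in c('mean', 'sd'):
# ssm(sound, samplingRate = 16000, summaryFun = c('mean', 'peakToMean'))
```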

output

what to return (drop "ssm" to save memory when analyzing a lot of files)

reportEvery

when processing multiple inputs, report estimated time left every ... iterations (NULL = default, NA = don't report)

cores

number of cores for parallel processing

plot

if TRUE, plots the SSM

savePlots

full path to the folder in which to save the plots (NULL = don't save, '' = same folder as audio)

main

plot title

heights

relative sizes of the SSM and spectrogram/novelty plot

width, height, units, res

graphical parameters for saving plots passed to png

specPars

graphical parameters passed to filled.contour.mod and affecting the spectrogram

ssmPars

graphical parameters passed to filled.contour.mod and affecting the plot of SSM

noveltyPars

graphical parameters passed to lines and affecting the novelty contour

References

  • Foote, J. (1999, October). Visualizing music and audio using self-similarity. In Proceedings of the seventh ACM international conference on Multimedia (Part 1) (pp. 77-80). ACM.

  • Foote, J. (2000). Automatic audio segmentation using a measure of audio novelty. In Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on (Vol. 1, pp. 452-455). IEEE.

See Also

spectrogram, modulationSpectrum, segment

Examples

sound = c(soundgen(),
          soundgen(nSyl = 4, sylLen = 50, pauseLen = 70,
          formants = NA, pitch = c(500, 330)))
# playme(sound)
# detailed, local features (captures each syllable)
s1 = ssm(sound, samplingRate = 16000, kernelLen = 100,
         sparse = TRUE)  # much faster with 'sparse'
# more global features (captures the transition between the two sounds)
s2 = ssm(sound, samplingRate = 16000, kernelLen = 400, sparse = TRUE)

s2$summary
s2$novelty  # novelty contour
if (FALSE) {
ssm(sound, samplingRate = 16000,
    input = 'mfcc', simil = 'cor', norm = TRUE,
    ssmWin = 10,  # speed up the processing
    kernelLen = 300,  # global features
    specPars = list(colorTheme = 'seewave'),
    ssmPars = list(col = rainbow(100)),
    noveltyPars = list(type = 'l', lty = 3, lwd = 2))

# Custom input: produce a nice spectrogram first, then feed it into ssm()
sp = spectrogram(sound, 16000, windowLength = c(5, 40), contrast = .3,
  output = 'processed')  # return the modified spectrogram
colnames(sp) = as.numeric(colnames(sp)) / 1000  # convert ms to s
ssm(sound, 16000, kernelLen = 400, input = sp)

# Custom input: use acoustic features returned by analyze()
an = analyze(sound, 16000, windowLength = 20, novelty = NULL)
input_an = t(an$detailed[, 4:ncol(an$detailed)]) # or select pitch, HNR, ...
input_an = t(apply(input_an, 1, scale))  # z-transform all variables
input_an[is.na(input_an)] = 0  # get rid of NAs
colnames(input_an) = an$detailed$time / 1000  # time stamps in s
rownames(input_an) = 1:nrow(input_an)
image(t(input_an))  # not a spectrogram, just a feature matrix
ssm(sound, 16000, kernelLen = 500, input = input_an, takeLog = FALSE,
  specPars = list(ylab = 'Feature'))
}