Transforms a spectrogram into a time series with the inverse STFT. An ordinary spectrogram preserves only the magnitude (modulus) of the complex STFT; the phase is lost, and without it the original audio cannot be reconstructed exactly. A number of algorithms therefore "guess" a phase that yields audio whose magnitude spectrogram is very similar to the target spectrogram. This is useful for filtering operations that modify the magnitude spectrogram and then apply the inverse STFT, such as filtering in the spectrotemporal modulation domain.
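The loss of phase described above can be illustrated on a single FFT frame. The sketch below uses base R only; the test signal and variable names are made up for illustration and are not part of the package. It shows that the same magnitudes combined with a different phase give a different waveform.

```r
# A short test signal: two sinusoids, one analysis frame of n samples
n <- 256
tt <- 0:(n - 1)
x <- sin(2 * pi * 5 * tt / n) + 0.5 * sin(2 * pi * 12 * tt / n)

X <- fft(x)
mag <- Mod(X)   # this is all that a magnitude spectrogram keeps

# Inverse FFT with the true phase recovers the original frame
x_true <- Re(fft(mag * exp(1i * Arg(X)), inverse = TRUE)) / n

# Inverse FFT with all phases set to zero: same magnitudes, different waveform
x_zero <- Re(fft(mag * exp(1i * 0), inverse = TRUE)) / n

max(abs(x_true - x))             # numerically zero
max(abs(Mod(fft(x_zero)) - mag)) # numerically zero: magnitudes are unchanged
max(abs(x_zero - x))             # clearly non-zero: the waveform has changed
```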
invertSpectrogram(
  spec,
  samplingRate,
  windowLength,
  overlap,
  step = NULL,
  wn = "hanning",
  specType = c("abs", "log", "dB")[1],
  initialPhase = c("zero", "random", "spsi")[3],
  nIter = 50,
  normalize = TRUE,
  play = TRUE,
  verbose = FALSE,
  plotError = TRUE
)
spec: the spectrogram to be transformed into a time series: a numeric matrix with frequency bins in rows and time frames in columns
samplingRate: sampling rate of the audio from which the spectrogram was computed
windowLength: length of FFT window, ms
overlap: overlap between successive FFT frames, %
step: you can override overlap by specifying FFT step, ms (NB: because digital audio is sampled at discrete time intervals of 1/samplingRate, the actual step and thus the time stamps of STFT frames may be slightly different, e.g. 24.98866 instead of 25.0 ms)
wn: window type accepted by ftwindow, currently gaussian, hanning, hamming, bartlett, rectangular, blackman, flattop
specType: the scale of the target spectrogram: 'abs' = absolute, 'log' = log-transformed, 'dB' = in decibels
initialPhase: initial phase estimate: "zero" = set all phases to zero; "random" = Gaussian noise; "spsi" (default) = single-pass spectrogram inversion (Beauregard et al., 2015)
nIter: the number of iterations of the GL algorithm (Griffin & Lim, 1984), 0 = don't run
normalize: if TRUE, normalizes the output to range from -1 to +1
play: if TRUE, plays back the reconstructed audio
verbose: if TRUE, prints estimated time left every 10% of GL iterations
plotError: if TRUE, produces a scree plot of squared error over GL iterations (useful for choosing `nIter`)
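The note on FFT step above can be checked with a little arithmetic. The sketch below assumes a sampling rate of 44100 Hz and assumes that the step is truncated to a whole number of samples; both are illustrative assumptions, not taken from the package source. Under these assumptions a requested 25 ms step becomes the 24.98866 ms quoted above.

```r
# Illustrative values: 44100 Hz is an assumption, chosen so that 25 ms
# is not a whole number of samples
samplingRate <- 44100
step_ms <- 25

# Assumption: the step is truncated to a whole number of samples
step_points <- floor(step_ms / 1000 * samplingRate)  # 1102.5 samples -> 1102
actual_step_ms <- step_points / samplingRate * 1000  # about 24.98866 ms

# When only overlap (%) is given, the nominal step follows from the window length
windowLength <- 40  # ms
overlap <- 75       # %
step_from_overlap <- windowLength * (1 - overlap / 100)  # 10 ms
```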
Returns the reconstructed audio as a numeric vector.
Algorithm: takes the spectrogram, makes an initial guess at the phase (zero, noise, or a more intelligent estimate from the SPSI algorithm), refines it over `nIter` iterations of the GL algorithm, reconstructs the complex spectrogram using the best phase estimate, and performs the inverse STFT. The single-pass spectrogram inversion (SPSI) algorithm is implemented as described in Beauregard et al. (2015), following the Python code at https://github.com/lonce/SPSI_Python. The Griffin-Lim (GL) algorithm is based on Griffin & Lim (1984).
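The GL iteration outlined above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helpers `hann`, `stft`, `istft`, and `griffinLim` are hypothetical names, the full two-sided FFT spectrum is kept for simplicity, the initial phase is uniform random noise, and SPSI is not included.

```r
# Periodic Hann window of length n
hann <- function(n) 0.5 - 0.5 * cos(2 * pi * (0:(n - 1)) / n)

# STFT: complex matrix with frequency bins in rows and frames in columns
stft <- function(x, n, hop, win = hann(n)) {
  starts <- seq(1, length(x) - n + 1, by = hop)
  sapply(starts, function(s) fft(x[s:(s + n - 1)] * win))
}

# Inverse STFT by windowed overlap-add, normalized by the summed squared window
# (the least-squares estimate of Griffin & Lim, 1984)
istft <- function(S, n, hop, win = hann(n)) {
  len <- (ncol(S) - 1) * hop + n
  out <- numeric(len)
  norm <- numeric(len)
  for (j in seq_len(ncol(S))) {
    idx <- ((j - 1) * hop + 1):((j - 1) * hop + n)
    frame <- Re(fft(S[, j], inverse = TRUE)) / n
    out[idx] <- out[idx] + frame * win
    norm[idx] <- norm[idx] + win^2
  }
  norm[norm == 0] <- 1  # samples where the window is exactly zero
  out / norm
}

# GL iterations: keep the target magnitude, re-estimate the phase
griffinLim <- function(mag, n, hop, nIter = 50) {
  phase <- matrix(runif(length(mag), -pi, pi), nrow(mag))  # random start
  err <- numeric(nIter)
  for (i in seq_len(nIter)) {
    x <- istft(mag * exp(1i * phase), n, hop)  # back to the time domain
    S <- stft(x, n, hop)                       # STFT of the current estimate
    err[i] <- sum((Mod(S) - mag)^2)            # squared magnitude error
    phase <- Arg(S)                            # keep only the new phase
  }
  list(audio = istft(mag * exp(1i * phase), n, hop), error = err)
}

# Usage on a synthetic tone (all values are illustrative)
set.seed(1)
sr <- 8000
n <- 256
hop <- 64
x <- sin(2 * pi * 440 * (0:1999) / sr)
mag <- Mod(stft(x, n, hop))  # target magnitude spectrogram, phase discarded
res <- griffinLim(mag, n, hop, nIter = 20)
# plot(res$error, type = "b")  # scree plot of error over iterations
```

The `error` vector plays the role of the scree plot produced by `plotError`: once the curve flattens, further iterations bring little improvement.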
Griffin, D., & Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2), 236-243.
Beauregard, G. T., Harish, M., & Wyse, L. (2015, July). Single pass spectrogram inversion. In 2015 IEEE International Conference on Digital Signal Processing (DSP) (pp. 427-431). IEEE.
# NOT RUN {
# Create a spectrogram
samplingRate = 16000
windowLength = 40
overlap = 75
wn = 'hanning'
s = soundgen(samplingRate = samplingRate, addSilence = 100)
spec = spectrogram(s, samplingRate = samplingRate,
  wn = wn, windowLength = windowLength, overlap = overlap,
  padWithSilence = FALSE, output = 'original')
# Invert the spectrogram, attempting to guess the phase
# Note that samplingRate, wn, windowLength, and overlap must be the same as
# in the original (i.e. you have to know how the spectrogram was created)
s_new = invertSpectrogram(spec, samplingRate = samplingRate,
  windowLength = windowLength, overlap = overlap, wn = wn,
  initialPhase = 'spsi', nIter = 10, specType = 'abs', play = FALSE)
# }
# NOT RUN {
# Verify the quality of audio reconstruction
# playme(s, samplingRate); playme(s_new, samplingRate)
spectrogram(s, samplingRate, osc = TRUE)
spectrogram(s_new, samplingRate, osc = TRUE)
# }