`y_axis='mel'` argument in `specshow` results in incorrectly labeled units as well as incorrect y-axis ticks.
Hello,
I think I’ve found a bug, and if this is a mess up on my part, I apologize in advance.
Describe the bug
It appears that the mel-scale spectrogram y-axis is displayed incorrectly when using `librosa.display.specshow()` with the `y_axis='mel'` keyword argument.
The expected behavior is as follows: If an original spectrogram `D` has frequency values ranging from 0 to ~5000, then the accompanying mel spectrogram obtained by `librosa.feature.melspectrogram(S=D, sr=sr)` should have mel values ranging from 20 to ~2500. Using the `y_axis='mel'` argument should result in a y-axis that is on the ‘mel’ scale. The y-axis ticks can be scaled in any manner (log base 10 or log base 2 maybe), but most importantly the value range should be between 0 and roughly 2500.
Actual behavior is as follows: The data is spread over a labeled range from 0 to ~10000 units. The y-axis appears to be in log base 2 scale, which is fine. However, the labeled units are Hz rather than mels, which is incorrect.
Summary: The range of values (y-axis ticks) seems incorrect, and the label “Hz” should be converted to “Mels”.
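For reference, the expected upper bound of roughly 2500 mels is consistent with the HTK-style mel formula, m = 2595 · log10(1 + f / 700). The snippet below is a minimal standalone sketch that does not call librosa; the function name `hz_to_mel_htk` is made up for illustration, and 5512.5 Hz is assumed as the Nyquist frequency of the 11025 Hz resampled audio used in the reproduction code.

```python
import math


def hz_to_mel_htk(freq_hz):
    """HTK-style Hz -> mel conversion (hypothetical helper, not librosa's API)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


# Assumption: Nyquist frequency for audio resampled to 11025 Hz.
nyquist_hz = 11025 / 2

# Prints the mel value at Nyquist, which lands a little under 2500.
print(f"{hz_to_mel_htk(nyquist_hz):.1f} mels at {nyquist_hz} Hz")
```

Whether this is the mel variant a given library uses by default is a separate question; the formula here only shows where a figure near 2500 could come from.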
To Reproduce:
The following code will produce a side-by-side plot of a standard spectrogram produced using `stft()` and a mel spectrogram produced by applying `librosa.feature.melspectrogram()` to the original `stft()` output:
import librosa as li
import librosa.display
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
# load sample audio
file = li.ex('trumpet')
aud, sr = li.load(file, sr=None)
n_fft = 512
rsr = 11025
# apply low pass filter before downsampling. Attenuate at resample rate divided by 2.
cutoff = rsr / 2
sos = sig.butter(10, cutoff, fs=sr, btype='lowpass', analog=False, output='sos')
aud = sig.sosfilt(sos, aud)
# downsample and update sample rate value
aud, sr = li.resample(aud, orig_sr=sr, target_sr=rsr), rsr
# create both standard and mel spectrograms:
spec = li.stft(aud, n_fft=512, window=sig.windows.hamming)
spec = np.abs(spec)
mel_spec = li.feature.melspectrogram(S=spec, sr=11025)
# plot standard spectrogram and mel-scale spectrogram side by side:
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
specs = (li.amplitude_to_db(sp) for sp in (spec, mel_spec))
scales = ('hz', 'mel')
for i, (sp, sc) in enumerate(zip(specs, scales)):
    li.display.specshow(sp, x_axis='time', y_axis=sc, sr=sr, ax=axes[i])
plt.show()
Screenshots
[Screenshot: result of the “To Reproduce” code above, with the standard (Hz) spectrogram and the mel spectrogram side by side]
Software versions
Python 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)]
NumPy 1.19.1
SciPy 1.5.2
librosa 0.8.0
python: 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)]
librosa: 0.8.0
audioread: 2.1.8
numpy: 1.19.1
scipy: 1.5.2
sklearn: 0.23.2
joblib: 0.16.0
decorator: 4.4.2
soundfile: 0.10.3
resampy: 0.2.2
numba: 0.50.1
numpydoc: None
sphinx: 3.1.1
sphinx_rtd_theme: None
sphinxcontrib.versioning: None
sphinx-gallery: None
pytest: 6.0.1
pytest-mpl: None
pytest-cov: None
matplotlib: 3.3.0
presets: None
Additional context
That’s it! Perhaps I’m missing something conceptually or there is an error in my code, but I think I’ve got everything right. If I’ve messed something up, I apologize for reporting this as a bug!
Top GitHub Comments
It feels a little bloaty, but I’d be okay with it. It would just be a pass-through param at this line: https://github.com/librosa/librosa/blob/7a5efc5f8b921db6ee8079b411629198c2154a18/librosa/display.py#L1266
@wcneill maybe it would help to understand how this is all implemented under the hood.
When you say `y_axis='mel'`, `specshow` will calculate the center frequency (in Hz) of each mel band, based on the parameters (sampling rate, n_fft, etc.). It will then render the spectrogram in natural (y-axis) coordinates, which in this case means Hz. Finally, based on how the Slaney (default) filters are defined, it uses a linear axis warping below 1 kHz and a logarithmic warping above 1 kHz (a `symlog` axis, in matplotlib speak) to stretch the rendered figure. The reason we do it this way is so that the y-axis is always in Hz, and you can interact with the plot accordingly, e.g. by plotting f0 curves (also measured in Hz) or highlighting frequency bands of interest. Similar things are done for linear STFT, log-frequency STFT, and CQT: the coordinates are all transformed so that the resulting y-axis is in Hz.
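As a rough illustration of the mapping described above, here is a standalone sketch of a Slaney-style mel scale: linear below 1 kHz and logarithmic above. It does not call librosa; the helper names and the choice of 128 bands are assumptions made for this example, and the constants (200/3 Hz per mel in the linear region, a log step of ln(6.4)/27) follow the Slaney convention as I understand it.

```python
import math

F_SP = 200.0 / 3.0                # Hz per mel in the linear region
MIN_LOG_HZ = 1000.0               # the scale becomes logarithmic at 1 kHz
MIN_LOG_MEL = MIN_LOG_HZ / F_SP   # mel value at the 1 kHz breakpoint
LOGSTEP = math.log(6.4) / 27.0    # step size in the logarithmic region


def hz_to_mel(freq_hz):
    """Slaney-style Hz -> mel (illustrative helper, not librosa's API)."""
    if freq_hz < MIN_LOG_HZ:
        return freq_hz / F_SP
    return MIN_LOG_MEL + math.log(freq_hz / MIN_LOG_HZ) / LOGSTEP


def mel_to_hz(mel):
    """Inverse of hz_to_mel above."""
    if mel < MIN_LOG_MEL:
        return mel * F_SP
    return MIN_LOG_HZ * math.exp(LOGSTEP * (mel - MIN_LOG_MEL))


def band_frequencies_hz(n_mels, fmin_hz, fmax_hz):
    """Frequencies in Hz that are evenly spaced on the mel scale."""
    lo, hi = hz_to_mel(fmin_hz), hz_to_mel(fmax_hz)
    step = (hi - lo) / (n_mels - 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels)]


sr = 11025  # assumed sampling rate, matching the issue's resampled audio
freqs = band_frequencies_hz(n_mels=128, fmin_hz=0.0, fmax_hz=sr / 2)

# The band positions are expressed in Hz, so the y-axis can stay in Hz too.
print(f"Nyquist on this mel scale: {hz_to_mel(sr / 2):.1f}")
print(f"Band frequencies span {freqs[0]:.1f} Hz to {freqs[-1]:.1f} Hz")
```

On this scale the Nyquist frequency of 5512.5 Hz maps to roughly 40 mels rather than about 2500, while the band positions themselves are plain Hz values, which is why an axis built from them is labeled in Hz.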