`y_axis='mel'` argument in `specshow` results in incorrectly labeled units as well as incorrect y-axis ticks.
Hello,
I think I’ve found a bug, and if this is a mess up on my part, I apologize in advance.
Describe the bug
It appears that the mel-scale spectrogram y-axis is displayed incorrectly when using librosa.display.specshow() with the y_axis='mel' keyword argument.
The expected behavior is as follows: If an original spectrogram D has frequency values ranging from 0 to ~5000 Hz, then the accompanying mel spectrogram obtained by librosa.feature.melspectrogram(S=D, sr=sr) should have mel values ranging from 20 to ~2500. Using the y_axis='mel' argument should result in a y-axis that is on the ‘mel’ scale. The y-axis ticks can be scaled in any manner (log base 10 or log base 2 maybe), but most importantly the value range should be between 0 and roughly 2500.
Actual behavior is as follows: The data is spread over a labeled range from 0 to ~10000 units. The y-axis appears to be in log base 2 scale, which is fine. However, the labeled units are Hz rather than mels, which is incorrect.
Summary: The range of values (y-axis ticks) seems incorrect, and the label “Hz” should be converted to “Mels”.
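For reference, the expected range of roughly 2500 mels is consistent with the HTK-style mel formula, m = 2595 * log10(1 + f / 700). The sketch below is only an illustration of that arithmetic: the function name hz_to_mel_htk is made up for this note, and it assumes the HTK formula is the one the expectation is based on.

```python
import math


def hz_to_mel_htk(freq_hz: float) -> float:
    """Convert Hz to mels with the HTK-style formula (assumed here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


# 5000 Hz is the approximate upper limit quoted in the report;
# 5512.5 Hz is the Nyquist frequency for an 11025 Hz sample rate.
for freq_hz in (5000.0, 11025 / 2):
    print(f"{freq_hz:.1f} Hz -> {hz_to_mel_htk(freq_hz):.0f} mels (HTK formula)")
```

Both values land a little below 2500 mels, which matches the range the report expects to see on the axis.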
To Reproduce:
The following code produces a side-by-side plot of a standard spectrogram produced using stft() and a mel spectrogram produced using librosa.feature.melspectrogram() applied to the original stft() output.
import librosa as li
import librosa.display
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
# load sample audio
file = li.ex('trumpet')
aud, sr = li.load(file, sr=None)
n_fft = 512
rsr = 11025
# apply low pass filter before downsampling. Attenuate at resample rate divided by 2.
cutoff = rsr / 2
sos = sig.butter(10, cutoff, fs=sr, btype='lowpass', analog=False, output='sos')
aud = sig.sosfilt(sos, aud)
# downsample and update sample rate value
aud, sr = li.resample(aud, orig_sr=sr, target_sr=rsr), rsr
# create both standard and mel spectrograms:
spec = li.stft(aud, n_fft=n_fft, window=sig.windows.hamming)
spec = np.abs(spec)
mel_spec = li.feature.melspectrogram(S=spec, sr=sr)
# plot standard spectrogram and mel-scale spectrogram side by side:
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
specs = (li.amplitude_to_db(sp) for sp in (spec, mel_spec))
scales = ('hz', 'mel')
for i, (sp, sc) in enumerate(zip(specs, scales)):
    li.display.specshow(sp, x_axis='time', y_axis=sc, sr=sr, ax=axes[i])
plt.tight_layout()
plt.show()
Screenshots
This is the result of the “To Reproduce” code above:
Software versions
Windows-10-10.0.14393-SP0
Python 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)]
NumPy 1.19.1
SciPy 1.5.2
librosa 0.8.0
INSTALLED VERSIONS
------------------
python: 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)]
librosa: 0.8.0
audioread: 2.1.8
numpy: 1.19.1
scipy: 1.5.2
sklearn: 0.23.2
joblib: 0.16.0
decorator: 4.4.2
soundfile: 0.10.3
resampy: 0.2.2
numba: 0.50.1
numpydoc: None
sphinx: 3.1.1
sphinx_rtd_theme: None
sphinxcontrib.versioning: None
sphinx-gallery: None
pytest: 6.0.1
pytest-mpl: None
pytest-cov: None
matplotlib: 3.3.0
presets: None
Additional context
That’s it! Perhaps I’m missing something conceptually or there is an error in my code, but I think I’ve got everything right. If I’ve messed something up, I apologize for reporting this as a bug!
Issue Analytics
- State:
- Created 3 years ago
- Comments:6 (5 by maintainers)
Top GitHub Comments
It feels a little bloaty, but I’d be okay with it. It would just be a pass-through param at this line: https://github.com/librosa/librosa/blob/7a5efc5f8b921db6ee8079b411629198c2154a18/librosa/display.py#L1266
@wcneill maybe it would help to understand how this is all implemented under the hood.
When you say y_axis='mel', specshow will calculate the center frequency (in Hz) of each mel band, based on the parameters (sampling rate, n_fft, etc.). It will then render the spectrogram in natural (y-axis) coordinates, which in this case means Hz. Finally, based on how the Slaney (default) filters are defined, it uses a linear axis warping below 1 kHz and a logarithmic warping above 1 kHz (a symlog axis, in matplotlib speak) to stretch the rendered figure.
The reason we do it this way is so that the y-axis is always in Hz, and you can interact with the plot accordingly, e.g. by plotting f0 curves (also measured in Hz) or highlighting frequency bands of interest. Similar things are done for linear STFT, log-frequency STFT, and CQT: the coordinates are all transformed so that the resulting y-axis is in Hz.
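To make the first step concrete, here is a rough pure-Python sketch of how band frequencies in Hz can be derived from a Slaney-style mel scale (linear below 1 kHz, logarithmic above). It is not librosa's actual code: the function names, the choice of 128 bands, and fmin = 0 are assumptions made for illustration.

```python
import math

# Constants of the Slaney-style mel scale.
F_SP = 200.0 / 3.0               # Hz per mel in the linear region
MIN_LOG_HZ = 1000.0              # the logarithmic region starts at 1 kHz
MIN_LOG_MEL = MIN_LOG_HZ / F_SP  # 15 mels at the 1 kHz break point
LOGSTEP = math.log(6.4) / 27.0   # step size in the logarithmic region


def hz_to_mel(freq_hz):
    """Hz to mels: linear below 1 kHz, logarithmic above."""
    if freq_hz < MIN_LOG_HZ:
        return freq_hz / F_SP
    return MIN_LOG_MEL + math.log(freq_hz / MIN_LOG_HZ) / LOGSTEP


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    if mel < MIN_LOG_MEL:
        return mel * F_SP
    return MIN_LOG_HZ * math.exp(LOGSTEP * (mel - MIN_LOG_MEL))


def mel_band_frequencies_hz(n_mels, fmin_hz, fmax_hz):
    """Frequencies in Hz of n_mels points spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(fmin_hz), hz_to_mel(fmax_hz)
    step = (hi - lo) / (n_mels - 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels)]


sr = 11025  # sample rate from the report
freqs = mel_band_frequencies_hz(n_mels=128, fmin_hz=0.0, fmax_hz=sr / 2)
print(f"first band: {freqs[0]:.1f} Hz, last band: {freqs[-1]:.1f} Hz")
print(f"Nyquist on this scale: {hz_to_mel(sr / 2):.1f} mels")
```

The band frequencies run from 0 Hz up to the Nyquist frequency, which is why the rendered axis is labeled in Hz. On this Slaney-style scale the Nyquist frequency is only about 40 mels, so a ~2500 mel range would not appear even if the ticks were relabeled in mels.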