MelSpectrogram inconsistency with librosa melspectrogram

Hello! I am very excited about this framework and its ability to run transformations on the GPU.

Problem: with equal parameters, the output of transforms.Spectrogram with power=1. (real-valued) equals the absolute value of the output of librosa.stft (complex-valued).

Here are the spectrograms for my example audio (the results are really close): [Screenshot 2020-11-26 at 22 23 12] [Screenshot 2020-11-26 at 22 24 45]

The next step is to compute the mel spectrogram from the previous spectrogram, using transforms.MelScale (on the Spectrogram output with power=1.) and librosa.feature.melspectrogram (its power argument is not used here, so it is effectively 1.). Here we can't get the same result:

- in both cases only a matmul takes place
- transforms.MelScale multiplies real-valued tensors, whereas librosa.feature.melspectrogram multiplies complex-valued matrices, so the results can be completely different
- the use of power in transforms.Spectrogram is also quite misleading (librosa.stft doesn't need it)

And the result (it differs not only in some places, but in scale too): [Screenshot 2020-11-26 at 22 45 46] [Screenshot 2020-11-26 at 22 47 18]
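The complex-versus-real concern in the list above can be illustrated with a toy example. This is a minimal sketch in plain Python, not code from torchaudio or librosa: the two-bin filter and the STFT frame are made-up values, and apply_filter is a hypothetical helper standing in for one row of the filter-bank matmul.

```python
# Hypothetical toy data: one mel filter covering two frequency bins, and one
# STFT frame whose two bins have equal magnitude but opposite phase.
mel_filter = [0.5, 0.5]
stft_frame = [complex(1.0, 0.0), complex(-1.0, 0.0)]


def apply_filter(weights, values):
    """Weighted sum of `values`, i.e. one row of a filter-bank matmul."""
    return sum(w * v for w, v in zip(weights, values))


# Filtering the complex STFT directly lets the opposite phases cancel.
complex_result = apply_filter(mel_filter, stft_frame)

# Filtering the magnitudes (what a power=1. spectrogram holds) does not.
magnitude_result = apply_filter(mel_filter, [abs(v) for v in stft_frame])

print(abs(complex_result))  # 0.0
print(magnitude_result)     # 1.0
```

So if one pipeline applies the mel filters to a complex STFT and the other to its magnitude, the outputs are not expected to match, even when the filter banks are identical.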

Issue Analytics: created 3 years ago; 17 comments (9 by maintainers)

Top GitHub Comments

12 reactions

mthrok commented, Feb 12, 2021

Okay, I did further research and could reproduce librosa’s melspectrogram with torchaudio. The parameters added in #1212 helped.

Numerical compatibility

Spectrogram

Confirmed that torchaudio.transforms.Spectrogram and librosa.core.spectrum._spectrogram can produce numerically comparable results. [script]

torchaudio_spec = torchaudio.transforms.Spectrogram(
    n_fft=n_fft,
    win_length=win_len,
    hop_length=hop_len,
    center=True,
    pad_mode="reflect",
    power=2.0,
)(waveform)
librosa_spec, _ = librosa.core.spectrum._spectrogram(
    y=waveform.numpy(),
    n_fft=n_fft,
    hop_length=hop_len,
    win_length=win_len,
    center=True,
    pad_mode="reflect",
    power=2.0,
)

[Figure: spec]

MSE: 5.792542556726232e-10
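For reference, an MSE figure like the one above can be computed roughly as follows. The linked script is not reproduced here, so this is only a sketch under the assumption that both spectrograms have the same shape; mse is a hypothetical helper written in plain Python, and the sample values are made up.

```python
def mse(a, b):
    """Mean squared error between two equally sized 2-D arrays (lists of rows)."""
    flat_a = [x for row in a for x in row]
    flat_b = [x for row in b for x in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("inputs must have the same number of elements")
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)


# Made-up 2x2 "spectrograms" that differ in a single cell by 0.5.
reference = [[1.0, 2.0], [3.0, 4.0]]
candidate = [[1.0, 2.0], [3.0, 4.5]]
print(mse(reference, candidate))  # 0.0625
```

With real outputs, the torchaudio tensor and the librosa array would first be converted to a common type (for example via .numpy()) before the comparison.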

MelScale conversion

Confirmed that torchaudio.functional.create_fb_matrix and librosa.filters.mel can produce numerically comparable results. [script]

torchaudio_mel = torchaudio.functional.create_fb_matrix(
    int(n_fft // 2 + 1),
    n_mels=n_mels,
    f_min=0.,
    f_max=sample_rate/2.,
    sample_rate=sample_rate,
    norm='slaney'
)

librosa_mel = librosa.filters.mel(
    sr=sample_rate,
    n_fft=n_fft,
    n_mels=n_mels,
    fmin=0.,
    fmax=sample_rate/2.,
    norm='slaney',
    htk=True,
).T

[Figure: mel_bins]

MSE: 3.6859009276685303e-16
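To make the parameters above more concrete, here is a rough plain-Python sketch of a triangular mel filter bank that uses the HTK mel formula (what htk=True selects in librosa) together with slaney-style area normalization. It is an illustration only, not the torchaudio or librosa implementation: mel_filter_bank, linspace and the two conversion helpers are hypothetical names, and the real libraries are vectorized and may differ in details.

```python
import math


def hz_to_mel_htk(freq):
    """HTK mel scale: mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def linspace(start, stop, num):
    """`num` evenly spaced values from start to stop, both inclusive."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def mel_filter_bank(n_freqs, n_mels, sample_rate, f_min=0.0, f_max=None,
                    slaney_norm=True):
    """Return an (n_freqs x n_mels) triangular filter bank as a list of rows."""
    if f_max is None:
        f_max = sample_rate / 2.0
    bin_freqs = linspace(0.0, sample_rate / 2.0, n_freqs)
    # Filter edges are evenly spaced on the mel scale, then mapped back to Hz.
    mel_points = linspace(hz_to_mel_htk(f_min), hz_to_mel_htk(f_max), n_mels + 2)
    edges = [mel_to_hz_htk(m) for m in mel_points]

    bank = []
    for freq in bin_freqs:
        row = []
        for m in range(n_mels):
            lower, center, upper = edges[m], edges[m + 1], edges[m + 2]
            rising = (freq - lower) / (center - lower)
            falling = (upper - freq) / (upper - center)
            weight = max(0.0, min(rising, falling))
            if slaney_norm:
                # Scale each triangle by 2 / bandwidth (area normalization).
                weight *= 2.0 / (upper - lower)
            row.append(weight)
        bank.append(row)
    return bank


bank = mel_filter_bank(n_freqs=201, n_mels=8, sample_rate=16000)
print(len(bank), len(bank[0]))  # 201 8
```

The (n_freqs x n_mels) layout is chosen so that the result lines up with the transposed librosa matrix in the comparison above.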

MelSpectrogram

Confirmed that torchaudio.transforms.MelSpectrogram and librosa.feature.melspectrogram can produce numerically comparable results. [script]

torchaudio_melspec = torchaudio.transforms.MelSpectrogram(
    sample_rate=sample_rate,
    n_fft=n_fft,
    win_length=win_len,
    hop_length=hop_len,
    center=True,
    pad_mode="reflect",
    power=2.0,
    norm='slaney',
    onesided=True,
    n_mels=n_mels,
)(waveform)
librosa_melspec = librosa.feature.melspectrogram(
    y=waveform.numpy(),
    sr=sample_rate,
    n_fft=n_fft,
    hop_length=hop_len,
    win_length=win_len,
    center=True,
    pad_mode="reflect",
    power=2.0,
    n_mels=n_mels,
    norm='slaney',
    htk=True,
)

[Figure: mel_spec]

MSE: 3.748331423025775e-09

Call-stacks

MelSpectrogram

The MelSpectrogram call stacks of torchaudio and librosa:
- torchaudio.transforms.MelSpectrogram
  - torchaudio.transforms.Spectrogram
    - torchaudio.functional.spectrogram
  - torchaudio.transforms.MelScale
    - torchaudio.functional.create_fb_matrix
- librosa.feature.melspectrogram
  - librosa.core.spectrum._spectrogram
  - librosa.filters.mel

2 reactions

mthrok commented, Feb 3, 2021

@eldrin @SolomidHero

I have merged #1212, so we can now pass slaney normalization as a parameter to the MelSpectrogram transform. I will keep looking for a way to add other filter bank options and numerical parity with librosa.