MelSpectrogram inconsistency with librosa melspectrogram
Hello! I am very excited about this framework and its ability to run transformations on the GPU.
Problem:
The output of transforms.Spectrogram (with power 1.), which is real, equals the absolute value of librosa.stft, which is complex, when both are given the same parameters.
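To make the relationship between a complex STFT and a real spectrogram concrete, here is a minimal sketch in plain Python. It does not call torchaudio or librosa; the helper names `dft` and `spectrum` are hypothetical, and the toy frame is an assumption chosen so the result is easy to check by hand. It only shows that raising the magnitude of each complex bin to `power` gives a real-valued spectrum.

```python
import cmath
import math


def dft(frame):
    """Naive DFT of a real frame, returning the complex bins 0..N/2."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def spectrum(frame, power=1.0):
    """Real-valued spectrum |X[k]| ** power (1 = magnitude, 2 = energy)."""
    return [abs(c) ** power for c in dft(frame)]


# Toy frame: a cosine that completes exactly 2 cycles in 16 samples.
frame = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]

complex_bins = dft(frame)               # complex, like one raw STFT column
magnitude = spectrum(frame, power=1.0)  # real, equal to abs() of each bin
energy = spectrum(frame, power=2.0)     # real, squared magnitude

print(round(magnitude[2], 6), round(energy[2], 6))
```

With this frame, all of the energy lands in bin 2, so taking the absolute value of the complex bins and using `power=1.0` describe the same numbers.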
Here are the spectrograms for my example audio (the results are really close):
The next step is to get a mel spectrogram using transforms.MelScale (applied to a Spectrogram with power 1) and librosa.feature.melspectrogram (applied to the previous spectrogram; the power is effectively 1., since this argument is not used). And here we can’t get the same result:
- in both steps only a matmul takes place
- in transforms.MelScale tensors with real values are multiplied, whereas librosa.feature.melspectrogram gives us a multiplication of complex-valued matrices, so the results can be completely different
- the use of power in transforms.Spectrogram is also quite misleading (it is not needed in librosa.stft)
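The concern about real versus complex multiplication can be illustrated with toy numbers. This is only a sketch of the arithmetic, with made-up weights and bin values; it makes no claim about what either library does internally. It shows that applying a real filter to magnitudes and applying the same filter to complex values (then taking the magnitude) can give different answers when the phases disagree.

```python
# One hypothetical mel band covering three neighbouring frequency bins.
weights = [0.25, 0.5, 0.25]

# Made-up complex STFT values for those bins; the middle bin has opposite phase.
bins = [1 + 0j, -1 + 0j, 1 + 0j]

# Filter applied to the magnitudes (a real-valued matmul).
on_magnitude = sum(w * abs(c) for w, c in zip(weights, bins))

# Filter applied to the complex values, magnitude taken afterwards.
on_complex = abs(sum(w * c for w, c in zip(weights, bins)))

print(on_magnitude, on_complex)
```

Here the magnitudes add up, while the complex values cancel, so any comparison between the two libraries has to feed both the same kind of input (magnitude or power, not raw complex values).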
And the result (it differs not only in some entries, but in scale too):
Issue Analytics
- State:
- Created 3 years ago
- Comments:17 (9 by maintainers)
Top GitHub Comments
Okay, I did further research and could reproduce librosa’s melspectrogram with torchaudio. The parameters added in #1212 helped.
Numerical compatibility
MSE: 5.792542556726232e-10
MSE: 3.6859009276685303e-16
MSE: 3.748331423025775e-09
Call-stacks
@eldrin @SolomidHero
I have merged #1212, so we can pass slaney normalization as a parameter to the MelSpectrogram transform. I will keep looking for a way to add other filter bank options and numerical parity with librosa.
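For readers wondering why the filter bank choice affects the scale of the output, here is a minimal plain-Python sketch of the two common mel-scale formulas and of Slaney-style area normalization. The function names are hypothetical and this is not the torchaudio or librosa implementation; it only reproduces the well-known formulas (HTK: 2595 * log10(1 + f / 700); Slaney: linear below 1000 Hz, logarithmic above) under the assumption that each triangular filter is scaled by 2 / (upper edge - lower edge) in Hz.

```python
import math


def hz_to_mel_htk(f):
    """HTK-style mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def hz_to_mel_slaney(f):
    """Slaney-style mel scale: linear below 1 kHz, logarithmic above."""
    f_sp = 200.0 / 3                  # Hz per mel in the linear region
    min_log_hz = 1000.0               # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp   # same point expressed in mels
    logstep = math.log(6.4) / 27.0    # step size in the logarithmic region
    if f < min_log_hz:
        return f / f_sp
    return min_log_mel + math.log(f / min_log_hz) / logstep


def slaney_area_norm(f_low, f_high):
    """Scale factor giving a triangular filter with edges in Hz unit area."""
    return 2.0 / (f_high - f_low)


# The two scales assign very different mel values to the same frequency.
print(hz_to_mel_htk(1000.0), hz_to_mel_slaney(1000.0))

# A triangle spanning 900 to 1100 Hz peaks at 1.0 without normalization,
# but at this much smaller value once it is area-normalized.
print(slaney_area_norm(900.0, 1100.0))
```

A filter that peaks at 1.0 without normalization peaks at roughly 0.01 for a 200 Hz wide band with it, which is the kind of overall scale difference that shows up when one library normalizes its filters and the other does not.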