Incorrect size of mel spectrogram
Hi,
I compute the mel spectrogram on a time-domain signal that has 13230080 samples, like so:
mel_spectrogram = librosa.feature.melspectrogram(y=audio_data, sr=44100, n_fft=2048, hop_length=512)
The resulting shape of mel_spectrogram is (128, 25841); however, as far as I understand, it should be (128, 25840), i.e. (n_mels, length of time-domain signal / hop size). For some reason it has one extra frame.
Can you please explain? Thanks!
Issue Analytics
- State:
- Created 7 years ago
- Comments:11 (5 by maintainers)
Top GitHub Comments
It's (n - n_fft) in the numerator.
Working backwards, say you have a maximum frame number T and that frames are left-aligned. For the last frame to be contained in the signal, you need T * hop_length + n_fft <= n. Rearranging terms, you get T <= (n - n_fft) / hop_length.
The 1+ is there to handle the zero-hop case (i.e. the first frame, at index 0).
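As a rough check of the arithmetic above, here is a minimal sketch of the frame count in plain Python. The helper name num_frames is hypothetical and not part of librosa. It assumes librosa's default center=True framing, which pads n_fft // 2 samples on each side of the signal before the frames are cut.

```python
def num_frames(n_samples: int, n_fft: int = 2048, hop_length: int = 512,
               center: bool = True) -> int:
    """Count left-aligned frames of length n_fft that fit in the signal.

    Hypothetical helper for illustration; not a librosa function.
    """
    if center:
        # Assumption: centered framing pads n_fft // 2 samples on each side.
        n_samples += 2 * (n_fft // 2)
    if n_samples < n_fft:
        return 0  # not even one full frame fits
    # Largest frame index T satisfies T * hop_length + n_fft <= n_samples;
    # the leading 1 counts the frame at index 0.
    return 1 + (n_samples - n_fft) // hop_length


n = 13230080
print(num_frames(n, center=True))   # 25841
print(num_frames(n, center=False))  # 25837
```

With centering, the padding cancels the n_fft term, so the count reduces to 1 + n // hop_length, which is where the extra frame in the question comes from.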
Right, I computed it incorrectly, got it now. Thanks again! 💯