librosa.feature.rms inconsistency - y vs S
Description
When calling librosa.feature.rms, the result computed from a waveform y and the result computed from the spectrogram S are different.
This is because the implementation does not satisfy Parseval's theorem.
Specifically, the DC bin S[0] (and, only if n_fft is even, the sr/2 bin S[-1]) is weighted twice as heavily as it should be relative to the other bins.
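The weighting that Parseval's theorem requires can be sketched for a single frame as follows. This is a minimal NumPy-only illustration, not librosa's code: the helper name rms_from_rfft is hypothetical, and it assumes a rectangular window and a one-sided spectrum as returned by np.fft.rfft.

```python
import numpy as np


def rms_from_rfft(frame):
    """RMS of a real-valued frame, computed from its one-sided DFT.

    Hypothetical helper for illustration; not part of librosa.
    """
    n = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2
    # Each interior bin stands in for a conjugate-symmetric pair of bins
    # in the full DFT, so it is counted twice.
    weights = np.full(power.shape, 2.0)
    # The DC bin has no mirror image, so it is counted once.
    weights[0] = 1.0
    # The Nyquist (sr/2) bin exists only for even n and is also unpaired.
    if n % 2 == 0:
        weights[-1] = 1.0
    # Parseval: mean(x**2) == sum(|X_k|**2 over the full DFT) / n**2
    return np.sqrt(np.sum(weights * power)) / n


rng = np.random.default_rng(0)
for n in (8, 9):
    frame = rng.random(n)
    time_rms = np.sqrt(np.mean(frame ** 2))
    print(n, np.isclose(time_rms, rms_from_rfft(frame)))
```

With these weights the time-domain and frequency-domain values agree for both even and odd frame lengths; doubling every bin uniformly does not.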
Steps/Code to Reproduce
import numpy as np
import librosa
# set parameters
np.random.seed(0)
frame_length = 2048
hop_length = 512
center = True
# make a wave and its spectrogram
# wave; its DC component is about 0.5 because np.random.rand samples from [0, 1)
y = np.random.rand(3000)
# magnitude spectrogram, computed with a rectangular window
S = librosa.magphase(librosa.stft(y,
                                  n_fft=frame_length,
                                  hop_length=hop_length,
                                  window=np.ones,
                                  center=center))[0]
# calculate RMS
rms1 = librosa.feature.rms(y=y, frame_length=frame_length, hop_length=hop_length, center=center)
rms2 = librosa.feature.rms(S=S, frame_length=frame_length, hop_length=hop_length, center=center)
print(rms1)
print(rms2)
print(np.allclose(rms1, rms2))
Expected Results
rms1 and rms2 are almost the same.
Actual Results
[[0.57401353 0.58105711 0.58252662 0.58571533 0.58668334 0.58252178]]
[[0.75760764 0.76713306 0.76892614 0.7742523 0.77579546 0.77086043]]
False
Versions
Darwin-18.7.0-x86_64-i386-64bit
Python 3.7.3 (default, May 28 2019, 02:11:32) [Clang 10.0.1 (clang-1001.0.46.3)]
NumPy 1.17.4
SciPy 1.3.3
librosa 0.7.1
Issue Analytics
- State:
- Created 4 years ago
- Comments:6 (6 by maintainers)
Top GitHub Comments
@bmcfee one explanation could be the DC component in np.random.rand, which is nonnegative.

There are two causes of calculation error in the previous implementation (each given as correct vs. previous), where frame_length is 2M when even and 2M+1 when odd:
1. Scaling factor
- frame_length is even: 1/(2M)^2 vs. 1/(4M(M+1))
- frame_length is odd: 1/(2M+1)^2 vs. 1/(4M(M+1))
2. Extra terms in the sum
- frame_length is even: none vs. |S[0]|^2 + |S[M]|^2
- frame_length is odd: none vs. |S[0]|^2

The larger frame_length (i.e., n_fft) is, the smaller the influence of these error factors. However, if frame_length is small or the DC component (i.e., |S[0]|^2) is relatively large, the calculation error cannot be ignored.

I guess that the reason why the previous implementation passed the previous test is as follows.
- frame_length was large enough for the calculation error to be ignored.
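The size of the error can be explored with a rough model. In the sketch below, rms_previous_model is a hypothetical reconstruction built only from the factors quoted in the comment above (every one-sided bin doubled, scaled by 1/(4M(M+1))); it is not copied from the librosa source, and both function names are assumptions made for illustration.

```python
import numpy as np


def rms_correct(frame):
    """Parseval-consistent RMS from the one-sided DFT of a real frame."""
    n = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2
    # Double every bin, then remove the extra copy of the unpaired bins.
    total = 2.0 * power.sum() - power[0]
    if n % 2 == 0:
        total -= power[-1]
    return np.sqrt(total) / n


def rms_previous_model(frame):
    """Assumed model of the previous behaviour (not librosa's actual code)."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    m = len(power) - 1  # M in the comment's notation
    # All bins doubled, including DC and Nyquist, with scale 1/(4M(M+1)).
    return np.sqrt(2.0 * power.sum() / (4.0 * m * (m + 1)))


rng = np.random.default_rng(0)
for n in (16, 256, 4096):
    zero_mean = rng.standard_normal(n)  # little energy in the DC bin
    with_dc = rng.random(n)             # mean near 0.5, strong DC bin
    for name, frame in (("zero-mean", zero_mean), ("with DC", with_dc)):
        rel_err = rms_previous_model(frame) / rms_correct(frame) - 1.0
        print(f"n={n:5d}  {name:9s}  relative error = {rel_err:+.4f}")
```

Under this model, the error for the zero-mean signal should shrink as n grows, while the signal with a large DC component keeps a sizeable error at every frame length, which is consistent with the reproduction in the issue.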