icqt outputs noise with unusual format
Description
icqt (inverse CQT) returns loud noise in an unusual format when I simply convert “.wav => CQT => .wav”.
I cannot open the output file with Windows 10’s Groove Music app (the iSTFT result can be opened), so I tried opening it with Audacity.
Audacity can open the icqt output .wav file, but the result is loud noise.
Steps/Code to Reproduce
Even the example from the official documentation returns unintended results.
import librosa
y, sr = librosa.load(librosa.util.example_audio_file(), duration=15)
C = librosa.cqt(y, sr=sr)
y_hat = librosa.icqt(C, sr=sr)
librosa.output.write_wav("./input.wav", y, sr, norm=True)
librosa.output.write_wav("./reconstructed.wav", y_hat, sr, norm=True)
This is basically the same as the official sample code.
Expected Results
The code above is simply a “wave => CQT => wave” conversion, so I expected an almost identical .wav file.
The sound could be slightly degraded, but it should not be noise.
Actual Results
Loud noise, with no musical content.
At the same time, a SciPy warning is displayed.
FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`.
In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
Versions
Windows-10-10.0.17134-SP0
Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)]
NumPy 1.15.4
SciPy 1.1.0
librosa 0.6.2
Issue Analytics
- Created 5 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
One more thing you might try: in icqt, set amin=1e-2 (rather than the default of 1e-6). This controls the silence threshold in the inverse window normalization, which I suspect is set to be far too low by default. Bringing it up to 1e-2 gets the default analysis parameters in the ballpark of something listenable (though it’s still heavily distorted, and the above suggestions will improve things).
It’s possible that we could do something much smarter in setting the window inversion threshold, and that could improve things across the board. I’ll stick this on the docket for 0.7.
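To get a feel for why the threshold matters, here is a rough, self-contained sketch that does not call librosa at all. It assumes the inverse window normalization behaves like a division by the overlap-added squared-window envelope, clamped from below at amin; that model, the helper names, and the window length of 94 samples are illustrative assumptions, not librosa's actual code.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def window_sumsquare(win_length, hop_length, n_frames):
    """Overlap-add the squared window at every frame position."""
    win_sq = [w * w for w in hann(win_length)]
    total = [0.0] * (hop_length * (n_frames - 1) + win_length)
    for frame in range(n_frames):
        start = frame * hop_length
        for i, w in enumerate(win_sq):
            total[start + i] += w
    return total


def max_gain(envelope, amin):
    """Largest gain applied when dividing by max(amin, envelope).

    ASSUMPTION: this clamped division is a simplified model of the
    normalization, chosen only to illustrate the effect of amin.
    """
    return max(1.0 / max(amin, e) for e in envelope)


def gap_fraction(envelope, amin):
    """Fraction of samples whose envelope falls below the threshold."""
    return sum(1 for e in envelope if e < amin) / len(envelope)


# Illustrative values: a short window (94 samples) with a hop of 512.
envelope = window_sumsquare(win_length=94, hop_length=512, n_frames=4)

for amin in (1e-6, 1e-2):
    print(
        "amin=%g  max gain=%g  samples below threshold=%.0f%%"
        % (amin, max_gain(envelope, amin), 100 * gap_fraction(envelope, amin))
    )
```

Under this model, most samples sit in gaps where the envelope is zero, so the worst-case gain is simply 1/amin. Raising amin from 1e-6 to 1e-2 caps how much any residual energy in those gaps gets amplified, which is consistent with the output becoming listenable but still distorted.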
I looked into this today, and what’s happening is that the default CQT analysis parameters are just not invertible. The problem ultimately boils down to the fact that the filters are too short relative to the default hop length, which produces (time) gaps in the analysis at higher frequency bands. These gaps become numerically unstable when it comes time to invert the CQT, hence the noise / buzz. I don’t think this is avoidable with the current default parameters, but we can certainly make the icqt method a bit smarter at alerting the user when the parameters are not going to produce a faithful inversion.
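The mismatch between filter length and hop length can be checked with rough arithmetic. The sketch below uses the textbook constant-Q relation (filter length = Q * sr / frequency) and assumes the analysis settings of the reproduction script: sr=22050, hop_length=512, fmin near C1 (about 32.7 Hz), 84 bins, 12 bins per octave, and a filter scale of 1. The helper name approx_filter_lengths is hypothetical, and librosa's real filter construction may differ in detail.

```python
import math


def approx_filter_lengths(sr, fmin, n_bins, bins_per_octave=12, filter_scale=1.0):
    """Return (frequency, approximate filter length in samples) per bin.

    Uses the standard constant-Q relation; this is an approximation,
    not a reimplementation of librosa's filter bank.
    """
    q = filter_scale / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    result = []
    for k in range(n_bins):
        freq = fmin * 2.0 ** (k / bins_per_octave)
        result.append((freq, q * sr / freq))
    return result


# Assumed analysis settings (see the lead-in above).
sr = 22050
hop_length = 512
fmin = 32.703  # approximately C1, in Hz
n_bins = 84

bins = approx_filter_lengths(sr, fmin, n_bins)

# Bins whose filter is shorter than the hop leave time gaps between frames.
short_bins = [(f, n) for f, n in bins if n < hop_length]

top_freq, top_length = bins[-1]
print("bins shorter than the hop: %d of %d" % (len(short_bins), n_bins))
print("top bin: %.0f Hz, about %.0f samples (hop is %d)" % (top_freq, top_length, hop_length))
```

With these assumed settings, the highest bin's filter spans only a fraction of one hop, so consecutive frames do not overlap at all in that band. A shorter hop or longer filters would close the gap.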
That said, here are a few potential workarounds for you: