[Discussion] CQT, VQT, energy preservation, and frequency band alignment
I wanted to make a separate issue to consolidate the discussion around a number of related points that have popped up in the implementation of VQT #1018, inverse CQT #165, and so on. @lostanlen and I have discussed these things in various places (often offline), but it would be helpful to have a more permanent record.
Frequency band definitions
In the CQT implementation, we follow the SK2010 definition of the frequency bands, which are essentially left-aligned: [f, a * f] (for some constant a). This works well enough, but it makes some calculations awkward, e.g. in the VQT. Later on in the CQT toolbox, the definition shifted to a centered representation, e.g. [f / sqrt(a), f * sqrt(a)] (again for some constant a), so that the frequency of each filter is centered within the band.
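To make the two conventions concrete, here is a minimal sketch of how the band edges could be computed. The helper name band_edges, the choice a = 2 ** (1 / 12), and the starting frequency are assumptions for illustration only; none of this is taken from the current implementation.

```python
import numpy as np


def band_edges(freqs, a, centered=False):
    """Return (lower, upper) band edges for each filter frequency.

    Hypothetical helper, not an existing library function.
    left-aligned: [f, a * f]
    centered:     [f / sqrt(a), f * sqrt(a)]
    """
    freqs = np.asarray(freqs, dtype=float)
    if centered:
        root = np.sqrt(a)
        return freqs / root, freqs * root
    return freqs.copy(), freqs * a


# Assumed setup: 12 bins per octave, two octaves starting near C1.
bins_per_octave = 12
a = 2.0 ** (1.0 / bins_per_octave)
freqs = 32.70 * a ** np.arange(2 * bins_per_octave)

lo_left, hi_left = band_edges(freqs, a)
lo_cent, hi_cent = band_edges(freqs, a, centered=True)

# Both conventions give the same ratio between upper and lower edge.
print(np.allclose(hi_left / lo_left, hi_cent / lo_cent))
# In the centered case, f is the geometric mean of its band edges.
print(np.allclose(np.sqrt(lo_cent * hi_cent), freqs))
```

Under these assumptions both conventions tile the log-frequency axis without gaps; the centered bands are simply the left-aligned ones shifted down by a factor of sqrt(a), which is why the main visible difference would be at the edges of the frequency range.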
Some questions:
- Should we add support for centered frequency bands? I don’t think it would change much, except possibly some boundary effects at the top end of octaves.
- Should we make centering the default? Should we even bother with left-aligned bands anymore?
Energy preservation
Our CQT implementation has historically had a bunch of headaches around normalization and energy preservation. Of course we can’t exactly preserve energy with a lossy frequency representation, but our current cqt/icqt round trip gets pretty close. However, other feature representations (notably Mel) are not so forgiving, and it would be nice if we could strive for consistency here.
Issue Analytics
- State:
- Created 4 years ago
- Comments: 37 (32 by maintainers)
Top GitHub Comments
Yeah, that’s loosely speaking what I had in mind. A couple of caveats:
- wavelet_length: (I guess we’re cool with a per-frequency gamma?)
- f_cutoff in vqt to enable early downsampling. Actually, it might make sense to migrate the Nyquist check out of the filter constructor and into the vqt function itself. This would effectively mean that wavelet_lengths should return both the filter lengths and the cutoff frequency. A little awkward, but probably a better compromise overall.
Yes, sorry, I hadn’t tested it. freq[-2] of course.
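For the caveat above about returning the cutoff frequency alongside the filter lengths, a rough sketch of what that split could look like follows. Everything here is an assumption for illustration: the names sketch_wavelet_lengths and nyquist_checked_lengths are hypothetical, the bandwidth model alpha * f + gamma and the window bandwidth of 1.5 bins (roughly a Hann window) are choices made for this example, and the actual signature discussed in the thread may differ.

```python
import numpy as np


def sketch_wavelet_lengths(freqs, sr, alpha, gamma=0.0,
                           filter_scale=1.0, window_bw=1.5):
    """Hypothetical helper: return (lengths, f_cutoff).

    gamma may be a scalar or an array with one value per frequency.
    """
    freqs = np.asarray(freqs, dtype=float)
    gamma = np.broadcast_to(np.asarray(gamma, dtype=float), freqs.shape)
    # Assumed bandwidth model: constant-Q term plus a per-filter offset.
    bandwidth = alpha * freqs + gamma
    # Filter lengths in (fractional) samples.
    lengths = filter_scale * sr / bandwidth
    # Highest frequency any filter touches, given the window's bandwidth.
    f_cutoff = float(np.max(freqs + 0.5 * window_bw * bandwidth / filter_scale))
    return lengths, f_cutoff


def nyquist_checked_lengths(freqs, sr, **kwargs):
    """Do the Nyquist check in the caller rather than the filter constructor."""
    lengths, f_cutoff = sketch_wavelet_lengths(freqs, sr, **kwargs)
    if f_cutoff > sr / 2:
        raise ValueError(
            f"Filter pass-band ({f_cutoff:.1f} Hz) exceeds Nyquist ({sr / 2:.1f} Hz)"
        )
    return lengths, f_cutoff


# Assumed setup: 84 bins, 12 per octave, starting near C1.
alpha = 2.0 ** (1.0 / 12) - 1
freqs = 32.70 * 2.0 ** (np.arange(84) / 12)

lengths, f_cutoff = nyquist_checked_lengths(freqs, sr=22050, alpha=alpha)
# A per-frequency gamma widens each band and shortens each filter.
gamma = np.linspace(5.0, 20.0, len(freqs))
lengths_g, f_cutoff_g = nyquist_checked_lengths(
    freqs, sr=22050, alpha=alpha, gamma=gamma
)
```

With the check living in the caller, the same f_cutoff value could also be reused to decide how far the signal can be downsampled before filtering, which is the point of exposing it in the first place.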