Incorporating new Audio feature - CQCC

Is your feature request related to a problem? Please describe.

It’s not related to a problem. We are using librosa as our go-to tool for audio feature extraction. We love that we get an API to extract STFT and MFCC. We would like to have CQCC incorporated into the core feature subset of librosa.

Describe the solution you’d like

Maybe a new API like librosa.features.cqcc

Describe alternatives you’ve considered

Currently, we are trying to write this ourselves. As is usually the case, we may end up with something that works, but an implementation in the library would be held to a higher standard.

Additional context

CQCC was introduced here: https://www.asvspoof.org/papers/CSL_CQCC.pdf. Since then, multiple papers have incorporated the use of CQCC for their own research purposes.

Additionally, the authors of CQCC have an open source implementation in MATLAB here: http://audio.eurecom.fr/content/software

A general block diagram listing the steps used to generate CQCC:

Issue Analytics

Created 2 years ago
Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction

ShubhankarKG commented, Jan 27, 2022

Thanks for all the input, everyone! I’ll close this issue now since it’s not related to librosa per se. Thanks for your time!

1 reaction

bmcfee commented, Jan 18, 2022

Your issue reminds me of this ISMIR 2016 paper by Mi Tian and Mark Sandler:

Right - near as I can tell, the main difference between the two methods (aside from the filter shape) is that the feature requested here involves interpolating the frequency range back to a linear scale after the CQT/dB scaling but before the DCT. I find this step somewhat puzzling, and I’m not quite sure why it would be an improvement over a vanilla cepstrum. It mainly seems like a roundabout way to do a cepstrum with some non-uniform time and frequency smoothing, but perhaps I’m missing something important here.
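To make the step discussed above concrete, here is a minimal, dependency-free sketch of "interpolate back to a linear frequency scale, then take the DCT" for a single frame. It is not librosa's API and not the authors' MATLAB code: the function names (`geometric_frequencies`, `resample_to_linear`, `dct2`, `cqcc_frame`) are hypothetical, plain linear interpolation stands in for whatever resampling scheme the reference implementation uses, and the DCT is left unnormalized. The input is assumed to be one frame of an already dB-scaled CQT, for example a column of `librosa.amplitude_to_db(np.abs(librosa.cqt(y, sr=sr)))`.

```python
import math


def geometric_frequencies(fmin, n_bins, bins_per_octave):
    """Center frequencies of a constant-Q transform: fmin * 2**(k / bins_per_octave)."""
    return [fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def resample_to_linear(freqs, values, n_out):
    """Linearly interpolate `values`, sampled at increasing `freqs`, onto
    `n_out` evenly spaced frequencies spanning the same range.

    Assumes len(freqs) >= 2 and n_out >= 2. This is a simplification chosen
    for illustration, not the resampler of any reference implementation.
    """
    lo, hi = freqs[0], freqs[-1]
    step = (hi - lo) / (n_out - 1)
    out = []
    j = 0  # index of the left edge of the current interpolation segment
    for i in range(n_out):
        f = lo + i * step
        # Advance until f lies in [freqs[j], freqs[j + 1]].
        while j < len(freqs) - 2 and freqs[j + 1] < f:
            j += 1
        t = (f - freqs[j]) / (freqs[j + 1] - freqs[j])
        t = min(max(t, 0.0), 1.0)  # guard against rounding at the edges
        out.append(values[j] + t * (values[j + 1] - values[j]))
    return out


def dct2(x, n_coeffs):
    """First `n_coeffs` coefficients of the unnormalized DCT-II of `x`."""
    n = len(x)
    return [
        sum(x[m] * math.cos(math.pi * k * (m + 0.5) / n) for m in range(n))
        for k in range(n_coeffs)
    ]


def cqcc_frame(log_cqt_frame, freqs, n_linear, n_coeffs):
    """Cepstral coefficients for one dB-scaled CQT frame:
    geometric bins -> linear frequency grid -> DCT-II."""
    linear = resample_to_linear(freqs, log_cqt_frame, n_linear)
    return dct2(linear, n_coeffs)


if __name__ == "__main__":
    freqs = geometric_frequencies(fmin=32.7, n_bins=24, bins_per_octave=12)
    frame = [-20.0 - 0.5 * k for k in range(24)]  # made-up dB values
    print(cqcc_frame(frame, freqs, n_linear=64, n_coeffs=5))
```

Skipping `resample_to_linear` and applying `dct2` directly to the geometrically spaced bins gives the plain "DCT of the log-CQT" variant, so the two approaches can be compared on the same frames.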