Taking multi-channel seriously
Most of librosa only supports monophonic audio. But for many of the analyses we’d like to do, stereo or multi-channel support would be very useful and not all that difficult at this point.
This issue is meant to kick off discussion of how this will work, but I have some thoughts as outlined below.
Conventions
In general, we should continue to support native mono `y.shape = (N,)` without artificial up-casting to explicit mono `(1, N)`.

For example, an `stft` on `(N,)` will still produce an output of shape `(# freqs, # frames)`. However, an `stft` on `(1, N)` would produce output `(1, # freqs, # frames)`. Now that framing is fully generalized to multichannel, this should not present any difficulties. More generally, `(K, N)` would map to `(K, # freqs, # frames)`.
As a general rule, the trailing dimension will (usually) be treated as time-like (samples, frames, etc.), and the leading dimension will be channels.
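The shape convention above can be sketched with a toy STFT in plain numpy. This is a hypothetical illustration of the proposed convention, not librosa's actual `stft` implementation: framing acts only on the trailing (time-like) axis, so any leading channel axes pass through untouched.

```python
import numpy as np

def toy_stft(y, n_fft=8, hop=4):
    """Toy STFT illustrating the proposed shape convention.

    Mono (N,)        -> (n_fft // 2 + 1, n_frames)
    Multichannel (K, N) -> (K, n_fft // 2 + 1, n_frames)
    (Sketch only; not librosa's real implementation.)
    """
    n_frames = 1 + (y.shape[-1] - n_fft) // hop
    # Frame along the trailing axis only: index array of shape (n_frames, n_fft).
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[..., idx]                 # (..., n_frames, n_fft)
    spec = np.fft.rfft(frames, axis=-1)  # (..., n_frames, n_freqs)
    return np.swapaxes(spec, -2, -1)     # (..., n_freqs, n_frames)

mono = np.random.randn(32)
stereo = np.random.randn(2, 32)
print(toy_stft(mono).shape)    # (5, 7)
print(toy_stft(stereo).shape)  # (2, 5, 7)
```

Note that the mono case never grows an artificial channel axis: `(N,)` in, `(# freqs, # frames)` out.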
How it will work
Things that will generalize easily
- STFT and friends
- Linear filters (mel, mfcc, chroma_stft, and co)
- HPSS
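Why the linear filters generalize easily: applying a filterbank is a matrix product over the frequency axis, and `np.matmul` already broadcasts over any leading channel dimensions. A small sketch (random data stands in for a real mel filterbank and spectrogram; the shapes are the point):

```python
import numpy as np

n_mels, n_freqs, n_frames = 40, 1025, 100
fb = np.random.rand(n_mels, n_freqs)             # stand-in filterbank, e.g. mel

S_mono = np.random.rand(n_freqs, n_frames)       # mono: (n_freqs, n_frames)
S_multi = np.random.rand(2, n_freqs, n_frames)   # stereo: (2, n_freqs, n_frames)

# matmul contracts the frequency axis and broadcasts the channel axis.
print((fb @ S_mono).shape)   # (40, 100)
print((fb @ S_multi).shape)  # (2, 40, 100)
```

The same one-line filter application works for both cases with no special-casing.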
Things that will be a pain to generalize
- CQT and friends?
- Structure / recurrence
- General decomposition
- effects?
- inverse transforms (mfcc, mel, etc)
Things that should stay monophonic
- Detectors (onset, beat, pitch)
- Display
- Sequence modeling (dtw, viterbi)
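For the pieces that stay monophonic, a caller can downmix a `(K, N)` signal to `(N,)` before analysis. librosa already exposes this as `librosa.to_mono`; a self-contained numpy equivalent (averaging across the leading channel axis) looks like:

```python
import numpy as np

def to_mono(y):
    # Average across the leading channel axis; mono input passes through.
    return np.mean(y, axis=0) if y.ndim > 1 else y

stereo = np.random.randn(2, 1000)
print(to_mono(stereo).shape)  # (1000,)
```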
Issue Analytics
- State:
- Created: 3 years ago
- Reactions: 5
- Comments: 12 (12 by maintainers)
Top GitHub Comments
Update: this should now be fixed thanks to https://github.com/numpy/numpy/pull/16446 being merged.
I don’t think that’s the issue we’re having though. The inputs here are already arrays (not lists).