Accelerate parabolic interpolation in yin/pyin
Is your feature request related to a problem? Please describe.
In the general theme of the 0.10 milestone, I noticed a step in yin/pyin that could be done much more efficiently. The parabolic interpolation function:
https://github.com/librosa/librosa/blob/0a60e0a87888d1bc8b2c6098e17d2b98536a28a0/librosa/core/pitch.py#L427-L450
works by slicing the input array and doing some vectorized arithmetic ops to compute the intermediate values. Each intermediate value (`parabola_a` and `parabola_b`) will therefore require allocating almost as much space as the input array (`y_frames`) - so we’re tripling memory usage unnecessarily.
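For illustration, here is a minimal sketch of the slicing pattern described above. The function and variable names follow the issue's description but are illustrative, not the exact librosa source:

```python
import numpy as np

def parabolic_shifts_vectorized(y_frames):
    """Sketch of the slicing approach: fit a parabola through each triple of
    adjacent samples along axis 0 and return the vertex offset per sample.
    Both intermediates (parabola_a, parabola_b) allocate arrays nearly as
    large as the input, which is the memory-tripling the issue describes."""
    shifts = np.zeros_like(y_frames)
    # Quadratic coefficients for the parabola through (i-1, i, i+1).
    parabola_a = (y_frames[:-2] + y_frames[2:] - 2 * y_frames[1:-1]) / 2
    parabola_b = (y_frames[2:] - y_frames[:-2]) / 2
    # Vertex offset -b / (2a), guarded against division by zero.
    shifts[1:-1] = np.where(parabola_a != 0, -parabola_b / (2 * np.where(parabola_a != 0, parabola_a, 1.0)), 0.0)
    return shifts
```

Sampling an exact parabola recovers its true vertex offset: for `f(x) = (x - 0.3)**2` sampled at integer `x`, the shift at index 1 is `-0.7`, i.e. the vertex at `x = 0.3`.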
Describe the solution you’d like
The operation here is entirely local, and could be easily implemented by a numba stencil operation. This is very similar to what we did for localmax/localmin in #1533, eg: https://github.com/librosa/librosa/blob/0a60e0a87888d1bc8b2c6098e17d2b98536a28a0/librosa/util/utils.py#L983-L1003
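A sketch of the local reformulation (names are my own, not librosa's): the loop body below is the per-element kernel that a `numba.stencil` or `@njit` loop would compile, needing no large intermediate arrays. `numba` is treated as optional here so the sketch also runs without it:

```python
import numpy as np

try:
    from numba import njit  # compiled loop if numba is available
except ImportError:       # pure-Python fallback so the sketch still runs
    def njit(func):
        return func

@njit
def parabolic_shifts_local(y_frames):
    """Same neighborhood operation as the sliced version, expressed as an
    explicit loop: each output depends only on a 3-sample window."""
    shifts = np.zeros_like(y_frames)
    n, m = y_frames.shape
    for i in range(1, n - 1):
        for j in range(m):
            a = (y_frames[i - 1, j] + y_frames[i + 1, j] - 2 * y_frames[i, j]) / 2
            b = (y_frames[i + 1, j] - y_frames[i - 1, j]) / 2
            # Check before dividing instead of adding a tiny stabilizer.
            if a != 0:
                shifts[i, j] = -b / (2 * a)
    return shifts
```

In locally flat regions (`a == 0`) the shift is simply left at zero, which is where the check-before-divide behavior differs from a stabilized division.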
Describe alternatives you’ve considered
This isn’t the biggest speed bottleneck in yin/pyin, but it would be easy to improve, so we should do it anyway.
Issue Analytics
- Created 10 months ago
- Reactions: 1
- Comments: 6 (6 by maintainers)
Top GitHub Comments
Trying to wrap this one up today. I am seeing some numerical differences relative to 0.9.2, but only in regions where the input is locally flat / near silent. I think this all comes about from the change in how the edge case is handled:
https://github.com/librosa/librosa/blob/899811ff508ce655e4ca346c1e74ade7c1800839/librosa/core/pitch.py#L449
where we no longer have the stabilization by `util.tiny(a)`. I’m fine with this, since the stabilization was there originally so we could divide first and then check. Here we check before division, so it should be okay. PR forthcoming!

I took a quick cut at vectorizing this inner loop and didn’t see any improvement. I suspect this could be improved though, because many of the operations in this loop would support direct vectorization, so calling them separately at each step is adding more overhead than we need.
Let’s punt on further acceleration here, as it would require a pretty substantial refactor.