ENH: Compile pocketfft with newer vector instructions?
Is your feature request related to a problem? Please describe.
pocketfft supports vectorization of batched transforms, but currently our releases only compile with some variant of the SSE instruction sets. A simple %timeit benchmark of scipy.fft shows noticeable speedups, particularly in the jump from SSE to AVX registers (2 x float64 to 4 x float64).
| Function | Shape | SSE4.2 (us) | AVX (us) | AVX2 (us) |
|---|---|---|---|---|
| fft | (1000, 120) | 361 | 286 | 289 |
| fft | (400, 200) | 228 | 202 | 203 |
| rfft | (1000, 120) | 180 | 157 | 155 |
| rfft | (400, 200) | 120 | 99 | 101 |
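For anyone wanting to repeat this kind of measurement outside IPython, the following is a minimal sketch using the standard-library timeit module in place of %timeit. The helper name bench_us, the random float64 input, and the repeat counts are assumptions for illustration, not the script used for the table above. Absolute numbers will differ by machine and by how SciPy was compiled.

```python
import timeit

import numpy as np
import scipy.fft


def bench_us(func, shape, number=200, repeat=5):
    """Return the best per-call time in microseconds for func on a random array.

    Assumption: real float64 input; the original benchmark's dtype is not stated.
    """
    rng = np.random.default_rng(0)
    x = rng.standard_normal(shape)
    timer = timeit.Timer(lambda: func(x))
    # Take the minimum over repeats, as %timeit-style benchmarks usually do,
    # to reduce the influence of background noise.
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e6


if __name__ == "__main__":
    # A 2-D input is transformed along the last axis, so each row is one
    # transform in the batch that pocketfft can vectorize over.
    for name in ("fft", "rfft"):
        for shape in ((1000, 120), (400, 200)):
            func = getattr(scipy.fft, name)
            print(f"{name:5s} {shape!s:12s} {bench_us(func, shape):8.1f} us")
```

Running the same script against builds compiled with different instruction-set flags would give one column of the table per build.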
Describe the solution you’d like.
AVX is sufficiently old that we may just be able to ship binaries with AVX by default. Quoting the Wikipedia page:
[AVX was] first supported by Intel with the Sandy Bridge processor shipping in Q1 2011 and later by AMD with the Bulldozer processor shipping in Q3 2011.
So any x86 computer without AVX support should be at least a decade old by this point.
cc @rgommers
Describe alternatives you’ve considered.
An alternative solution may be to compile pocketfft twice, with and without AVX, then import only the version supported by the CPU, using runtime CPU feature detection.
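As a rough illustration of that alternative, here is a sketch of import-time dispatch. Everything in it is hypothetical: the module names _pocketfft_avx2, _pocketfft_avx and _pocketfft_sse do not exist in SciPy, and reading /proc/cpuinfo only works on Linux. A real implementation would more likely do the detection in compiled code.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in Linux /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read flags from the running machine; empty set if the file is unavailable."""
    try:
        with open(path) as f:
            return parse_cpu_flags(f.read())
    except OSError:
        return set()


# Hypothetical build variants, most capable first. These names are
# placeholders for illustration and are not real SciPy modules.
VARIANTS = (("avx2", "_pocketfft_avx2"), ("avx", "_pocketfft_avx"))
BASELINE = "_pocketfft_sse"


def pick_variant(flags):
    """Return the name of the most capable variant the given flags allow."""
    for flag, module in VARIANTS:
        if flag in flags:
            return module
    return BASELINE


if __name__ == "__main__":
    print(pick_variant(read_cpu_flags()))
```

The package's import machinery would then load whichever module pick_variant names, falling back to the baseline build when detection fails or the platform is not recognized.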
Additional context (e.g. screenshots, GIFs)
No response
It’s the --quick flag to asv run which really distorts the results. Sigh. With it I get the results above, without it I get:
Didn’t realize it was there, or that it had such a bad effect on the produced statistics.
FWIW, I think it’s mostly unstable because we haven’t gotten a lot of feedback from people using it, to understand if it does what people need.