
ENH: Compile pocketfft with newer vector instructions?

See original GitHub issue

Is your feature request related to a problem? Please describe.

pocketfft supports vectorization of batched transforms, but our releases are currently compiled with only some variant of the SSE instruction set. A simple %timeit benchmark of scipy.fft shows noticeable speedups when newer instruction sets are enabled, particularly in the jump from SSE to AVX registers (2 × float64 to 4 × float64 per register).

Function   Shape         SSE4.2 (µs)   AVX (µs)   AVX2 (µs)
fft        (1000, 120)   361           286        289
fft        (400, 200)    228           202        203
rfft       (1000, 120)   180           157        155
rfft       (400, 200)    120           99         101
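
A rough reproduction of that benchmark could look something like the following (the shapes match the table above; absolute timings will of course depend on the machine and on which instruction set the installed pocketfft binaries were built with):

import timeit

import numpy as np
import scipy.fft

# Time fft/rfft on the same 2-D shapes as in the table; results are the best
# of several repeats, reported per call in microseconds.
rng = np.random.default_rng(0)
for shape in [(1000, 120), (400, 200)]:
    x = rng.standard_normal(shape)
    for func in (scipy.fft.fft, scipy.fft.rfft):
        t = min(timeit.repeat(lambda: func(x, axis=-1), number=100, repeat=5)) / 100
        print(f"{func.__name__:4s} {shape}: {t * 1e6:.0f} µs")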

Describe the solution you’d like.

AVX is sufficiently old that we may be able to ship binaries built with AVX by default. Quoting the Wikipedia page:

[AVX was] first supported by Intel with the Sandy Bridge processor shipping in Q1 2011 and later by AMD with the Bulldozer processor shipping in Q3 2011.

So any x86 computer without AVX support should be at least a decade old by this point.

cc @rgommers

Describe alternatives you’ve considered.

An alternative would be to compile pocketfft twice, with and without AVX, and then import only the version supported by the CPU, using runtime CPU feature detection.
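
A minimal sketch of what that import-time dispatch could look like, assuming a Linux-only /proc/cpuinfo check and purely illustrative module names (neither _pocketfft_avx nor _pocketfft_baseline exists in SciPy):

import importlib


def cpu_has_avx():
    """Best-effort AVX detection; returns False when the check is unavailable."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return "avx" in line.split()
    except OSError:
        pass
    return False


def load_pocketfft():
    """Import the AVX build when the CPU supports it, else the baseline build."""
    name = "_pocketfft_avx" if cpu_has_avx() else "_pocketfft_baseline"
    return importlib.import_module(name)

In practice the detection would more likely live in C, or be delegated to the runtime CPU-dispatch machinery NumPy already uses, but the shape of the solution is the same: two compiled extensions and one import-time decision.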

Additional context (e.g. screenshots, GIFs)

No response

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 15 (15 by maintainers)

Top GitHub Comments

1 reaction
rgommers commented, Oct 4, 2022

It’s the --quick flag to asv run which really distorts the results. Sigh. With it I get the results above, without it I get:

[100.00%] ··· fft_basic.Fft.time_fft                                                                                                          ok
[100.00%] ··· ====== ======= =============== ============= =============
              --                                module
              -------------- -------------------------------------------
               size    type   scipy.fftpack    scipy.fft     numpy.fft
              ====== ======= =============== ============= =============
               100     real      2.82±0μs     3.10±0.01μs   1.71±0.03μs
               100    cmplx    2.57±0.01μs    2.84±0.03μs   1.71±0.01μs
               256     real    3.37±0.01μs      3.69±0μs    2.31±0.01μs
               256    cmplx    3.04±0.02μs    3.34±0.02μs   2.31±0.01μs
               313     real    10.3±0.01μs    10.6±0.02μs   12.2±0.01μs
               313    cmplx    9.26±0.02μs    9.55±0.02μs   12.2±0.01μs
               512     real    4.51±0.01μs    4.87±0.01μs   3.75±0.01μs
               512    cmplx    4.05±0.02μs    4.35±0.05μs   3.66±0.07μs
               1000    real    7.13±0.02μs    7.48±0.01μs   6.76±0.01μs
               1000   cmplx    7.07±0.01μs    7.36±0.01μs   6.73±0.01μs
               1024    real    6.70±0.03μs    7.01±0.02μs   6.36±0.01μs
               1024   cmplx    6.28±0.01μs    6.64±0.03μs   6.31±0.01μs
               2048    real    11.6±0.05μs    12.0±0.01μs   12.3±0.02μs
               2048   cmplx    11.1±0.02μs    11.4±0.02μs   12.4±0.01μs
               4096    real    20.9±0.07μs    21.4±0.01μs   24.9±0.02μs
               4096   cmplx     30.5±0.3μs     30.5±0.4μs   25.0±0.01μs
               8192    real     43.1±0.1μs    43.8±0.03μs   56.5±0.03μs
               8192   cmplx     58.4±0.3μs     58.6±0.8μs    56.3±0.2μs
              ====== ======= =============== ============= =============

Didn’t realize it was there, or that it had such a bad effect on the produced statistics.

1 reaction
eli-schwartz commented, Sep 29, 2022

FWIW, I think it’s mostly unstable because we haven’t gotten a lot of feedback from people using it to understand whether it does what people need.
