ENH: Compile pocketfft with newer vector instructions?

Is your feature request related to a problem? Please describe.

pocketfft supports vectorization of batched transforms, but our releases are currently compiled only for some variant of the SSE instruction sets. A simple %timeit benchmark of scipy.fft shows noticeable speedups, particularly in the jump from SSE to AVX registers (2 x float64 to 4 x float64 per register).

Function  Shape        SSE4.2 (μs)  AVX (μs)  AVX2 (μs)
fft       (1000, 120)  361          286       289
fft       (400, 200)   228          202       203
rfft      (1000, 120)  180          157       155
rfft      (400, 200)   120          99        101
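
To make the size of the gain easier to read, the timings in the table can be turned into speedup ratios relative to the SSE4.2 build. The snippet below is only a small sketch: the numbers are copied from the table, while the dictionary layout and the helper name `speedup_over_sse` are illustrative choices, not anything from SciPy.

```python
# Timings in microseconds, copied from the table in this issue.
timings_us = {
    ("fft", (1000, 120)): {"SSE4.2": 361, "AVX": 286, "AVX2": 289},
    ("fft", (400, 200)): {"SSE4.2": 228, "AVX": 202, "AVX2": 203},
    ("rfft", (1000, 120)): {"SSE4.2": 180, "AVX": 157, "AVX2": 155},
    ("rfft", (400, 200)): {"SSE4.2": 120, "AVX": 99, "AVX2": 101},
}


def speedup_over_sse(row, isa):
    """Return how many times faster `isa` is than the SSE4.2 baseline."""
    return row["SSE4.2"] / row[isa]


for (func, shape), row in timings_us.items():
    avx = speedup_over_sse(row, "AVX")
    avx2 = speedup_over_sse(row, "AVX2")
    print(f"{func:5s}{shape!s:13s} AVX: {avx:.2f}x  AVX2: {avx2:.2f}x")
```

For these four cases the AVX ratios fall between roughly 1.13x and 1.26x, and AVX2 is within a few percent of AVX in every row, which matches the observation that most of the benefit comes from the wider registers.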

Describe the solution you’d like.

AVX is sufficiently old that we may just be able to ship binaries with AVX by default. Quoting the Wikipedia page:

[AVX was] first supported by Intel with the Sandy Bridge processor shipping in Q1 2011 and later by AMD with the Bulldozer processor shipping in Q3 2011.

So any x86 computer without AVX support should be at least a decade old by this point.

cc @rgommers

Describe alternatives you’ve considered.

An alternative solution may be to compile pocketfft twice, with and without AVX, then import only the version supported by the CPU using runtime cpu feature detection.
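
A rough sketch of what that runtime selection could look like is shown below. It is not how SciPy does or would do it: the module names `_pocketfft_avx` and `_pocketfft_sse` are hypothetical, and the detection assumes a Linux x86 system where `/proc/cpuinfo` exposes a `flags` line. A real implementation would more likely query CPUID from compiled code.

```python
from pathlib import Path


def cpu_flags_from_cpuinfo(text):
    """Return the set of feature flags found in /proc/cpuinfo-style text."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def detect_flags(path="/proc/cpuinfo"):
    """Read CPU flags; return an empty set if the file is unavailable."""
    try:
        return cpu_flags_from_cpuinfo(Path(path).read_text())
    except OSError:
        # Non-Linux platform or unreadable file: report no known features.
        return set()


def choose_backend(flags):
    """Pick a build of the extension (hypothetical module names)."""
    return "_pocketfft_avx" if "avx" in flags else "_pocketfft_sse"


# Falls back to the SSE build whenever AVX cannot be confirmed.
print(choose_backend(detect_flags()))
```

Defaulting to the baseline build when detection fails keeps the import safe on machines where the check is not possible, at the cost of leaving some performance unused there.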

Additional context (e.g. screenshots, GIFs)

No response

Issue Analytics

State:
Created a year ago
Comments: 15 (15 by maintainers)

Top GitHub Comments

rgommers commented, Oct 4, 2022 (1 reaction)

It’s the --quick flag to asv run which really distorts the results. Sigh. With it I get the results above, without it I get:

[100.00%] ··· fft_basic.Fft.time_fft                                                                                                          ok
[100.00%] ··· ====== ======= =============== ============= =============
              --                                module
              -------------- -------------------------------------------
               size    type   scipy.fftpack    scipy.fft     numpy.fft
              ====== ======= =============== ============= =============
               100     real      2.82±0μs     3.10±0.01μs   1.71±0.03μs
               100    cmplx    2.57±0.01μs    2.84±0.03μs   1.71±0.01μs
               256     real    3.37±0.01μs      3.69±0μs    2.31±0.01μs
               256    cmplx    3.04±0.02μs    3.34±0.02μs   2.31±0.01μs
               313     real    10.3±0.01μs    10.6±0.02μs   12.2±0.01μs
               313    cmplx    9.26±0.02μs    9.55±0.02μs   12.2±0.01μs
               512     real    4.51±0.01μs    4.87±0.01μs   3.75±0.01μs
               512    cmplx    4.05±0.02μs    4.35±0.05μs   3.66±0.07μs
               1000    real    7.13±0.02μs    7.48±0.01μs   6.76±0.01μs
               1000   cmplx    7.07±0.01μs    7.36±0.01μs   6.73±0.01μs
               1024    real    6.70±0.03μs    7.01±0.02μs   6.36±0.01μs
               1024   cmplx    6.28±0.01μs    6.64±0.03μs   6.31±0.01μs
               2048    real    11.6±0.05μs    12.0±0.01μs   12.3±0.02μs
               2048   cmplx    11.1±0.02μs    11.4±0.02μs   12.4±0.01μs
               4096    real    20.9±0.07μs    21.4±0.01μs   24.9±0.02μs
               4096   cmplx     30.5±0.3μs     30.5±0.4μs   25.0±0.01μs
               8192    real     43.1±0.1μs    43.8±0.03μs   56.5±0.03μs
               8192   cmplx     58.4±0.3μs     58.6±0.8μs    56.3±0.2μs
              ====== ======= =============== ============= =============

Didn’t realize it was there, or that it had such a bad effect on the produced statistics.
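
The effect described here can be illustrated without asv. The sketch below assumes that `--quick` makes asv run each benchmark function only once, and compares a single-shot timing with repeated measurements using only the standard library. The `work` function is an arbitrary stand-in, not one of the SciPy benchmarks.

```python
import statistics
import timeit


def work():
    """Arbitrary stand-in workload (not a SciPy benchmark)."""
    return sum(i * i for i in range(1000))


# One single-shot measurement: includes warm-up effects and timer noise.
single = timeit.timeit(work, number=1)

# Five repeats of 1000 calls each, converted to a per-call time.
calls = 1000
per_call = [t / calls for t in timeit.repeat(work, number=calls, repeat=5)]

print(f"single shot: {single * 1e6:.2f} us")
print(f"repeated   : {min(per_call) * 1e6:.2f} us (min), "
      f"spread {statistics.stdev(per_call) * 1e6:.2f} us")
```

On most machines the single-shot figure is noticeably larger and less stable than the repeated minimum, which is why one-run timings are a poor basis for comparing builds.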

eli-schwartz commented, Sep 29, 2022 (1 reaction)

FWIW, I think it’s mostly unstable because we haven’t gotten a lot of feedback from people using it, to understand if it does what people need.
