convolve_fft is >10x slower than scipy.signal.fftconvolve
It has been reported in the statmorph package that astropy.convolution.convolve_fft is at least an order of magnitude slower than scipy.signal.fftconvolve.

Plot of runtimes vs. kernel size:

CC: @vrodgom
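A minimal benchmark along these lines (a sketch only, with arbitrary image and kernel sizes; this is not the script behind the original plot):

```python
import time

import numpy as np
from astropy.convolution import convolve_fft
from scipy.signal import fftconvolve

image = np.random.default_rng(0).random((1024, 1024))

for ksize in (15, 63, 255):
    # Normalized boxcar kernel; odd sizes, as astropy's convolution requires.
    kernel = np.ones((ksize, ksize)) / ksize**2

    t0 = time.perf_counter()
    fftconvolve(image, kernel, mode='same')
    t_scipy = time.perf_counter() - t0

    t0 = time.perf_counter()
    convolve_fft(image, kernel)
    t_astropy = time.perf_counter() - t0

    print(f"kernel {ksize}x{ksize}: scipy {t_scipy:.3f} s, "
          f"astropy {t_astropy:.3f} s ({t_astropy / t_scipy:.1f}x)")
```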
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I have been looking in a bit more detail at what astropy is doing for fft_pad compared to scipy, and it seems the padding scheme is partly outdated and partly simply mistaken. From the Notes on https://docs.scipy.org/doc/scipy/reference/generated/scipy.fft.fft.html, the bottom line I read is that the scipy.fft implementation is still reasonably fast for any random size and, more importantly, that scipy.fft.next_fast_len will in general find a size that is much closer to the original than the next power of 2. In contrast, our convolve_fft (1) pads to the next power of 2, potentially almost doubling the array size in one dimension (especially if an original 2^N-sized array is first padded by a small amount by psf_pad), and then (2) expands every dimension to the larger of the size from (1) and the largest dimension.
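To illustrate how much the two padding strategies differ (a minimal sketch; pow2 below is just a stand-in for what fft_pad=True effectively computes, not code from convolve_fft):

```python
import numpy as np
from scipy.fft import next_fast_len

def pow2(n):
    # Smallest power of 2 >= n, i.e. the size fft_pad currently targets.
    return 2 ** int(np.ceil(np.log2(n)))

# e.g. a 1024-pixel axis that psf_pad has grown by a 33-pixel kernel
for n in (1024 + 33, 1229, 4099):
    print(f"{n}: next power of 2 = {pow2(n)}, next_fast_len = {next_fast_len(n)}")

# pow2 nearly doubles a size just above 2**N, whereas next_fast_len returns
# a composite of small primes that is usually only slightly larger than n.
```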
I simply could not find the rationale for the second step; as I understand the scipy docs, n-dimensional FFTs are basically done recursively over the dimensions, and _freq_domain_conv separately pads each dimension to its own optimal size. Timing tests indicate that this is the most efficient way for np.fft.*fft, too (as well as for scipy.fftpack).

For convolve_fft this becomes really obvious with non-square inputs. Modifying the above tests for ny, nx = 1229, 4099 (using prime-number-sized arrays, which supposedly have the poorest performance), I have made the corresponding timings.
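To make the padded shapes in that comparison concrete (a sketch with my own helper names; the per-axis next_fast_len variant mirrors what scipy's _freq_domain_conv does, as described above):

```python
import numpy as np
from scipy.fft import next_fast_len

ny, nx = 1229, 4099  # prime-sized axes, the worst case for a plain FFT

def pow2(n):
    return 2 ** int(np.ceil(np.log2(n)))

# Current convolve_fft behaviour: one common power-of-2 size for every axis.
common = max(pow2(ny), pow2(nx))
print("single power of 2 for all axes:", (common, common))

# Per-axis powers of 2 (the first modification described below).
print("per-axis powers of 2:          ", (pow2(ny), pow2(nx)))

# Per-axis scipy.fft.next_fast_len (the final variant tested below).
print("per-axis next_fast_len:        ", (next_fast_len(ny), next_fast_len(nx)))
```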
Next I tested a modified version setting newshape to next powers of 2, but individually for each axis. This creates a more manageable 2048×8192 array, but still doubles the time compared to no padding at all. Finally, I replaced the powers of 2 by values from scipy.fft.next_fast_len() (there are two options for complex or real input which seem to produce very similar results – possibly real=True only brings real advantages for fft.rfft, which we currently cannot use since the input is always unconditionally cast to complex). That brings the time to within a factor of 2 or so of fftconvolve, and closer still to fftconvolve for complex input; using the scipy.fft functions brings some 10–20 % further speedup, so the remaining difference can probably be attributed to the extra pre- and post-processing. Note that disabling psf_pad had little impact in these latter tests (psf_pad=False actually becomes slower together with fft_pad=False in this case, possibly because the padding makes the sizes at least non-prime), but it introduces noticeable differences in the result close to the borders.

OK, as I said, this is because of the selected options.
psf_pad is required to avoid edge-wrapping with FFT convolution. With that off, the performance improves ~10x because the convolution is of a 1024^2 image rather than a 1536^2 image, which is what was being used in the previous tests. As far as I can tell, the difference for small kernels – which is a large factor but a small time – is from overheads, possibly from promoting the kernel to a complex data type.
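A small demonstration of the edge-wrapping that the padding prevents (a sketch; the image and kernel sizes are arbitrary):

```python
import numpy as np
from astropy.convolution import convolve_fft

# A point source in one corner and a broad boxcar kernel.
image = np.zeros((64, 64))
image[0, 0] = 1.0
kernel = np.ones((33, 33)) / 33**2

wrapped = convolve_fft(image, kernel, boundary='wrap')              # no padding
padded = convolve_fft(image, kernel, boundary='fill', psf_pad=True)

# Without padding, flux from the corner wraps around to the far edges.
print("opposite corner, wrapped:", wrapped[-1, -1])  # non-zero
print("opposite corner, padded: ", padded[-1, -1])   # ~0
```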
Also, scipy defaults to using rfft, which is faster, if both the image and kernel are real. We could add that check to astropy convolution; that would get us a little boost.

scipy handles array padding in a different way that might actually be better, particularly for small kernels. I haven't figured out what it does yet because it's buried down close to the C layer. They may be taking the FFT of the kernel, then padding, instead of padding, then taking the FFT, which is clever. I'm not willing to commit to that being correct until I've had more coffee.

scipy's fftconvolve is doing one other thing I don't understand that lets it avoid edge-wrapping without using the same padding approach we do. I suspect it's what I just mentioned, and it may be responsible for scipy being just a smidgen slower for the biggest kernel size.
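For reference, a minimal sketch of the real-input path (a simplified stand-in for what scipy.signal.fftconvolve does for real arrays, not the actual scipy code):

```python
import numpy as np
from scipy.fft import next_fast_len, rfft2, irfft2
from scipy.signal import fftconvolve

def rfft_convolve_same(image, kernel):
    """'same'-sized FFT convolution of two real 2-D arrays via rfft."""
    full = [s1 + s2 - 1 for s1, s2 in zip(image.shape, kernel.shape)]
    fshape = [next_fast_len(s, real=True) for s in full]  # per-axis fast sizes
    conv = irfft2(rfft2(image, fshape) * rfft2(kernel, fshape), fshape)
    # Crop the 'full' result back to the image size, centred on the kernel.
    y0, x0 = ((k - 1) // 2 for k in kernel.shape)
    return conv[y0:y0 + image.shape[0], x0:x0 + image.shape[1]]

rng = np.random.default_rng(1)
img, ker = rng.random((50, 70)), rng.random((7, 9))
print(np.allclose(rfft_convolve_same(img, ker), fftconvolve(img, ker, mode='same')))
```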