uarray based backend compatibility tracker
Problem
SciPy adopted uarray to provide a multi-dispatch mechanism, with the goal of keeping OpenMP, GPU kernels, and other specialized implementations out of the SciPy codebase. See the motivation section below for more concrete discussion.
SciPy currently supports this through the scipy.fft module; a short usage sketch follows the list below.

- scipy.fft (#10383)
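For reference, the existing scipy.fft backend machinery can already be driven from user code. A minimal sketch, assuming CuPy is installed with a working CUDA runtime and using cupyx.scipy.fft as the backend object (as documented by CuPy); the array contents are arbitrary:

```python
import scipy.fft
import cupy as cp
import cupyx.scipy.fft as cufft  # implements the uarray protocol for scipy.fft

x = cp.random.random(1024).astype(cp.complex64)

# Within this context, scipy.fft dispatches to CuPy's cuFFT-backed implementation.
with scipy.fft.set_backend(cufft):
    y = scipy.fft.fft(x)
```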
Other SciPy modules would also benefit from a uarray backend, with the usage later extended through libraries such as CuPy (cupyx.scipy) and Dask (dask.array).
Proposed Modules
- scipy.ndimage (#14356)
  Note: cupyx.scipy.ndimage has almost all functions implemented (except a couple: geometric_transform, watershed_ift), while dask-image is currently less complete. dask-image currently has a different namespace structure, but dask/dask-image#198 plans to address this. (A short sketch of calling the CuPy implementation directly follows this list.)
  - Filters
  - Fourier filters
  - Interpolation
  - Measurements
  - Morphology
- scipy.linalg (#14407)
  TODO: Add a more comprehensive note on cross-library availability of functions later. For now, a quick look tells me that not all functions are available in cupy or dask.
  - Basics
  - Eigenvalue Problems
  - Decompositions
  - Matrix Functions
  - Matrix Equation Solvers
  - Sketches and Random Projections
  - Special Matrices
  - Low-level routines
- scipy.special
  Note: These are element-wise functions, which can be made to work with dask fairly easily later on. CuPy already has some of the functions.
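As a concrete illustration of the cross-library availability noted for scipy.ndimage above: the CuPy implementation can already be called directly today, just under a different namespace. A minimal sketch, assuming CuPy with a working CUDA runtime (the array shape and sigma are arbitrary):

```python
import cupy as cp
from cupyx.scipy import ndimage as cupy_ndimage  # drop-in analogue of scipy.ndimage

# A GPU-resident image; with a uarray backend in scipy.ndimage, the same
# scipy-level call could dispatch here automatically instead of requiring
# the cupyx namespace to be used explicitly.
image = cp.random.random((256, 256))
smoothed = cupy_ndimage.gaussian_filter(image, sigma=2.0)
```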
Obviously, once SciPy support is added, these libraries should be updated to make use of uarray, similar to what was done here. (A minimal sketch of what a uarray backend looks like follows.)
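For context on what "making use of uarray" involves: a scipy.fft backend is just an object or module exposing the uarray protocol, i.e. a __ua_domain__ string and a __ua_function__ hook, as described in the scipy.fft backend documentation. Below is a minimal sketch of an illustrative backend; the NumpyFFTBackend name is made up, and it simply delegates to numpy.fft when a function of the same name exists, otherwise deferring to the next backend:

```python
import numpy as np
import scipy.fft


class NumpyFFTBackend:
    """Illustrative scipy.fft backend that delegates to numpy.fft."""

    __ua_domain__ = "numpy.scipy.fft"  # the domain scipy.fft dispatches on

    @staticmethod
    def __ua_function__(method, args, kwargs):
        # `method` is the scipy.fft multimethod being called (e.g. scipy.fft.fft).
        fn = getattr(np.fft, method.__name__, None)
        if fn is None:
            return NotImplemented  # let uarray fall back to the next backend
        return fn(*args, **kwargs)


with scipy.fft.set_backend(NumpyFFTBackend()):
    y = scipy.fft.fft(np.arange(8))
```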
Motivation for uarray
See gh-10204 comment
The protocol not covering things like array creation functions is one thing, but there’s a more important limitation I think: it is specific to “types of arrays”. So if you want to create functions with the same API for GPU arrays (CuPy, PyTorch), distributed arrays (Dask), sparse arrays (scipy.sparse, pydata/sparse), then it works. But what if you want to provide an alternative implementation for ndarrays? You simply cannot do that. Pyfftw, mkl-fft and pypocketfft all work on regular numpy arrays. So letting the numpy array carry around information about what implementation to use is just fundamentally not going to work. Instead, it’s the library that must be able to say “hey, here’s an implementation (perhaps for specific types)”, and a mechanism for either automatic or user-controlled selection of which implementation/backend to use.
See gh-13965 comment
For example, a CUDA-based tensor object from a deep learning framework could invoke CuFFT. I think (not 100% certain) that this also allows you to slot in your own preferred FFT library as a backend even for plain-old numpy ndarray objects. We used to have multiple FFT backends selected at build time, but it was difficult to add new ones, and not easy to support incompatibly-licensed FFT libraries like the popular FFTW. I think this new multidispatch mechanism allows that to be slotted in at runtime.
See gh-13965 comment
It’s possible it will extend to scipy.linalg, as it also has some need to swap out backends like that, but it probably won’t be a widely used pattern across all of scipy.
Comments (6 by maintainers)
Sure, that would be great @czgdp1807! Please confirm the same with @peterbell10 or @rgommers before you start.
The best way to find real-world issues, as well as judge things like code complexity, is to write code. Modules like ndimage and linalg are significantly more complex than fft, so writing backends for them turns up cases of interest that we need to take into account. Backends are not that difficult to write once you wrap your head around them, so having working backends for the most important SciPy modules is very useful. That is why we defined internships that included this, first for @AnirudhDagar and now for @Smit-create.