uarray based backend compatibility tracker
Problem
SciPy adopted uarray to provide a multi-dispatch mechanism, with the goal of keeping OpenMP, GPU kernels, and other specialized implementations out of the SciPy codebase. See the motivation section below for more concrete discussion.
SciPy currently supports this through the scipy.fft module; a short usage sketch follows the list below.

- scipy.fft (#10383)
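For reference, the existing scipy.fft backend machinery can already be driven from user code. A minimal sketch, assuming CuPy is installed with a working CUDA runtime and using cupyx.scipy.fft as the backend object (as documented by CuPy); the array contents are arbitrary:

```python
import scipy.fft
import cupy as cp
import cupyx.scipy.fft as cufft  # implements the uarray protocol for scipy.fft

x = cp.random.random(1024).astype(cp.complex64)

# Within this context, scipy.fft dispatches to CuPy's cuFFT-backed implementation.
with scipy.fft.set_backend(cufft):
    y = scipy.fft.fft(x)
```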
Other SciPy modules would also benefit from a uarray backend, with the usage later extended through libraries such as CuPy (cupyx.scipy) and Dask (dask.array).
Proposed Modules
- scipy.ndimage (#14356)
  Note: cupyx.scipy.ndimage has almost all functions implemented (except a couple: geometric_transform, watershed_ift), while dask-image is currently less complete. dask-image currently has a different namespace structure, but dask/dask-image#198 plans to address this. (A short sketch of calling the CuPy implementation directly follows this list.)
  - Filters
  - Fourier filters
  - Interpolation
  - Measurements
  - Morphology
- scipy.linalg (#14407)
  TODO: Add a more comprehensive note on cross-library availability of functions later. For now, a quick look tells me that not all functions are available in cupy or dask.
  - Basics
  - Eigenvalue Problems
  - Decompositions
  - Matrix Functions
  - Matrix Equation Solvers
  - Sketches and Random Projections
  - Special Matrices
  - Low-level routines
- scipy.special
  Note: These are element-wise functions, which can be made to work with dask fairly easily later on. CuPy already has some of the functions.
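As a concrete illustration of the cross-library availability noted for scipy.ndimage above: the CuPy implementation can already be called directly today, just under a different namespace. A minimal sketch, assuming CuPy with a working CUDA runtime (the array shape and sigma are arbitrary):

```python
import cupy as cp
from cupyx.scipy import ndimage as cupy_ndimage  # drop-in analogue of scipy.ndimage

# A GPU-resident image; with a uarray backend in scipy.ndimage, the same
# scipy-level call could dispatch here automatically instead of requiring
# the cupyx namespace to be used explicitly.
image = cp.random.random((256, 256))
smoothed = cupy_ndimage.gaussian_filter(image, sigma=2.0)
```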
Obviously, once SciPy support is added, these libraries should be updated to make use of uarray, similar to what was done here. (A minimal sketch of what a uarray backend looks like follows.)
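For context on what "making use of uarray" involves: a scipy.fft backend is just an object or module exposing the uarray protocol, i.e. a __ua_domain__ string and a __ua_function__ hook, as described in the scipy.fft backend documentation. Below is a minimal sketch of an illustrative backend; the NumpyFFTBackend name is made up, and it simply delegates to numpy.fft when a function of the same name exists, otherwise deferring to the next backend:

```python
import numpy as np
import scipy.fft


class NumpyFFTBackend:
    """Illustrative scipy.fft backend that delegates to numpy.fft."""

    __ua_domain__ = "numpy.scipy.fft"  # the domain scipy.fft dispatches on

    @staticmethod
    def __ua_function__(method, args, kwargs):
        # `method` is the scipy.fft multimethod being called (e.g. scipy.fft.fft).
        fn = getattr(np.fft, method.__name__, None)
        if fn is None:
            return NotImplemented  # let uarray fall back to the next backend
        return fn(*args, **kwargs)


with scipy.fft.set_backend(NumpyFFTBackend()):
    y = scipy.fft.fft(np.arange(8))
```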
Motivation for uarray
See gh-10204 comment
The protocol not covering things like array creation functions is one thing, but there’s a more important limitation I think: it is specific to “types of arrays”. So if you want to create functions with the same API for GPU arrays (CuPy, PyTorch), distributed arrays (Dask), sparse arrays (scipy.sparse, pydata/sparse), then it works. But what if you want to provide an alternative implementation for ndarrays? You simply cannot do that. Pyfftw, mkl-fft and pypocketfft all work on regular numpy arrays. So letting the numpy array carry around information about what implementation to use is just fundamentally not going to work. Instead, it’s the library that must be able to say “hey, here’s an implementation (perhaps for specific types)”, and a mechanism for either automatic or user-controlled selection of which implementation/backend to use.
See gh-13965 comment
For example, a CUDA-based tensor object from a deep learning framework could invoke CuFFT. I think (not 100% certain) that this also allows you to slot in your own preferred FFT library as a backend even for plain-old numpy ndarray objects. We used to have multiple FFT backends selected at build time, but it was difficult to add new ones, and not easy to support incompatibly-licensed FFT libraries like the popular FFTW. I think this new multidispatch mechanism allows that to be slotted in at runtime.
See gh-13965 comment
It’s possible it will extend to scipy.linalg, as it also has some need to swap out backends like that, but it probably won’t be a widely used pattern across all of scipy.
Comments (6 by maintainers)
Sure, that would be great @czgdp1807! Please confirm the same with @peterbell10 or @rgommers before you start.
The best way to find real-world issues, as well as judge things like code complexity, is to write code. Modules like ndimage and linalg are significantly more complex than fft, so writing backends for them turns up cases of interest that we need to take into account. Backends are not that difficult to write once you wrap your head around them, so having working backends for the most important SciPy modules is very useful. That is why we defined internships that included this, first for @AnirudhDagar and now for @Smit-create.