Related topic: NumPy array protocols
This issue is meant to summarize the current status and likely future direction of the NumPy array protocols, and their relevance to the array API standard.
What are these array protocols?
In summary, they are dispatching mechanisms that allow calling the public NumPy API with other `numpy.ndarray`-like arrays (e.g. CuPy or Dask arrays, or any other array that implements the protocols) and have the function call dispatch to that library. There are two protocols, `__array_ufunc__` and `__array_function__`, that are very similar - the difference is that with `__array_ufunc__` the library being dispatched to knows it's getting a ufunc, and it can therefore make use of some properties all ufuncs have. The dispatching works the same for both protocols though.
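A minimal sketch of how the two protocols hook into a user-defined array type is shown below. The `MyArray` class is hypothetical and ignores edge cases (e.g. the `out=` argument and mixed-type inputs), but it is enough to see both dispatch paths in action:

```python
import numpy as np

class MyArray:
    """Hypothetical duck array, used only to illustrate the two protocols."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __repr__(self):
        return f"MyArray({self.data!r})"

    # NEP 13: called when a NumPy ufunc (np.add, np.sin, ...) receives a MyArray.
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        unwrapped = [x.data if isinstance(x, MyArray) else x for x in inputs]
        return MyArray(getattr(ufunc, method)(*unwrapped, **kwargs))

    # NEP 18: called for most other public NumPy functions (np.mean, np.concatenate, ...).
    def __array_function__(self, func, types, args, kwargs):
        unwrapped = [a.data if isinstance(a, MyArray) else a for a in args]
        return MyArray(func(*unwrapped, **kwargs))


a = MyArray([1.0, 2.0, 3.0])
print(np.add(a, a))   # dispatches via __array_ufunc__
print(np.mean(a))     # dispatches via __array_function__
```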
Why were they created?
`__array_ufunc__` was created first; the original driver was to be able to call NumPy ufuncs on `scipy.sparse` matrices. `__array_function__` was created later, to be able to cover most of the NumPy API (every function that takes an array as input) and to use the NumPy API with other array/tensor implementations.
What is the current status?
The protocols have been adopted by:
- CuPy
- Dask
- Xarray
- MXNet
- PyData Sparse
- Pint
They have not (or not yet) been adopted by:
- TensorFlow (because there is no compatible API to dispatch to; interest of maintainers unclear)
- PyTorch (because there is no compatible API to dispatch to; maintainers do have interest)
- JAX (concerns about added value and backwards compatibility - see the NEP 37 introduction)
- `scipy.sparse` (semantics not compatible)
The RAPIDS ecosystem, which builds on Dask and CuPy, has been particularly happy with these protocols and uses them heavily. It has also run into some of their limitations, the most painful one being that array creation functions cannot be dispatched on (illustrated in the sketch below).
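The creation problem shows up in library code roughly like this (the helper below is hypothetical; CuPy and Dask are mentioned only as examples of protocol-implementing libraries):

```python
import numpy as np

def library_helper(x):  # hypothetical helper inside some array-consuming library
    # np.zeros_like(x) dispatches via __array_function__, because it receives
    # an array argument ...
    buffer = np.zeros_like(x)
    # ... but np.zeros has no array argument to dispatch on, so this is always
    # a numpy.ndarray, even when x is a CuPy or Dask array.
    scratch = np.zeros(x.shape)
    return buffer, scratch
```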
What is likely to change in the near future?
There is still active exploration of new ideas and design alternatives to (or additions to) the array protocols. There are three main “contenders”:
1. Extend the protocols to cover the most painful shortcomings: NEP 30 (`__duckarray__`) + NEP 35 (`like=`).
2. Use a separate module namespace: NEP 37 (`__array_module__`).
3. Use a multiple dispatch library: NEP 31 (`unumpy`).
At the moment, the most likely outcome is doing both (1) and (2). It needs prototyping and testing though - any solution should only be accepted when it’s clear that it not only solves the immediate pain points RAPIDS ran into, but also that libraries like scikit-learn and SciPy can then adopt it.
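As a concrete illustration of option (1), the sketch below assumes a NumPy version that ships NEP 35's experimental `like=` keyword (NumPy >= 1.20) and an array library whose `__array_function__` handles the dispatched creation functions; option (2) would instead have such a helper fetch a namespace, e.g. via the `np.get_array_module` function proposed in NEP 37, and call creation functions from it:

```python
import numpy as np

def library_helper(x):  # hypothetical helper; requires NumPy >= 1.20 (NEP 35)
    # like= routes the creation call through x's __array_function__, so the
    # result matches x's array type instead of always being a numpy.ndarray.
    return np.zeros(x.shape, dtype=x.dtype, like=x)
```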
What is the relationship of the array protocols with an API standard?
There are several connections:
- The original idea of `__array_function__` (figure above) doesn't require an API that's the same as the NumPy one, but in practice the protocols can only be adopted when there's an API with matching signatures and semantics.
- The lack of an API standard has meant that it's hard to predict what NumPy functions will work for another array library that implements the protocols.
- The separate namespaces (`__array_module__`, `unumpy`) provide a good opportunity to introduce a new API standard once that's agreed on (see the sketch after this list).
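For the last point, the sketch below shows what namespace-based consumption looks like under the array API standard (the helper name is hypothetical, and `x` is assumed to come from a library that implements the standard's `__array_namespace__` method):

```python
def standard_based_helper(x):
    xp = x.__array_namespace__()   # standard-compliant namespace for x's library
    return xp.sum(xp.square(x))    # the same code works for any conforming library
```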
References
- NEP 13 - A Mechanism for Overriding Ufuncs
- NEP 18 - A dispatch mechanism for NumPy’s high level array functions
- NEP 22 - Duck typing for NumPy arrays – high level overview
- NEP 30 - Duck array protocol (https://numpy.org/neps/nep-0030-duck-array-protocol.html)
- NEP 31 - Context-local and global overrides of the NumPy API
- NEP 35 - Array Creation Dispatching With `__array_function__`
- NEP 37 - A dispatch protocol for NumPy-like modules
- Meeting minutes of a recent conversation on NEPs 30, 31, 35 and 37
Top GitHub Comments
The `scipy-dev` mailing list will be the best place. But I understand subscribing to the mailing list is not ideal. I'll try to remember to add an update to this repo as well.

That sounds cool. For that topic I'd suggest opening a tracking issue on the SciPy repo.
Not about a standard as such. I think to consider that, we first need to show the array API standard being adopted and used in SciPy and other downstream libraries. It’s also a much more fuzzy question - it may make sense for some parts of SciPy but not for the package as a whole.
For `fft` and `linalg`, they should be supersets of https://data-apis.org/array-api/latest/extensions/linear_algebra_functions.html and `numpy.fft` (which is also being added as an extension module at https://github.com/data-apis/array-api/pull/189).

For the most widely used functions in `special` it could make sense, and maybe `ndimage` and parts of `optimize` and `interpolate`. For everything else I think it's a bridge too far, at least in the near future.

I can't disagree with that :)
It would be nice to have a bit of coordination there perhaps. For now I think other libraries are copying the parts they need, which is fine I guess - but I'd rather not see functionality with poor APIs or questionable algorithms copied (looking at you, `cusignal`…).

I expect there to be some follow-up, yes, within the next months. We basically have all the pieces of the puzzle to start connecting things together - see, e.g., https://github.com/scipy/scipy/issues/10204 for how to connect `scipy.ndimage`, `scikit-image`, `dask` and `cupy` (there is some funding for that work, through the CZI scikit-image grant).

My feeling is that scikit-learn is in a much better place, because it has done an exceptional job with careful API design, and creating compatible APIs in other packages can be and is being done. Maybe @amueller can say more about this.