Related topic: NumPy array protocols

See original GitHub issue

This issue is meant to summarize the current status and likely future direction of the NumPy array protocols, and their relevance to the array API standard.

What are these array protocols?

In summary, they are dispatching mechanisms that let you call the public NumPy API on arrays that behave like numpy.ndarray (e.g. CuPy or Dask arrays, or any other array that implements the protocols) and have the call dispatched to the library that owns the array. There are two very similar protocols, __array_ufunc__ and __array_function__. The difference is that with __array_ufunc__ the library being dispatched to knows it is receiving a ufunc, so it can rely on properties that all ufuncs share. The dispatching itself works the same way for both protocols.
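As a rough illustration of the dispatch described above, here is a minimal sketch of a duck array that implements both protocols. The class name `LoggedArray` and its `calls` attribute are hypothetical and exist only for this example; real libraries such as CuPy or Dask implement these hooks with far more care (for instance, handling `out=` and nested arguments).

```python
import numpy as np


class LoggedArray:
    """Hypothetical duck array: wraps an ndarray and records dispatched calls."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.calls = []

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Reached for ufuncs such as np.add; `method` is "__call__",
        # "reduce", etc. This sketch ignores `out=` for simplicity.
        self.calls.append(("ufunc", ufunc.__name__))
        raw = [x.data if isinstance(x, LoggedArray) else x for x in inputs]
        return LoggedArray(getattr(ufunc, method)(*raw, **kwargs))

    def __array_function__(self, func, types, args, kwargs):
        # Reached for other public NumPy functions such as np.sum.
        # Only top-level arguments are unwrapped in this sketch.
        self.calls.append(("function", func.__name__))
        raw = [x.data if isinstance(x, LoggedArray) else x for x in args]
        return LoggedArray(func(*raw, **kwargs))


a = LoggedArray([1.0, 2.0, 3.0])
shifted = np.add(a, 1)   # dispatched through __array_ufunc__
total = np.sum(a)        # dispatched through __array_function__
print(a.calls)
```

Both calls go through the public NumPy API, yet the work is handed to `LoggedArray`, which is free to return its own array type.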

Why were they created?

__array_ufunc__ was created first; the original driver was being able to call NumPy ufuncs on scipy.sparse matrices. __array_function__ was created later to cover most of the rest of the NumPy API (every function that takes an array as input), so that the NumPy API can be used with other array/tensor implementations:

[Figure: the original idea behind __array_function__ (image not available)]

What is the current status?

The protocols have been adopted by:

  • CuPy
  • Dask
  • Xarray
  • MXNet
  • PyData Sparse
  • Pint

They have not (or not yet) been adopted by:

  • TensorFlow (because no compatible API to dispatch to, interest of maintainers unclear)
  • PyTorch (because no compatible API to dispatch to, maintainers do have interest)
  • JAX (concerns about added value and backwards compatibility - see NEP 37 introduction)
  • scipy.sparse (semantics not compatible)

The RAPIDS ecosystem, which builds on Dask and CuPy, has been particularly happy with these protocols and uses them heavily. It has also run into some of their limitations, the most painful being that array creation functions cannot be dispatched on.

What is likely to change in the near future?

New ideas and design alternatives to (or additions to) the array protocols are still being actively explored. There are three main “contenders”:

  1. extend the protocols to cover the most painful shortcomings: NEP 30 (__duckarray__) + NEP 35 (like=).
  2. use a separate module namespace: NEP 37 (__array_module__)
  3. use a multiple dispatch library: NEP 31 (unumpy)

At the moment, the most likely outcome is doing both (1) and (2). That needs prototyping and testing, though: any solution should only be accepted once it is clear that it not only solves the immediate pain points RAPIDS ran into, but can also be adopted by libraries like scikit-learn and SciPy.
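To make the array-creation pain point and the `like=` idea from option (1) concrete, here is a minimal sketch. It assumes NumPy 1.20 or later, where the `like=` keyword from NEP 35 is available on creation functions; `MyArray` is a hypothetical duck array written only for this example.

```python
import numpy as np


class MyArray:
    """Hypothetical duck array used as the reference object for like=."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Defensive: drop `like` if it is still present, so the call
        # below falls back to NumPy's own implementation.
        kwargs = {k: v for k, v in kwargs.items() if k != "like"}
        return MyArray(func(*args, **kwargs))


ref = MyArray([1, 2])

# No array argument to dispatch on: this always returns a numpy.ndarray.
plain = np.zeros(3)

# With like=, the reference object's __array_function__ decides the result.
dispatched = np.zeros(3, like=ref)
print(type(plain).__name__, type(dispatched).__name__)
```

The design point is that `like=` gives a creation function something to dispatch on, which the plain protocols cannot offer because there is no array among the arguments.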

What is the relationship of the array protocols with an API standard?

There are several connections:

  • The original idea of __array_function__ (figure above) doesn’t require an API that’s the same as the NumPy one, but in practice the protocols can only be adopted when there’s an API with matching signatures and semantics.
  • The lack of an API standard has meant that it’s hard to predict which NumPy functions will work for another array library that implements the protocols.
  • The separate namespaces (__array_module__, unumpy) provide a good opportunity to introduce a new API standard once that’s agreed on.
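The separate-namespace idea in the last bullet can be sketched roughly as follows. This is a simplified, pure-Python stand-in for the helper proposed in NEP 37, not an API that NumPy ships; `DuckArray`, `duck_namespace`, `get_array_module` and `total` are all hypothetical names used only for illustration.

```python
from types import SimpleNamespace

import numpy as np


class DuckArray:
    """Hypothetical array type that opts in to the sketched protocol."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_module__(self, types):
        # A real library would return its own NumPy-like module here.
        return duck_namespace


# Tiny stand-in for a library namespace exposing a NumPy-like `sum`.
duck_namespace = SimpleNamespace(sum=lambda a: DuckArray(np.sum(a.data)))


def get_array_module(*arrays, default=np):
    """Simplified sketch: ask each argument which namespace to use."""
    types = tuple({type(a) for a in arrays})
    for a in arrays:
        hook = getattr(type(a), "__array_module__", None)
        if hook is not None:
            module = hook(a, types)
            if module is not NotImplemented:
                return module
    return default


def total(x):
    # Library code picks the namespace once, then uses it explicitly.
    xp = get_array_module(x)
    return xp.sum(x)


numpy_result = total(np.arange(4))
duck_result = total(DuckArray([1, 2, 3]))
```

Because the namespace is chosen explicitly, it could expose a different (for example, standardized) API instead of mirroring every NumPy function.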

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 1
  • Comments: 19 (9 by maintainers)

Top GitHub Comments

1 reaction
rgommers commented, Jun 16, 2021

Thanks for the information – what will be the right place to track the follow-ups on accelerating SciPy?

The scipy-dev mailing list will be the best place. But I understand subscribing to the mailing list is not ideal. I’ll try to remember to add an update to this repo as well.

FYI, I will spend my time experimenting with tvm.topi.sparse and pytaco.tensor, to build the accelerated version of scipy.sparse.linalg. For further updates and discussions, should I open an issue here or on SciPy’s GitHub issue?

That sounds cool. For that topic I’d suggest opening a tracking issue on the SciPy repo.

1 reaction
rgommers commented, Jun 15, 2021

  1. Are there any discussions & efforts in the SciPy community about designing such a new API standard (similar to NEP-47)?

Not about a standard as such. I think to consider that, we first need to show the array API standard being adopted and used in SciPy and other downstream libraries. It’s also a much more fuzzy question - it may make sense for some parts of SciPy but not for the package as a whole.

For fft and linalg, they should be supersets of https://data-apis.org/array-api/latest/extensions/linear_algebra_functions.html and numpy.fft (which is also being added as an extension module at https://github.com/data-apis/array-api/pull/189).

For the most widely used functions in special it could make sense, and maybe ndimage and parts of optimize and interpolate. For everything else I think it’s a bridge too far, at least in the near future.

SciPy has even more historical baggage than NumPy: scipy.io, scipy.misc and scipy.fftpack are rarely used now, and scipy.odr and scipy.cluster should probably belong to sklearn. There is plenty of room for API simplification.

I can’t disagree with that :)

  2. If there are no formal discussions & efforts so far, would it be worthwhile to start one? Considering the large user base of SciPy, the need to accelerate serial solvers, and the current fragmented implementations (cupyx.scipy, jax.scipy, etc.).

It would be nice to have a bit of coordination there perhaps. For now I think other libraries are copying the parts they need, which is fine I guess - but I’d rather not see functionality with poor APIs or questionable algorithms copied (looking at you, cusignal …).

I saw “Support for distributed arrays and GPU arrays” in the SciPy Roadmap, and wonder if there are any follow-ups.

Yes, I expect there to be some follow-up within the next few months. We basically have all the pieces of the puzzle to start connecting things together - see, e.g., https://github.com/scipy/scipy/issues/10204 for how to connect scipy.ndimage, scikit-image, dask and cupy (there is some funding for that work, through the CZI scikit-image grant).

Same question for accelerated sklearn. I will see if sklearn-onnx is the right solution.

My feeling is that scikit-learn is in a much better place, because it has done an exceptional job with careful API design and creating compatible APIs in other packages can be and is being done. Maybe @amueller can say more about this.
