How to expose API to downstream libraries?
I wanted to open a discussion on how the Array API (and potentially the dataframe API) will be exposed to downstream libraries.
For example, let’s say I am the author of scikit-learn. How do I get access to an “Array compatible API”? Or let’s say I am a downstream user, using scikit-learn in a notebook. How can I tell it to use TensorFlow over NumPy?
Options
I present three options here, but I would appreciate any suggestions on further ideas:
Manual
The default option is the current status quo, where there is no standard way to get access to a conforming array API backend.
Different downstream libraries, like scikit-learn, could introduce their own mechanisms, like a backend kwarg to functions, if they wanted to support different backends.
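To make the manual option concrete, here is a minimal sketch of a library function that takes a backend kwarg. Everything in it is hypothetical: list_backend, tuple_backend and make_ramp are stand-ins invented for illustration, not part of scikit-learn or any real array library.

```python
from types import SimpleNamespace

# Two toy "backends" exposing the same tiny API (stand-ins for NumPy, CuPy, ...).
list_backend = SimpleNamespace(name="list", arange=lambda n: list(range(n)))
tuple_backend = SimpleNamespace(name="tuple", arange=lambda n: tuple(range(n)))


def make_ramp(n, backend=list_backend):
    """Hypothetical library function: the caller picks the backend explicitly."""
    return backend.arange(n)


default_result = make_ramp(3)                        # uses the default backend
explicit_result = make_ramp(3, backend=tuple_backend)  # caller overrides it
```

The cost of this approach is that every library has to invent and document its own kwarg, and the choice has to be threaded through every call.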
Local Dispatch
Another approach would be to provide access to the relevant module from particular instances of the objects, which is the one taken by NEP 37.
In this case, scikit-learn would either call some x.__array_module__() method on its inputs, or we would provide an array-api Python package with a helper function like get_array_module(x), similar to the NEP.
There is an open PR in scikit-learn (https://github.com/scikit-learn/scikit-learn/pull/16574) to add support for NEP 37.
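A rough sketch of what local dispatch could look like follows. It uses the zero-argument x.__array_module__() form mentioned above (the protocol in NEP 37 itself takes additional arguments), and ToyArray, toy_backend and total are hypothetical names made up for this example.

```python
from types import SimpleNamespace

# Toy namespace standing in for an array library's module.
toy_backend = SimpleNamespace(name="toy", sum=lambda a: sum(a.data))


class ToyArray:
    """Hypothetical array type that knows which module implements its API."""

    def __init__(self, data):
        self.data = list(data)

    def __array_module__(self):
        return toy_backend


def get_array_module(x):
    """Helper in the spirit of the NEP: look the module up on the input itself."""
    method = getattr(type(x), "__array_module__", None)
    if method is None:
        raise TypeError(f"{type(x).__name__} does not provide __array_module__")
    return method(x)


def total(x):
    """Downstream function: no backend kwarg, the input decides."""
    xp = get_array_module(x)
    return xp.sum(x)
```

Here the downstream function never names a backend; whatever array the user passes in determines which namespace is used.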
Global Dispatch
Instead of requiring an object to inspect, we could rely on a global context to store the “active array API” and provide ways of getting and setting it. Some form of this is implemented by SciPy with scipy.fft.set_backend, which uses uarray.
This would probably be heavier weight than we need, but it does illustrate the general concept. If we implemented this, I think we could use context variables, like Python’s built-in decimal module does, i.e. something like this:
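# Proposed usage only: array_api.set_backend / get_backend do not exist yet.
from array_api import set_backend, get_backend
import cupy

def some_fn():
    np = get_backend()  # whichever backend is currently active
    return np.arange(10)

with set_backend(cupy):
    some_fn()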
The advantage of using global dispatch is that you don’t need to rely on passing in some custom instance class to set the backend.
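For reference, here is one way set_backend and get_backend could be built on the standard library's contextvars module. This is only a sketch under assumptions: the function names mirror the proposal above, and the two toy backends are hypothetical stand-ins so the example runs without NumPy or CuPy.

```python
import contextvars
from contextlib import contextmanager
from types import SimpleNamespace

# Toy backends standing in for real array libraries.
list_backend = SimpleNamespace(name="list", arange=lambda n: list(range(n)))
tuple_backend = SimpleNamespace(name="tuple", arange=lambda n: tuple(range(n)))

# The "active array API"; list_backend plays the role of the default.
_active_backend = contextvars.ContextVar("active_backend", default=list_backend)


def get_backend():
    """Return the backend active in the current context."""
    return _active_backend.get()


@contextmanager
def set_backend(backend):
    """Activate a backend for the duration of a with-block, then restore."""
    token = _active_backend.set(backend)
    try:
        yield backend
    finally:
        _active_backend.reset(token)


def some_fn():
    xp = get_backend()
    return xp.arange(3)
```

Because the state lives in a ContextVar rather than a plain module global, each thread and each asyncio task sees its own active backend, and the previous value is restored when the with-block exits, even on an exception.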
Static Typing
This is slightly tangential, but one question that comes up for me is how we could properly statically type options 2 or 3. It seems like what we need is a typing.Protocol but for modules. I raised this as a discussion point on the typing-sig mailing list.
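As an illustration of the idea, the sketch below declares a protocol describing the functions a namespace must provide and checks a module against it at runtime. SupportsSqrt and hypotenuse are hypothetical names, and the standard library's math module stands in for an array module. This only demonstrates the runtime check; whether a static type checker accepts a module where the protocol is expected depends on the checker.

```python
import math
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsSqrt(Protocol):
    """Hypothetical protocol: any namespace that offers a sqrt function."""

    def sqrt(self, x: float, /) -> float: ...


def hypotenuse(xp: SupportsSqrt, a: float, b: float) -> float:
    """Written against the protocol, not against a specific module."""
    return xp.sqrt(a * a + b * b)


# isinstance() on a runtime-checkable protocol only checks that the
# attribute exists, so a module object can satisfy it.
module_conforms = isinstance(math, SupportsSqrt)
result = hypotenuse(math, 3.0, 4.0)
```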
Top GitHub Comments
The answer we arrived at here is that even if there are multiple array types involved, those should come from the same library, in which case there is no dispatch problem.
I think we can close this?
This is indeed appealing in its simplicity, and would suffice for many use cases, i.e., code that only uses one array type.
It doesn’t solve the bigger “multiple library dispatch” problem, but for many projects that isn’t so important. Multi-library dispatch could perhaps be added separately with another protocol that determines which array takes priority, and which perhaps could get reused for Python binary arithmetic.