A catch-all protocol for numpy-like duck arrays

There are several functions for which I would like to see protocols constructed. I’ve raised #11074 and #11128, but these are just special cases of a much larger issue that spans many operations. The sense I’ve gotten is that the process to change numpy takes a while, so I’m inclined to find a catch-all solution that can serve as a stopgap while things evolve.

To that end, I propose that duck arrays include a method that returns a module mimicking the numpy namespace:

class ndarray:
    def __array_module__(self):
        import numpy as np
        return np    

class DaskArray:
    def __array_module__(self):
        import dask.array as da
        return da
        
class CuPyArray:
    def __array_module__(self):
        import cupy as cp
        return cp

class SparseArray:
    def __array_module__(self):
        import sparse
        return sparse
...

Then, in various functions like stack or concatenate, we check for these modules:

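import numpy

def stack(args, **kwargs):
    # Collect the namespace advertised by each duck array.
    modules = {arg.__array_module__() for arg in args}
    if len(modules) == 1:
        (module,) = modules
        # Hand off wholesale when every argument shares one non-numpy namespace.
        if module is not numpy:
            return module.stack(args, **kwargs)
    ...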

There are likely several things wrong with the implementation above, but my hope is that it gets the general point across: we’ll dispatch wholesale to the module of the provided duck arrays.
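To make the wholesale-dispatch idea concrete without installing dask, cupy, or sparse, here is a minimal runnable sketch. Everything in it is hypothetical: ToyArray, OtherArray, and the two stand-in namespaces built with types.ModuleType are illustrations only, and raising TypeError for mixed namespaces is an assumption, since the proposal leaves that case open.

```python
import types


class ToyArray:
    """Hypothetical duck array backed by a plain Python list."""

    def __init__(self, data):
        self.data = data

    def __array_module__(self):
        return toy_namespace


class OtherArray:
    """A second hypothetical duck array from a different 'library'."""

    def __init__(self, data):
        self.data = data

    def __array_module__(self):
        return other_namespace


def _make_namespace(name, array_type):
    """Build a stand-in module whose stack() returns array_type."""
    module = types.ModuleType(name)

    def stack(arrays, **kwargs):
        return array_type([a.data for a in arrays])

    module.stack = stack
    return module


toy_namespace = _make_namespace("toy", ToyArray)
other_namespace = _make_namespace("other", OtherArray)


def stack(arrays, **kwargs):
    """Dispatch to the single namespace shared by all arguments."""
    arrays = list(arrays)
    modules = {a.__array_module__() for a in arrays}
    if len(modules) != 1:
        # Assumption: the proposal does not say what to do here.
        raise TypeError("arguments do not share a single array namespace")
    (module,) = modules
    return module.stack(arrays, **kwargs)
```

Calling stack([ToyArray([1, 2]), ToyArray([3, 4])]) returns a ToyArray, while mixing ToyArray and OtherArray arguments raises TypeError in this sketch.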

cc @shoyer @hameerabbasi @njsmith @ericmjl

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions:1
  • Comments:11 (11 by maintainers)

Top GitHub Comments

shoyer commented, May 22, 2018 (1 reaction)

My main concern with this approach is that top level functions should be raising TypeError rather than returning NotImplemented.

For example, consider Python arithmetic (on which __array_ufunc__ was modeled) between two custom types that implement the appropriate special methods (__add__ and __radd__), but that don’t know about each other:

def _not_implemented(*args, **kwargs):
  return NotImplemented

class A:
  __add__ = __radd__ = _not_implemented

class B:
  __add__ = __radd__ = _not_implemented

a = A()
b = B()
a.__add__(b)  # NotImplemented
a.__radd__(b)  # NotImplemented
a + b  # TypeError: unsupported operand type(s) for +: 'A' and 'B'
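For contrast, here is a small sketch of the case where one operand does know about the other, so Python's fallback from __add__ to __radd__ succeeds instead of raising. Meters and Feet are hypothetical types invented for this illustration.

```python
class Meters:
    """Hypothetical type that only knows how to add other Meters."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        # Decline, so Python tries the right-hand operand's __radd__.
        return NotImplemented


class Feet:
    """Hypothetical type that knows how to combine with Meters."""

    def __init__(self, value):
        self.value = value

    def __radd__(self, other):
        if isinstance(other, Meters):
            return Meters(other.value + self.value * 0.3048)
        return NotImplemented


# Meters.__add__ declines, then Feet.__radd__ handles the operation.
total = Meters(1.0) + Feet(10.0)
```

Here total is a Meters instance. Only when every candidate method returns NotImplemented, as in Meters(1.0) + 3, does the operator itself raise TypeError.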

However, I do like the idea of a generic method for NumPy functions that aren’t ufuncs. I would still make this a method on array objects, though, e.g., __array_function__. NumPy’s implementation of func would call arg.__array_function__(func, *args, **kwargs) in turn on each array argument to a function, and return the first result that is not NotImplemented.
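The calling side of that idea might look roughly like the sketch below. This is not NumPy's actual implementation: array_function_dispatch, total, Box, and Picky are hypothetical names, and the method signature follows the form written in this comment. The dispatcher asks each argument in turn, returns the first result that is not NotImplemented, and raises TypeError if every argument declines.

```python
def array_function_dispatch(func, *args, **kwargs):
    """Ask each argument's __array_function__ in order; first answer wins."""
    for arg in args:
        handler = getattr(type(arg), "__array_function__", None)
        if handler is None:
            continue
        result = handler(arg, func, *args, **kwargs)
        if result is not NotImplemented:
            return result
    # The top-level function raises rather than returning NotImplemented.
    raise TypeError(
        f"no implementation of {func.__name__!r} for these argument types"
    )


class Box:
    """Hypothetical duck array that handles any argument with .values."""

    def __init__(self, values):
        self.values = list(values)

    def __array_function__(self, func, *args, **kwargs):
        if func is not total or not all(hasattr(a, "values") for a in args):
            return NotImplemented
        return sum(v for a in args for v in a.values)


class Picky:
    """Hypothetical duck array that always declines."""

    def __init__(self, values):
        self.values = list(values)

    def __array_function__(self, func, *args, **kwargs):
        return NotImplemented


def total(*arrays):
    """Hypothetical top-level function that defers to its arguments."""
    return array_function_dispatch(total, *arrays)
```

With these definitions, total(Picky([1]), Box([2, 3])) falls through Picky to Box and returns a sum, while total(Picky([1])) raises TypeError.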

In most cases, you could write something like the following:

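import dask.array as da

class DaskArray:
    def __array_function__(self, func, *args, **kwargs):
        # HANDLED_TYPES is not defined in this comment; presumably it is a
        # tuple of the array types that DaskArray knows how to operate on.
        # Decline if dask has no function of this name or an argument is foreign.
        if (not hasattr(da, func.__name__) or
                not all(isinstance(arg, HANDLED_TYPES) for arg in args)):
            return NotImplemented
        return getattr(da, func.__name__)(*args, **kwargs)

hameerabbasi commented, May 21, 2018 (1 reaction)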

My hope would be that this would be a placeholder for the common case while more specific protocols, like __array_concatenate__ evolve to support other situations, like where there are arrays of different types.

Why restrict this module protocol to certain types at all? We can follow the same algorithm as, for example, __array_ufunc__. Here’s some example code:

(Apologies for the long post)

sandbox.py

def variable_dispatch(name, args, **kwargs):
    for arg in args:
        if hasattr(arg, '__array_module__'):
            module = arg.__array_module__()

            if hasattr(module, name):
                retval = getattr(module, name)(args, **kwargs)

                if retval is not NotImplemented:
                    return retval

    raise TypeError('This operation is not possible with the supplied types.')


def dispatch(name, *args, **kwargs):
    for arg in args:
        if hasattr(arg, '__array_module__'):
            module = arg.__array_module__()

            if hasattr(module, name):
                retval = getattr(module, name)(*args, **kwargs)

                if retval is not NotImplemented:
                    return retval

    raise TypeError('This operation is not possible with the supplied types.')


def where(*args, **kwargs):
    return dispatch('where', *args, **kwargs)


def stack(args, **kwargs):
    return variable_dispatch('stack', args, **kwargs)

In[2]: import numpy as np
In[3]: import sparse
In[4]: import sandbox
In[5]: class PotatoArray(sparse.COO):
  ...:     def __array_module__(self):
  ...:         return sparse
  ...:     
In[6]: x = PotatoArray(np.eye(5))
In[7]: y = PotatoArray(np.zeros((5, 5)))
In[8]: condition = PotatoArray(np.ones((5, 5), dtype=np.bool_))
In[9]: result = sandbox.where(condition, x, y)
In[10]: sandbox.where(condition, x, y)
Out[10]: <COO: shape=(5, 5), dtype=float64, nnz=5>
In[11]: sandbox.where(condition, x, y).todense()
Out[11]: 
array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])
In[12]: sandbox.stack([x, y], axis=0)
Out[12]: <COO: shape=(2, 5, 5), dtype=float64, nnz=5>
In[13]: sandbox.stack([x, y], axis=0).todense()
Out[13]: 
array([[[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]],

       [[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]]])
In[14]: class A():
   ...:     pass
   ...: 
In[15]: sandbox.where(A(), x, y)
Traceback (most recent call last):
  File "/anaconda3/envs/sparse/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-15-775f69206c32>", line 1, in <module>
    sandbox.where(A(), x, y)
  File "/Users/hameerabbasi/PycharmProjects/sparse/sandbox.py", line 30, in where
    return dispatch('where', *args, **kwargs)
  File "/Users/hameerabbasi/PycharmProjects/sparse/sandbox.py", line 26, in dispatch
    raise TypeError('This operation is not possible with the supplied types.')
TypeError: This operation is not possible with the supplied types.