Protocol for array objects

I’ve tried to tackle static typing and got a vendorable protocol that can be checked statically as well as at runtime for all but one case I’m going to detail below.

Protocol

import enum
from typing import Any, Optional, Protocol, Tuple, TypeVar, Union, runtime_checkable

A = TypeVar("A")


@runtime_checkable
class VendoredArrayProtocol(Protocol[A]):
    @property
    def dtype(self) -> Any:
        ...

    @property
    def device(self) -> Any:
        ...

    @property
    def ndim(self) -> int:
        ...

    @property
    def shape(self) -> Any:
        ...

    @property
    def size(self) -> int:
        ...

    @property
    def T(self) -> A:
        ...

    def __abs__(self) -> A:
        ...

    def __add__(self, other: Union[int, float, A], /) -> A:
        ...

    def __and__(self, other: Union[bool, int, A], /) -> A:
        ...

    def __array_namespace__(self, /, *, api_version: Optional[str] = None) -> Any:
        ...

    def __bool__(self) -> bool:
        ...

    def __dlpack__(self, /, *, stream: Optional[Union[int, Any]] = None) -> Any:
        ...

    def __dlpack_device__(self) -> Tuple[enum.IntEnum, int]:
        ...

    # This overrides the input type, since object.__eq__ handles any input
    # This overrides the return type, since object.__eq__ returns a bool
    def __eq__(  # type: ignore[override]
        self,
        other: Union[bool, int, float, A],
        /,
    ) -> A:  # type: ignore[override]
        ...

    def __float__(self) -> float:
        ...

    def __floordiv__(self, other: Union[int, float, A], /) -> A:
        ...

    def __ge__(self, other: Union[int, float, A], /) -> A:
        ...

    def __getitem__(
        self,
        key: Union[int, slice, Tuple[Union[int, slice], ...], A],
        /,
    ) -> A:
        ...

    def __gt__(self, other: Union[int, float, A], /) -> A:
        ...

    def __int__(self) -> int:
        ...

    def __invert__(self) -> A:
        ...

    def __le__(self, other: Union[int, float, A], /) -> A:
        ...

    def __len__(self) -> int:
        ...

    def __lshift__(self, other: Union[int, A], /) -> A:
        ...

    def __lt__(self, other: Union[int, float, A], /) -> A:
        ...

    def __matmul__(self, other: A, /) -> A:
        ...

    def __mod__(self, other: Union[int, float, A], /) -> A:
        ...

    def __mul__(self, other: Union[int, float, A], /) -> A:
        ...

    # This overrides the input type, since object.__ne__ handles any input
    # This overrides the return type, since object.__ne__ returns a bool
    def __ne__(  # type: ignore[override]
        self, other: Union[bool, int, float, A], /
    ) -> A:  # type: ignore[override]
        ...

    def __neg__(self) -> A:
        ...

    def __or__(self, other: Union[bool, int, A], /) -> A:
        ...

    def __pos__(self) -> A:
        ...

    def __pow__(self, other: Union[int, float, A], /) -> A:
        ...

    def __rshift__(self, other: Union[int, A], /) -> A:
        ...

    def __setitem__(
        self,
        key: Union[int, slice, Tuple[Union[int, slice], ...], A],
        value: Union[bool, int, float, A],
        /,
    ) -> None:
        ...

    def __sub__(self, other: Union[int, float, A], /) -> A:
        ...

    def __truediv__(self, other: Union[int, float, A], /) -> A:
        ...

    def __xor__(self, other: Union[bool, int, A], /) -> A:
        ...

To test everything yourself you can use this playground repo.
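
As a rough illustration of the runtime side, the sketch below uses a deliberately trimmed-down protocol and an unrelated toy class. `MiniArrayProtocol` and `ToyArray` are hypothetical names made up for this example and are not part of the playground repo. Keep in mind that `isinstance` against a `runtime_checkable` protocol only checks that the members exist, not their signatures or return types.

```python
from typing import Any, Protocol, Tuple, TypeVar, Union, runtime_checkable

A = TypeVar("A")


@runtime_checkable
class MiniArrayProtocol(Protocol[A]):
    # Hypothetical, trimmed-down stand-in for VendoredArrayProtocol.
    @property
    def shape(self) -> Any:
        ...

    @property
    def ndim(self) -> int:
        ...

    def __abs__(self) -> A:
        ...

    def __add__(self, other: Union[int, float, A], /) -> A:
        ...


class ToyArray:
    # Toy 1-D "array"; it does NOT inherit from the protocol.
    def __init__(self, values):
        self._values = list(values)

    @property
    def shape(self) -> Tuple[int, ...]:
        return (len(self._values),)

    @property
    def ndim(self) -> int:
        return 1

    def __abs__(self) -> "ToyArray":
        return ToyArray(abs(v) for v in self._values)

    def __add__(self, other, /):
        if isinstance(other, (int, float)):
            return ToyArray(v + other for v in self._values)
        if isinstance(other, ToyArray):
            return ToyArray(a + b for a, b in zip(self._values, other._values))
        return NotImplemented


# Structural check at runtime: only the presence of the members matters.
is_array = isinstance(ToyArray([1, -2]), MiniArrayProtocol)
is_list_array = isinstance([1, -2], MiniArrayProtocol)
```

Here `is_array` is `True` because `ToyArray` provides every member, while a plain list fails the check since it has no `shape`, `ndim`, or `__abs__`.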

Current blocker

It is currently impossible to use Ellipsis in type annotations, since its alias ... has a different meaning there. As a result, the __getitem__ and __setitem__ methods cannot be annotated correctly. There is a fix for this in python/cpython#22336, but it will only ship with Python 3.10. If we leave Ellipsis out of the annotation, accessing the array with something like Array()[..., 0] will be flagged by mypy, although the specification says it should be supported.
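
For reference, Python 3.10 exposes the type as `types.EllipsisType`. The following is a minimal sketch of how the key annotation could look once that is available. The aliases `Index` and `Key` and the class `ToyArray` are hypothetical names for this example, and the fallback branch is a runtime-only assumption that static checkers on older Python versions may not accept.

```python
import sys
from typing import Tuple, Union

if sys.version_info >= (3, 10):
    from types import EllipsisType
else:
    # Runtime-only fallback (assumption): fine for execution, but static
    # checkers on older Pythons may reject it in annotations.
    EllipsisType = type(Ellipsis)

# Hypothetical aliases for a single index and a full key.
Index = Union[int, slice, EllipsisType]
Key = Union[Index, Tuple[Index, ...]]


class ToyArray:
    # Toy class that only records the key it was indexed with.
    def __getitem__(self, key: Key, /) -> "ToyArray":
        self.last_key = key
        return self


indexed = ToyArray()[..., 0]
```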

Suggested improvements

While working on the protocol I found a few issues that could be addressed:

  • Array.dtype, Array.device, Array.__array_namespace__(), and Array.__dlpack__() should return custom objects, but it is not specified what these objects look like. In the current state of the protocol I’ve typed them as Any, but the specification should be more precise.

  • Array.shape should return Tuple[int, ...], but https://github.com/data-apis/array-api-tests/pull/15#issuecomment-858591464 implies that custom objects might also be possible. Maybe we can use Sequence[int]?

  • The type annotation of the stream parameter of Array.__dlpack__() reads Optional[Union[int, Any]], which is equivalent to Any. Plain Any would be more concise.

  • The binary dunder methods take specific input types for the other parameter. For example, __add__ takes Union[int, float, Array]. IMO they should take Any and return NotImplemented in case they cannot work with the type. For example:

    from typing import Any

    class Array:
        def __add__(self, other: Any, /) -> "Array":
            if not isinstance(other, (int, float, Array)):
                return NotImplemented

            # perform addition
            ...

    This makes it harder for static type checkers to catch bugs, because statically something like Array() + None would be allowed, but it gives the other object a chance to work with the Array object by implementing the reflected dunder (here __radd__). If both objects do not know how to deal with the addition, Python will automatically raise a TypeError.

    Since the object class defines its __eq__ and __ne__ methods according to the scheme proposed above, I needed to put # type: ignore[override] directives in the protocol for the input types.
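
The fallback behaviour described in the last bullet can be sketched as follows. `Meters` is a hypothetical third-party type invented for this example, and `Array` here is a toy scalar wrapper rather than a real array implementation.

```python
from typing import Any


class Array:
    # Toy scalar "array" used only to demonstrate operator dispatch.
    def __init__(self, value: float) -> None:
        self.value = value

    def __add__(self, other: Any, /) -> "Array":
        if isinstance(other, Array):
            return Array(self.value + other.value)
        if isinstance(other, (int, float)):
            return Array(self.value + other)
        # Unknown type: let Python try the reflected method on `other`.
        return NotImplemented


class Meters:
    # Hypothetical type that knows how to combine itself with Array.
    def __init__(self, amount: float) -> None:
        self.amount = amount

    def __radd__(self, other: Any) -> "Meters":
        if isinstance(other, Array):
            return Meters(other.value + self.amount)
        return NotImplemented


# Array.__add__ returns NotImplemented, so Meters.__radd__ handles it.
result = Array(1.0) + Meters(2.0)

# Neither side can handle None, so Python raises TypeError on its own.
raised_type_error = False
try:
    Array(1.0) + None
except TypeError:
    raised_type_error = True
```

The trade-off is the one noted above: a static checker will accept `Array(1.0) + None` under this signature, and the error only appears at runtime.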

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 12 (9 by maintainers)

Top GitHub Comments

1 reaction
rgommers commented, Jul 29, 2021

We had a discussion about this today. A standalone package may indeed be nice, and then we also have a place where other utilities or tests can be added (this has come up several times before, and we always tried to avoid a package because that’s work to maintain - but here it seems like we can’t avoid that anyway).

In terms of vendoring, it may be possible to avoid that by (in an array library) doing:

try:
    from array_pkgname import ArrayProtocol
except ImportError:
    # This is an optional dependency. Static type checking against the Protocol
    # won't work for downstream users if the package is not installed.
    class ArrayProtocol:
        pass


class ndarray(ArrayProtocol):
    # array object in this library (could also be named Array, Tensor, etc.)
    ...

A number of libraries don’t support static typing at all yet (TensorFlow, CuPy), while some others (NumPy, PyTorch, MXNet) do. Then the only libraries that may want to vendor the protocol are the ones that want to use it in their own test suite.

1 reaction
pmeier commented, Jul 28, 2021

@rgommers

can you summarize the issue with if every array library vendors this protocol, instead of putting it in a common package and inheriting from it? That still has a limitation, right?

Not sure what you mean here. If this is about my earlier concern (that I only voiced offline) that typing.TypeVar only works for directly related classes, this is not blocking anymore. Although the documentation states

Alternatively, a type variable may specify an upper bound using bound=<type>. This means that an actual type substituted (explicitly or implicitly) for the type variable must be a subclass of the boundary type [emphasis mine]

it seems to work out fine with the vendored protocol, which by definition cannot be related to an actual array implementation. If you have a look at the playground repo and specifically this file, mypy is happy although the ArrayImplementation class is independent of VendoredArrayProtocol.
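
A minimal sketch of that situation, assuming a much smaller protocol than the vendored one: `HasShape` and `passthrough` are hypothetical names for this example, and `ArrayImplementation` is a stand-in written here, not the class from the playground repo. Because the bound is a protocol, type checkers match it structurally, so no inheritance is needed.

```python
from typing import Protocol, Tuple, TypeVar


class HasShape(Protocol):
    # Hypothetical, minimal protocol used as the TypeVar bound.
    @property
    def shape(self) -> Tuple[int, ...]:
        ...


T = TypeVar("T", bound=HasShape)


def passthrough(x: T) -> T:
    # Returns its argument unchanged; the precise type T is preserved.
    return x


class ArrayImplementation:
    # Deliberately does not inherit from HasShape.
    @property
    def shape(self) -> Tuple[int, ...]:
        return (2, 3)


arr = passthrough(ArrayImplementation())
```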

Is there a problem with using Tuple[int, ...]?

Yes, see https://github.com/data-apis/array-api-tests/pull/15#issuecomment-858591464. But this is not solved by variadic generics either IIUC.

