RFC: add data type inspection utilities to the array API specification
This RFC proposes adding data type inspection utilities to the array API specification.
Overview
Currently, the array API specification requires that conforming implementations provide a specified set of data type objects (see https://data-apis.org/array-api/2021.12/API_specification/data_types.html) and casting functions (see https://data-apis.org/array-api/2021.12/API_specification/data_type_functions.html).
However, the specification does not include APIs for array data type inspection (e.g., an API for determining whether an array has a complex number data type or a floating-point data type).
Prior Art
NumPy and its derivatives have dtype objects with extensive properties, including a kind property, which returns a character code indicating the general “kind” of data. For example, for relevant dtypes in the specification, NumPy uses the following character codes:
- b: boolean
- i: signed integer
- u: unsigned integer
- f: floating-point (real-valued)
- c: complex floating-point
In [1]: np.zeros((3,4)).dtype.kind
Out[1]: 'f'
The availability of the kind property is useful for branching on input array data types (e.g., when choosing a summation algorithm).
if x.dtype.kind == 'f':
    ...  # do one thing
else:
    ...  # do another thing
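As a rough illustration of this kind of branching, the sketch below picks a summation routine from a NumPy-style kind code. It works on plain Python sequences and takes the kind code as an explicit argument, so `sum_by_kind` and its parameters are hypothetical and are not part of NumPy or the specification.

```python
import math


def sum_by_kind(values, kind):
    """Sum `values`, choosing the algorithm from a NumPy-style kind code.

    `kind` is assumed to be one of 'b', 'i', 'u', 'f', or 'c'
    (hypothetical helper, for illustration only).
    """
    if kind == 'f':
        # Real floating-point: use accurate summation to limit rounding error.
        return math.fsum(values)
    if kind == 'c':
        # Complex floating-point: sum real and imaginary parts separately.
        real = math.fsum(v.real for v in values)
        imag = math.fsum(v.imag for v in values)
        return complex(real, imag)
    if kind in ('b', 'i', 'u'):
        # Booleans and integers: Python integer arithmetic is exact.
        return sum(int(v) for v in values)
    raise ValueError(f"unsupported kind code: {kind!r}")
```

With array inputs, the kind code would come from the array itself (e.g., `x.dtype.kind` in NumPy) rather than being passed in by hand.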
In PyTorch, dtype objects have is_complex and is_floating_point properties for checking a data type “kind”.
Additionally, PyTorch offers functional APIs is_complex and is_floating_point providing equivalent behavior.
Proposal
Given the proposal for adding complex number support to the specification (see https://github.com/data-apis/array-api/issues/373 and https://github.com/data-apis/array-api/pull/418), a greater need arises for the specification to require conforming implementations to provide standardized ways for data type inspection.
For example, conforming implementations will need to branch in abs(x) depending on whether x is real-valued or complex-valued. Similarly, in downstream user code, we can expect that users will inevitably encounter situations where they need to branch based on input array data types (e.g., when choosing summation algorithms).
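To make the abs(x) case concrete, here is a minimal sketch using Python scalars as stand-ins for array elements. The name `abs_scalar` is hypothetical; an array library would branch on the array's data type rather than on the Python type of each element.

```python
import math


def abs_scalar(x):
    """Absolute value of a scalar, branching on real vs. complex input."""
    if isinstance(x, complex):
        # Complex-valued: the result is the magnitude sqrt(re**2 + im**2).
        return math.hypot(x.real, x.imag)
    if isinstance(x, float):
        # Real floating-point: clear the sign (also handles -0.0).
        return math.fabs(x)
    # Integers and booleans: negate when negative.
    return -x if x < 0 else x
```

The two branches differ in more than the formula: the complex branch returns a real-valued result, so the output data type differs from the input data type.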
As this specification has favored functional APIs, this RFC follows suit and proposes adding the following APIs to the specification:
has_complex_float_dtype(x: Union[array, dtype]) -> bool
Returns a bool indicating whether an input array has a complex number data type (e.g., complex64 or complex128).
has_real_float_dtype(x: Union[array, dtype]) -> bool
Returns a bool indicating whether an input array has a (real-valued) floating-point number data type (e.g., float32 or float64).
has_float_dtype(x: Union[array, dtype]) -> bool
Returns a bool indicating whether an input array has a complex or real-valued floating-point number data type (e.g., float32, float64, complex64, or complex128).
has_unsigned_int_dtype(x: Union[array, dtype]) -> bool
Returns a bool indicating whether an input array has an unsigned integer data type (e.g., uint8, uint16, uint32, uint64).
has_signed_int_dtype(x: Union[array, dtype]) -> bool
Returns a bool indicating whether an input array has a signed integer data type (e.g., int8, int16, int32, int64).
has_int_dtype(x: Union[array, dtype]) -> bool
Returns a bool indicating whether an input array has an integer (signed or unsigned) data type.
has_real_dtype(x: Union[array, dtype]) -> bool
Returns a bool indicating whether an input array has a real-valued (integer or floating-point) data type.
has_bool_dtype(x: Union[array, dtype]) -> bool
Returns a bool indicating whether an input array has a boolean data type.
The above APIs cover the list of data types currently described in the specification, are sufficiently specific to cover most use cases, and can be composed to address most anticipated data type set combinations.
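One way the proposed functions might fit together is sketched below. This is not a reference implementation: `DType` and `Array` are hypothetical stand-ins for a library's own objects, and data types are identified by name only to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DType:
    """Hypothetical stand-in for a library's dtype object."""
    name: str


@dataclass
class Array:
    """Hypothetical stand-in for an array; only its dtype matters here."""
    dtype: DType


_REAL_FLOAT = {"float32", "float64"}
_COMPLEX_FLOAT = {"complex64", "complex128"}
_SIGNED_INT = {"int8", "int16", "int32", "int64"}
_UNSIGNED_INT = {"uint8", "uint16", "uint32", "uint64"}


def _dtype_name(x):
    # Accept either an array (anything with a .dtype attribute) or a dtype.
    return getattr(x, "dtype", x).name


def has_complex_float_dtype(x):
    return _dtype_name(x) in _COMPLEX_FLOAT


def has_real_float_dtype(x):
    return _dtype_name(x) in _REAL_FLOAT


def has_float_dtype(x):
    return has_real_float_dtype(x) or has_complex_float_dtype(x)


def has_unsigned_int_dtype(x):
    return _dtype_name(x) in _UNSIGNED_INT


def has_signed_int_dtype(x):
    return _dtype_name(x) in _SIGNED_INT


def has_int_dtype(x):
    return has_signed_int_dtype(x) or has_unsigned_int_dtype(x)


def has_real_dtype(x):
    return has_int_dtype(x) or has_real_float_dtype(x)


def has_bool_dtype(x):
    return _dtype_name(x) == "bool"


def has_numeric_dtype(x):
    # Example of composing the proposed functions; this name is NOT in the RFC.
    return has_real_dtype(x) or has_complex_float_dtype(x)
```

Sets that the RFC does not name directly, such as "any numeric data type", can be built by combining the proposed functions, as `has_numeric_dtype` does here.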
Top GitHub Comments
We had another look at this yesterday. We want to go for a flavor of the function-based implementation here, but there was no clear preference among the options above. So let’s try a vote; use emojis on this comment:
is_dtype(x: dtype, kind: str)
is_dtype(x: dtype, kind: str | dtype)
is_dtype(x: dtype, kind: str | dtype | tuple[Union[str, dtype], ...])
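For comparison, the third and most general option could behave roughly as sketched below. The `DType` class and the kind names ("bool", "signed integer", and so on) are assumptions made for illustration; the comment above does not specify which kind strings would be accepted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DType:
    """Hypothetical stand-in for a library's dtype object."""
    name: str


# Assumed kind names; the actual strings are not given in the discussion.
_KINDS = {
    "bool": {"bool"},
    "signed integer": {"int8", "int16", "int32", "int64"},
    "unsigned integer": {"uint8", "uint16", "uint32", "uint64"},
    "real floating": {"float32", "float64"},
    "complex floating": {"complex64", "complex128"},
}


def is_dtype(x, kind):
    """Return True if dtype `x` matches `kind`.

    `kind` may be a kind name, a specific dtype, or a tuple of either.
    """
    if isinstance(kind, tuple):
        # A tuple matches when any of its entries matches.
        return any(is_dtype(x, k) for k in kind)
    if isinstance(kind, str):
        if kind not in _KINDS:
            raise ValueError(f"unknown kind: {kind!r}")
        return x.name in _KINDS[kind]
    if isinstance(kind, DType):
        # A dtype matches only itself.
        return x == kind
    raise TypeError(f"unsupported kind: {kind!r}")
```

The first two options are subsets of this behavior: option 1 keeps only the string branch, and option 2 adds the dtype branch but not the tuple branch.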
I mentioned this in a call a few weeks back: the pandas is_foo_dtype functions accept both arrays and dtypes, and that is a design choice we have largely come to regret. The performance overhead of checking for a .dtype attribute adds up quickly.