
RFC: add data type inspection utilities to the array API specification


This RFC proposes adding data type inspection utilities to the array API specification.

Overview

Currently, the array API specification requires that conforming implementations provide a specified set of data type objects (see https://data-apis.org/array-api/2021.12/API_specification/data_types.html) and casting functions (see https://data-apis.org/array-api/2021.12/API_specification/data_type_functions.html).

However, the specification does not include APIs for inspecting array data types (e.g., an API for determining whether an array has a complex number data type or a floating-point data type).

Prior Art

NumPy and its derivatives have dtype objects with extensive properties, including a kind property, which returns a character code indicating the general “kind” of data. For example, for relevant dtypes in the specification, NumPy uses the following character codes:

b: boolean
i: signed integer
u: unsigned integer
f: floating-point (real-valued)
c: complex floating-point

In [1]: np.zeros((3,4)).dtype.kind
Out[1]: 'f'

The kind property is useful for branching on input array data types (e.g., when choosing a summation algorithm).

if x.dtype.kind == 'f':
    # do one thing
else:
    # do another thing
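As a rough illustration of this kind of branching, the sketch below picks a summation routine from a NumPy-style kind code. It uses only the Python standard library; the function name `total` and passing the kind code as a plain argument are assumptions made for the example, not part of NumPy or the specification.

```python
import math


def total(values, kind):
    """Sum `values`, choosing the algorithm from a NumPy-style kind code.

    `total` and its `kind` argument are hypothetical; in real code the kind
    code would come from something like `x.dtype.kind`.
    """
    if kind == "f":
        # Floating-point input: math.fsum tracks intermediate partial sums
        # to avoid the rounding error that naive left-to-right addition accumulates.
        return math.fsum(values)
    # Integer or boolean input: Python integer addition is exact.
    return sum(values)
```

Calling `total([0.1] * 10, "f")` takes the floating-point branch, while `total([1, 2, 3], "i")` takes the plain `sum` branch.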

In PyTorch, dtype objects have is_complex and is_floating_point properties for checking a data type “kind”.

Additionally, PyTorch offers functional APIs is_complex and is_floating_point providing equivalent behavior.

Proposal

Given the proposal for adding complex number support to the specification (see https://github.com/data-apis/array-api/issues/373 and https://github.com/data-apis/array-api/pull/418), a greater need arises for the specification to require conforming implementations to provide standardized ways for data type inspection.

For example, conforming implementations will need to branch in abs(x) depending on whether x is real-valued or complex-valued. Similarly, in downstream user code, we can expect that users will inevitably encounter situations where they need to branch based on input array data types (e.g., when choosing summation algorithms).
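A minimal sketch of the abs(x) branching described above, written for Python scalars rather than arrays. The name `abs_value` and the `is_complex` flag are assumptions for illustration; the flag stands in for the result of whatever data type inspection API an implementation ends up using.

```python
import math


def abs_value(x, is_complex):
    """Absolute value of a scalar, branching on whether it is complex-valued.

    `is_complex` is a hypothetical stand-in for the result of a dtype check.
    """
    if is_complex:
        # Complex input: the magnitude is sqrt(re^2 + im^2).
        return math.hypot(x.real, x.imag)
    # Real input: flip the sign of negative values.
    return -x if x < 0 else x
```

Without a standardized inspection API, the value passed for `is_complex` would have to be computed in a library-specific way.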

As this specification has favored functional APIs, this RFC follows suit and proposes adding the following APIs to the specification:

has_complex_float_dtype(x: Union[array, dtype]) -> bool

Returns a bool indicating whether an input array has a complex number data type (e.g., complex64 or complex128).

has_real_float_dtype(x: Union[array, dtype]) -> bool

Returns a bool indicating whether an input array has a (real-valued) floating-point number data type (e.g., float32 or float64).

has_float_dtype(x: Union[array, dtype]) -> bool

Returns a bool indicating whether an input array has a complex or real-valued floating-point number data type (e.g., float32, float64, complex64, or complex128).

has_unsigned_int_dtype(x: Union[array, dtype]) -> bool

Returns a bool indicating whether an input array has an unsigned integer data type (e.g., uint8, uint16, uint32, uint64).

has_signed_int_dtype(x: Union[array, dtype]) -> bool

Returns a bool indicating whether an input array has a signed integer data type (e.g., int8, int16, int32, int64).

has_int_dtype(x: Union[array, dtype]) -> bool

Returns a bool indicating whether an input array has an integer (signed or unsigned) data type.

has_real_dtype(x: Union[array, dtype]) -> bool

Returns a bool indicating whether an input array has a real-valued (integer or floating-point) data type.

has_bool_dtype(x: Union[array, dtype]) -> bool

Returns a bool indicating whether an input array has a boolean data type.

The above APIs cover the list of data types currently described in the specification, are sufficiently specific to cover most use cases, and can be composed to address most anticipated data type set combinations.
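One way the proposed functions might be composed is sketched below. This is not an implementation from any array library: dtypes are modeled as plain name strings, `FakeArray` is a hypothetical stand-in for an array object, and `_dtype_of` is a helper invented here to handle the Union[array, dtype] argument.

```python
from dataclasses import dataclass

# Assumption: dtypes are modeled as name strings purely for illustration.
_COMPLEX = frozenset({"complex64", "complex128"})
_REAL_FLOAT = frozenset({"float32", "float64"})
_UNSIGNED = frozenset({"uint8", "uint16", "uint32", "uint64"})
_SIGNED = frozenset({"int8", "int16", "int32", "int64"})


@dataclass
class FakeArray:
    """Hypothetical array stand-in that only carries a dtype name."""
    dtype: str


def _dtype_of(x):
    # Accept either an array-like (has a .dtype attribute) or a dtype name.
    return getattr(x, "dtype", x)


def has_complex_float_dtype(x):
    return _dtype_of(x) in _COMPLEX


def has_real_float_dtype(x):
    return _dtype_of(x) in _REAL_FLOAT


def has_unsigned_int_dtype(x):
    return _dtype_of(x) in _UNSIGNED


def has_signed_int_dtype(x):
    return _dtype_of(x) in _SIGNED


def has_bool_dtype(x):
    return _dtype_of(x) == "bool"


# The broader checks are compositions of the narrower ones.
def has_float_dtype(x):
    return has_real_float_dtype(x) or has_complex_float_dtype(x)


def has_int_dtype(x):
    return has_signed_int_dtype(x) or has_unsigned_int_dtype(x)


def has_real_dtype(x):
    return has_int_dtype(x) or has_real_float_dtype(x)
```

Under these assumptions, a combination not listed in the proposal, such as "any numeric data type", could be written by the caller as `has_real_dtype(x) or has_complex_float_dtype(x)`.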

Issue Analytics

Created a year ago
Reactions: 2
Comments: 33 (26 by maintainers)

Top GitHub Comments

2 reactions

rgommers commented, Oct 7, 2022

We had another look at this yesterday. We want to go for a flavor of the function-based implementation here, but there was no clear preference among the options above. So let’s try a vote - use emojis on this comment:

👍🏼 if you prefer is_dtype(x: dtype, kind: str)
🎉 if you prefer is_dtype(x: dtype, kind: str | dtype)
🚀 if you prefer is_dtype(x: dtype, kind: str | dtype | tuple[Union[str, dtype], ...])
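To make the three options concrete, here is a hedged sketch of the most general signature, where kind may be a string, a dtype, or a tuple of either. The `DType` class and the kind names in `_KINDS` are assumptions chosen for this example; the comment above does not say which kind strings would be accepted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DType:
    """Hypothetical dtype object identified only by its name."""
    name: str


# Assumption: these kind names are illustrative, not taken from the comment.
_KINDS = {
    "bool": {"bool"},
    "signed integer": {"int8", "int16", "int32", "int64"},
    "unsigned integer": {"uint8", "uint16", "uint32", "uint64"},
    "real floating": {"float32", "float64"},
    "complex floating": {"complex64", "complex128"},
}


def is_dtype(x, kind):
    """Return True if dtype `x` matches `kind` (str, DType, or tuple of both)."""
    if isinstance(kind, tuple):
        # Tuple: match if any member matches.
        return any(is_dtype(x, k) for k in kind)
    if isinstance(kind, str):
        # String: look up the named category (unknown names raise KeyError).
        return x.name in _KINDS[kind]
    # Otherwise treat `kind` as a dtype and compare directly.
    return x == kind
```

The first option corresponds to keeping only the string branch, and the second to keeping the string and dtype branches without the tuple case.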

2 reactions

jbrockmendel commented, Sep 8, 2022

I will note that I am slightly unsure about the Union[array, dtype]

I mentioned this in a call a few weeks back: the pandas is_foo_dtype functions accept both and that is a design choice we have largely come to regret. The performance overhead of checking for a .dtype attribute adds up quickly.
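The difference between the two designs can be sketched as follows. This is an illustration, not pandas code: dtypes are modeled as name strings, and the names `check_dtype_only`, `check_array_or_dtype`, and `Carrier` are hypothetical. No timing figures are claimed here.

```python
_REAL_FLOAT = frozenset({"float32", "float64"})


def check_dtype_only(dtype):
    """dtype-only variant: the caller passes x.dtype and pays one set lookup."""
    return dtype in _REAL_FLOAT


def check_array_or_dtype(x):
    """Union variant: every call first probes for a .dtype attribute."""
    dtype = getattr(x, "dtype", x)  # extra attribute lookup on each call
    return dtype in _REAL_FLOAT


class Carrier:
    """Hypothetical array stand-in that only carries a dtype name."""

    def __init__(self, dtype):
        self.dtype = dtype
```

With the dtype-only variant the caller writes `check_dtype_only(arr.dtype)`, so the attribute access happens once at a call site that already knows it holds an array, rather than inside every check.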
