RFC: add data type inspection utilities to the array API specification


This RFC proposes adding data type inspection utilities to the array API specification.

Overview

Currently, the array API specification requires that conforming implementations provide a specified set of data type objects (see https://data-apis.org/array-api/2021.12/API_specification/data_types.html) and casting functions (see https://data-apis.org/array-api/2021.12/API_specification/data_type_functions.html).

However, the specification does not include APIs for array data type inspection (e.g., an API for determining whether an array has a complex number data type or a floating-point data type).

Prior Art

NumPy and its derivatives have dtype objects with extensive properties, including a kind property, which returns a character code indicating the general “kind” of data. For example, for relevant dtypes in the specification, NumPy uses the following character codes:

  • b: boolean
  • i: signed integer
  • u: unsigned integer
  • f: floating-point (real-valued)
  • c: complex floating-point

In [1]: np.zeros((3,4)).dtype.kind
Out[1]: 'f'

The kind property is useful for branching on input array data types (e.g., when choosing a summation algorithm).

if x.dtype.kind == 'f':
    ...  # do one thing
else:
    ...  # do another thing
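
As a minimal sketch of that kind of branching, the example below picks a summation algorithm from a kind code. It assumes the code has already been read from something like x.dtype.kind, and it uses plain Python lists in place of arrays; the function name sum_by_kind is hypothetical and not part of any library.

```python
import math


def sum_by_kind(values, kind):
    """Sum a sequence, choosing the algorithm from a NumPy-style kind code.

    `kind` is assumed to come from something like `x.dtype.kind`.
    """
    if kind == "f":
        # Real floating-point input: use error-compensated summation.
        return math.fsum(values)
    # Integers (and anything else in this sketch): plain accumulation.
    return sum(values)


print(sum_by_kind([0.1] * 10, "f"))
print(sum_by_kind([1, 2, 3], "i"))
```

The floating-point branch trades some speed for accuracy, which is the sort of decision that needs a reliable way to ask what kind of data an array holds.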

In PyTorch, dtype objects have is_complex and is_floating_point properties for checking a data type “kind”.

Additionally, PyTorch offers functional APIs is_complex and is_floating_point providing equivalent behavior.

Proposal

Given the proposal for adding complex number support to the specification (see https://github.com/data-apis/array-api/issues/373 and https://github.com/data-apis/array-api/pull/418), a greater need arises for the specification to require conforming implementations to provide standardized ways for data type inspection.

For example, conforming implementations will need to branch in abs(x) depending on whether x is real-valued or complex-valued. Similarly, in downstream user code, we can expect that users will inevitably encounter situations where they need to branch based on input array data types (e.g., when choosing summation algorithms).

As this specification has favored functional APIs, this RFC follows suit and proposes adding the following APIs to the specification:

has_complex_float_dtype(x: Union[array, dtype]) -> bool

Returns a bool indicating whether an input array has a complex number data type (e.g., complex64 or complex128).

has_real_float_dtype(x: Union[array, dtype]) -> bool

Returns a bool indicating whether an input array has a (real-valued) floating-point number data type (e.g., float32 or float64).

has_float_dtype(x: Union[array, dtype]) -> bool

Returns a bool indicating whether an input array has a complex or real-valued floating-point number data type (e.g., float32, float64, complex64, or complex128).

has_unsigned_int_dtype(x: Union[array, dtype]) -> bool

Returns a bool indicating whether an input array has an unsigned integer data type (e.g., uint8, uint16, uint32, uint64).

has_signed_int_dtype(x: Union[array, dtype]) -> bool

Returns a bool indicating whether an input array has a signed integer data type (e.g., int8, int16, int32, int64).

has_int_dtype(x: Union[array, dtype]) -> bool

Returns a bool indicating whether an input array has an integer (signed or unsigned) data type.

has_real_dtype(x: Union[array, dtype]) -> bool

Returns a bool indicating whether an input array has a real-valued (integer or floating-point) data type.

has_bool_dtype(x: Union[array, dtype]) -> bool

Returns a bool indicating whether an input array has a boolean data type.


The above APIs cover the list of data types currently described in the specification, are sufficiently specific to cover most use cases, and can be composed to address most anticipated data type set combinations.
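
The composition mentioned above can be sketched roughly as follows. This is not an implementation from the specification or any library: DType and Array are hypothetical stand-ins, and the NumPy-style kind codes are assumed only as a convenient internal representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DType:
    """Hypothetical stand-in for a library's dtype object."""
    name: str
    kind: str  # NumPy-style kind code: b, i, u, f, or c


@dataclass
class Array:
    """Hypothetical stand-in for an array; only `dtype` matters here."""
    dtype: DType


float32 = DType("float32", "f")
complex64 = DType("complex64", "c")
uint8 = DType("uint8", "u")
int16 = DType("int16", "i")
bool_ = DType("bool", "b")


def _kind(x):
    # Accept an array or a dtype, mirroring the proposed Union[array, dtype].
    return getattr(x, "dtype", x).kind


# Primitive checks, one per kind code.
def has_complex_float_dtype(x): return _kind(x) == "c"
def has_real_float_dtype(x): return _kind(x) == "f"
def has_unsigned_int_dtype(x): return _kind(x) == "u"
def has_signed_int_dtype(x): return _kind(x) == "i"
def has_bool_dtype(x): return _kind(x) == "b"


# Composite checks built from the primitives.
def has_float_dtype(x):
    return has_real_float_dtype(x) or has_complex_float_dtype(x)


def has_int_dtype(x):
    return has_signed_int_dtype(x) or has_unsigned_int_dtype(x)


def has_real_dtype(x):
    return has_int_dtype(x) or has_real_float_dtype(x)
```

In this sketch, a caller could write has_float_dtype(x) and not has_complex_float_dtype(x) to express a combination that has no dedicated function.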

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions: 2
  • Comments: 33 (26 by maintainers)

Top GitHub Comments

2 reactions
rgommers commented, Oct 7, 2022

We had another look at this yesterday. We want to go for a flavor of the function-based implementation here; there was no clear preference among the options above. So let’s try a vote. Use emoji on this comment:

  • 👍🏼 if you prefer is_dtype(x: dtype, kind: str)
  • 🎉 if you prefer is_dtype(x: dtype, kind: str | dtype)
  • 🚀 if you prefer is_dtype(x: dtype, kind: str | dtype | tuple[Union[str, dtype], ...])
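
To make the difference between the three signatures concrete, here is a rough sketch of the broadest option, where kind may be a string, a dtype, or a tuple of either. The DType class is a hypothetical stand-in, and using NumPy-style kind codes as the kind strings is an assumption made only for illustration; the comment above does not say which strings would be accepted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DType:
    """Hypothetical stand-in for a library's dtype object."""
    name: str
    kind: str  # assumed NumPy-style kind code: b, i, u, f, or c


float32 = DType("float32", "f")
float64 = DType("float64", "f")
int8 = DType("int8", "i")


def is_dtype(x, kind):
    """Return True if dtype `x` matches `kind`.

    `kind` may be a kind string, a specific dtype, or a tuple mixing both.
    """
    if isinstance(kind, tuple):
        # Tuple option: match if any member matches.
        return any(is_dtype(x, k) for k in kind)
    if isinstance(kind, DType):
        # Dtype option: exact dtype equality.
        return x == kind
    if isinstance(kind, str):
        # String option: compare against the dtype's kind code.
        return x.kind == kind
    raise TypeError(f"unsupported kind: {kind!r}")
```

Restricting kind to a string drops the second and third branches; allowing a dtype but not a tuple drops only the first.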

2 reactions
jbrockmendel commented, Sep 8, 2022

I will note that I am slightly unsure about the Union[array, dtype]

I mentioned this in a call a few weeks back: the pandas is_foo_dtype functions accept both and that is a design choice we have largely come to regret. The performance overhead of checking for a .dtype attribute adds up quickly.
