
meta-issue: Numba/NumPy interaction+collaboration+discussion!


Following on from some recent OOB discussions, this is a meta-issue to track ways in which the Numba and NumPy projects could cooperate over common interaction points. The items presented herein are intended solely to highlight areas for discussion and collaboration. Some of the following is also likely to apply to other packages that have requirements similar to Numba’s (i.e. type stability). Please feel free to comment and/or add more items.

CC @rgommers @seibert

Type stability challenges in the NumPy APIs.

Background: Numba uses type inference to compute the type of every variable in user code. This is performed by combining the type information from the arguments to the user function with a large set of rules, defined in Numba, describing what type each operation or function returns for given input types. Note that type inference is performed at compile time and depends solely on the types of the arguments involved, not on their values.
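
As a minimal sketch of this (the function below is purely illustrative), the compiled specialisations are keyed on argument types, not values:

import numpy as np
from numba import njit

@njit
def scale(x):
    return x * 2

# Each distinct argument *type* triggers a separate compilation; the values
# themselves play no part in type inference.
scale(np.ones(3))                     # compiles a float64[:] specialisation
scale(np.ones(3, dtype=np.int64))     # compiles a separate int64[:] specialisation
print(scale.signatures)               # one signature per argument-type combination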

The following contains specific examples of the kinds of problems a type-inferring compiler may run into when attempting to replicate the NumPy API. The NumPy API has obviously grown and evolved over the years, makes a lot of use of Python’s dynamic nature, and predates e.g. Numba by a long way, so it is entirely understandable that these challenges exist. What will be interesting is working out whether they can be overcome via e.g. more advanced type systems or type-stable API updates.

Functions which change their output type depending on value.

Example::

In [12]: np.linalg.eigvals([[1, 1], [1, 1]])
Out[12]: array([2., 0.])

In [13]: np.linalg.eigvals([[1, 1],[-1, 1]])
Out[13]: array([1.+1.j, 1.-1.j])

This example is at one extreme of the issue in that there’s a numerical reason for such behaviour, but it demonstrates that some functions change their return type (a float64 vs. complex128 array type) depending on runtime values. Another such example is np.cross, discussed here https://github.com/numba/numba/pull/4128#issuecomment-498324729, where the return type is specialised depending on the runtime value of an index of an array shape.
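
One hedged workaround from the caller’s side (not a fix to the API itself) is to pin the output dtype by forcing a complex input, so that the result type no longer depends on the values:

import numpy as np

a = np.array([[1.0, 1.0], [1.0, 1.0]])
w = np.linalg.eigvals(a.astype(np.complex128))  # always complex128, regardless of values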

Functions which change their output type depending on values of “flags”.

Example::

In [18]: np.unique([1, 2, 1], return_index=False)
Out[18]: array([1, 2])

In [19]: np.unique([1, 2, 1], return_index=True)
Out[19]: (array([1, 2]), array([0, 1]))

The problem here is that unless the return_index flag is provided as a compile time constant value (literally True/False in the user code), it’s impossible to determine the return type at compile time.
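
To make the problem concrete, here is a small sketch (illustrative only): the return type of the call depends on the value of the flag, which type inference cannot see unless the flag is a literal. One type-stable alternative, the approach the array API standard takes, is to split the variants into separate functions with fixed return types:

import numpy as np

def uses_unique(a, return_index):
    # Return *type* (array vs. 2-tuple of arrays) depends on the flag's value,
    # so it cannot be determined from the argument types alone.
    return np.unique(a, return_index=return_index)

def unique_values(a):
    return np.unique(a)                     # always a single array

def unique_with_index(a):
    return np.unique(a, return_index=True)  # always a 2-tuple of arrays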

Functions that return mutable heterogeneous containers (e.g. lists) when their size is bounded.

Example::

In [21]: np.broadcast_arrays(np.ones(2), 3)
Out[21]: [array([1., 1.]), array([3, 3])]

In [22]: np.broadcast_arrays(np.ones(2), 3, 2)
Out[22]: [array([1., 1.]), array([3, 3]), array([2, 2])]

The size of the output container is a predictable function of the number of arguments. In a type system such as Numba’s there is no support for heterogeneously typed mutable containers; however, there is support for heterogeneously typed immutable containers. Given the above is predictable, could the return type reasonably be a tuple?
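
As a sketch of what is being asked for (from the caller’s side today, or as a possible API change): since the length of the result is fixed by the number of arguments, the list can simply be frozen into a tuple, which an immutable heterogeneous container type can express:

import numpy as np

# A 3-tuple whose length is fixed by the call's arity; each element keeps its
# own array type (float64 vs. int64).
arrays = tuple(np.broadcast_arrays(np.ones(2), 3, 2))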

Function implementations that do not match their docstrings.

Example::

In [38]: print('\n'.join(np.broadcast_shapes.__doc__.splitlines()[:11]))

    Broadcast the input shapes into a single shape.

    :ref:`Learn more about broadcasting here <basics.broadcasting>`.

    .. versionadded:: 1.20.0

    Parameters
    ----------
    `*args` : tuples of ints, or ints
        The shapes to be broadcast against each other.

In [39]: np.broadcast_shapes([12,], (12,))
Out[39]: (12,)

Whilst not directly a type stability problem, the issue is that Numba contributors end up having to “reverse engineer” what is going on to work out what is accepted as input. It makes the support experience in Numba a little inconsistent, depending on what was implemented and worked out at the time. There are also occasions where what is accepted changes between NumPy versions.

Some more general mismatches between NumPy and Numba.

There are a few areas where NumPy and Numba differ where they perhaps should not and it’d be good to discuss how to minimise the existing differences and make future ones smaller!

Naming of arguments to functions is sometimes inconsistent.

Example:

In [48]: np.linalg.matrix_power
Out[48]: <function numpy.linalg.matrix_power(a, n)>

In [49]: np.linalg.matrix_rank
Out[49]: <function numpy.linalg.matrix_rank(M, tol=None, hermitian=False)>

These two functions are in the same module and both take a matrix or stack of matrices as the first argument. However, one implementation uses a as the argument name and the other uses M. This doesn’t matter hugely but does present a bit of a general inconsistency. It impacts Numba users now and again when they write code which binds to argument names at the call site: if Numba hasn’t used the same names as NumPy, the code won’t compile. Numba really ought to fix this in its code base, but it might be something NumPy should consider too?

e.g.

In [52]: from numba import njit

In [53]: @njit
    ...: def foo(x):
    ...:     np.linalg.matrix_rank(M=x)
    ...:

In [54]: try:
    ...:     foo(np.ones((2, 2)))
    ...: except Exception as e:
    ...:     print(e)
    ...:
Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function matrix_rank at 0x7ffa1b198430>) found for signature:

 >>> matrix_rank(M=array(float64, 2d, C))

There are 2 candidate implementations:
  - Of which 2 did not match due to:
  Overload in function 'matrix_rank_impl': File: numba/np/linalg.py: Line 2419.
    With argument(s): '(M=array(float64, 2d, C))':
   Rejected as the implementation raised a specific error:
     TypingError: missing a required argument: 'a'

Numba erroneously uses a as its argument name; NumPy uses M.

Mismatch in the use of BLAS/LAPACK precision.

The majority of the NumPy linalg routines upcast float32 and complex64 types to their 64-bit equivalents, do the work at that precision, and then cast the answer back to the input type. This creates problems for Numba in that Numba type-specialises the BLAS/LAPACK calls based on the input types, which results in a mismatch compared to NumPy’s results. This is also why Numba has better performance than NumPy in this area when it probably should not. It also causes a degree of trouble in Numba’s test suite, in that the results differ by enough that specialised reconstruction routines have to be written to check them (as opposed to the more usual np.testing.assert_allclose approach).
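
As a hedged sketch of what “reconstruction” style checking looks like (the tolerances below are illustrative, not Numba’s actual values): rather than comparing factor matrices element-wise against NumPy, which can legitimately differ when one side upcasts to float64 internally, the test checks that the factors reproduce the input:

import numpy as np

a = np.random.rand(5, 3).astype(np.float32)
q, r = np.linalg.qr(a)
# Factors from two implementations may differ, but both should reconstruct `a`
# to a tolerance appropriate for float32 data.
np.testing.assert_allclose(q @ r, a, rtol=1e-5, atol=1e-6)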

Inf/NaN handling challenges.

Numba has to reimplement NumPy functionality using only the subset of NumPy that Numba itself supports. As a result, translating algorithms can be challenging, particularly with regard to Inf/NaN production, as it is not always clear whether such an output was intentional or a side effect of the implementation. This often appears as an issue when supporting a new version of NumPy in which some algorithm has changed and Inf/NaN production has changed as a result. An example might be the NumPy 1.19 upgrade https://github.com/numba/numba/pull/7019 where np.percentile and np.quantile changed their production of inf/nan (and also dropped boolean support!). It might be beneficial to specify the expected Inf/NaN handling behaviour of functions, cf. standards such as C99 or POSIX.
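
For illustration only (the exact behaviour here has varied between NumPy versions, which is precisely the difficulty): whether and how a NaN in the input propagates to the output is the kind of behaviour a specification would pin down:

import numpy as np

# In recent NumPy versions a NaN in the data propagates to the result; the
# warning behaviour around this has varied across versions.
print(np.percentile([1.0, np.nan, 3.0], 50))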

Other areas of interaction.

Exposing NumPy’s internals as ctypes pointers.

Relatively recently NumPy added BitGenerator/Generator to the np.random module. Part of the implementation involved exposing, via ctypes/CFFI, the underlying C routines that produce the bit streams and the “next” random number of a given type. This is very useful on two fronts for Numba providing support for the same functionality:

  1. It means that Numba doesn’t have to literally copy/replicate the entire C implementation in its code base.
  2. It means that Numba can easily achieve bitwise parity with NumPy at a low level, as literally the same functions are being called.

Essentially, due to this, it was possible to write something to get basic support plumbed through in a couple of hours (https://github.com/numba/numba/issues/4499#issuecomment-1063138477); without this it would probably have taken nearer a week, so thank you NumPy engineers, this was great! It also means that if NumPy needs to change something low level, those details are hidden from Numba!
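
For reference, a small sketch of the interface being described (attribute names below follow NumPy’s documented BitGenerator ctypes interface): these are the function pointers a JIT compiler can call directly rather than re-implementing the generator:

import numpy as np

bitgen = np.random.PCG64()
iface = bitgen.ctypes
print(iface.next_double)    # double next_double(bitgen_t *state)
print(iface.next_uint64)    # uint64_t next_uint64(bitgen_t *state)
print(iface.state_address)  # address of the underlying bitgen_t struct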

There are a couple of potential future improvements to this approach:

  1. The ctypes functions appear as addresses, which end up getting baked into the Numba JIT implementation as long longs; this breaks Numba’s on-disk caching. This is a problem for Numba to resolve!

  2. The ctypes bound functions are essentially called like C function pointers, which means the performance is always going to be limited by having to stage a call and use the compiled implementation from the NumPy DSO. Were bitcode equivalents of these functions produced, Numba/JIT compilers could consume them directly and LLVM could optimise across the function boundaries.

FFT support in Numba.

Tracking ticket: https://github.com/numba/numba/issues/5864. The main issue is that there’s no binding exposed for the FFT functions in the way there is for BLAS/LAPACK (via SciPy) and now also BitGenerator. As noted on the ticket, part of the issue is that there’s no “consistent” interface for dealing with FFTs. Resolving this with a consistent, type-stable interface and then exposing it via e.g. ctypes would likely help all JIT applications needing FFT support (and non-JIT applications too!).

Ufuncs.

There are a few interaction points in relation to ufuncs that might be of interest.

  1. The ufunc inner loops could be provided as bitcode such that JIT compilers based on LLVM can request the bitcode implementation for given types and just consume that as part of compilation. This has the advantage of NumPy changes being picked up immediately by JIT compilers and that JIT compiler output becomes more consistent with NumPy. It also means a machine specific inner loop will always be generated by the JIT compiler.

  2. NumPy could supply a hook for adding a JIT compiler to the ufunc machinery. As a result, if there is no specific loop available for given input types (i.e. casting would be required to get a match), the JIT compiler could potentially compile one which could lead to improved performance.

  3. NumPy could abstract the entire ufunc interface such that an alternative implementation could be provided. Mixed with the ideas in 1. and 2., this would let e.g. a JIT compiler produce CPU-specific loops specialised directly on the types present, again potentially leading to increased performance.

Release testing

Numba’s test suite and build farm.

Numba has a large test suite, a lot of which exercises the NumPy APIs. The tests in the test suite are similar to NumPy’s but not always the same. It’s quite common for Numba’s test suite to identify potential bugs/changes made to NumPy across releases.

Numba also has a large CI infrastructure which tests approximately the latest 3 or 4 NumPy releases against the Numba test suite on a large number of platforms. It has been noted a few times previously that it would be good for Numba to test against upcoming NumPy releases early as a mutually beneficial operation. Numba’s test suite could potentially pick up issues in NumPy and vice versa. Further, if there are C-API changes in NumPy (especially with respect to ufuncs) there’s a reasonable chance that Numba will pick them up. Numba has an “integration” testing project https://github.com/numba/numba-integration-testing that may help with orchestration of this.


Top GitHub Comments

rgommers commented, Apr 28, 2022

Thanks @stuartarchibald, this is a super useful write-up!

Most of this is quite clear, and it seems feasible to make a number of changes in NumPy that would address some of these points. I’d like to open a discussion on the NumPy mailing list soon about putting something like “improve NumPy-Numba interactions” on the roadmap based on the above.

A few questions:

np.broadcast_arrays example […] In a type system such as Numba’s there is no support for heterogeneously typed mutable containers, however there is support for heterogeneously typed immutable containers.

This one surprised me a little. Is that a fundamental problem with Numba’s type system? What do you do currently, not support such functions, not make them work in nopython mode, or just return a tuple anyway?

np.broadcast_shapes example […] Whilst not directly type stability… the issue is that Numba contributors end up having to “reverse” what is going on to work out what is accepted as input.

Is this different from the general “wherever NumPy expects arrays, it also accepts anything that can be coerced to an array via an np.asarray call (sequences, scalars, generators, objects with a __array__ attribute, etc.)”?

Naming of arguments to functions is sometimes inconsistent.

That one is probably best fixed by making arguments where the one-letter name is not meaningful positional-only. Just to make sure: that would fix the issue for Numba, right?
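
For illustration (the signature below is a hypothetical sketch using PEP 570 syntax, not NumPy’s actual definition), a positional-only array argument would make both matrix_rank(M=x) and matrix_rank(a=x) invalid, so the name mismatch could no longer bite:

def matrix_rank(a, /, tol=None, hermitian=False):
    # `a` can only be passed positionally, so callers cannot depend on its name.
    ...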

Mismatch in the use of BLAS/LAPACK precision. […] which results in a mismatch compared to NumPy’s result. This is also why Numba has better performance NumPy in this area when it probably should not.

I thought there was a significant performance gain also because Numba translates np.linalg calls to their scipy.linalg.cython_blas/lapack equivalents? (float64 is up to 4x faster for small inputs IIRC, due to avoiding Python overhead). That should remain even if you’d do an upcast right?

It’s a little questionable by the way that NumPy does this upcasting. It won’t be changed, but it basically means that there’s not much point in using those lower-precision dtypes. They’re used much more in deep learning libraries, which don’t do this upcast (the lower precision results are fine, and performance gains are important).

asmeurer commented, Jun 8, 2022

It doesn’t seem to have been mentioned in this issue yet, so it’s worth noting that quite a few of these issues intersect with changes from the array API https://numpy.org/devdocs/reference/array_api.html. For example:

  • unique is split into different unique_* functions with consistent return types (the shape is still data-dependent, but that’s fundamental to the function).
  • Every function/operation in the array API has a return dtype that depends only on the input dtypes.
  • Functions use positional-only arguments (this is something I forgot to mention in that doc), which presumably fixes the argument naming issue.
  • The broadcast_arrays thing actually is a list in the array API. Perhaps it should be changed to a Tuple there.

A challenge as Ralf mentioned on the NumPy mailing list is that the array API is only a subset of NumPy, so if NumPy were to just focus on array API compatibility it would miss similar issues in other places. For example, eigvals is not currently in the array API, and it’s possible it won’t be added. If it is added, it will not have the issue you mentioned because the array API functions never use value-dependent return dtypes. So as NumPy makes changes to be compatible with the array API, it should make similar changes everywhere else too, when it makes sense.

As an aside, I’m a little confused by the broadcast_shapes question. Is the issue that it says “tuples” but also accepts lists? That seems like a pretty standard thing that any Python function would do. Perhaps a true mypy-compatible signature would more accurately say “Sequence[int]” or something. I would be surprised if NumPy restricted the inputs to only tuples for no good reason. OTOH, I would not really be that surprised if Numba did require the input to be tuples, because my general experience with Numba is that it is much more restrictive about types.
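
A hedged sketch of the kind of annotation being suggested here (illustrative only, not NumPy’s actual type stubs):

from typing import Sequence, Tuple, Union

def broadcast_shapes(*args: Union[int, Sequence[int]]) -> Tuple[int, ...]:
    # Accepts ints or any sequence of ints, per the looser signature discussed above.
    ...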
