improve performance of numba.typed.List constructor with Python list as arg

Thanks for making Numba, it is a fantastic tool!

Background

I have several functions where it is most natural to take Python lists as arguments, as opposed to Numpy arrays. In some cases it is not even possible to use Numpy arrays, because the arguments are lists-of-lists with different lengths.

In recent Numba versions, a warning is generated when calling jitted functions with Python lists as arguments: NumbaPendingDeprecationWarning: Encountered the use of a type that is scheduled for deprecation: type 'reflected list' found for argument

Instead it is recommended to use numba.typed.List, but that is very slow as shown below.

Feature Suggestion

My Jitted functions are typically read-only, so it really isn’t necessary for the original Python list contents to be updated once the Jitted function returns. I wonder if perhaps numba.typed.List could be made to run much faster, if it was somehow informed that the list contents will not be modified? Or maybe there is just a bug in numba.typed.List that makes it run so slowly?

Related Issues

We have also discussed this on the Numba Discourse site, and I think it now merits a proper issue here on GitHub. Issues #5909 and #5822 seem to be related to this.

Versions

Numba: 0.54.1
Numpy: 1.20.3
Python: 3.8.12

Example

This example shows that the function sum_list takes only 2.8 ms, but converting the argument from a Python list to a Numba list takes 1.37 s, which is roughly 500 times slower than the actual computation!

import numpy as np
import numba
from numba import njit

# Number of elements in the list.
n = 1000000

# Numpy array with random floats.
x_np = np.random.normal(size=n)

# Convert Numpy array to Python list.
x_list = x_np.tolist()

# Convert Python list to Numba list.
x_list_numba = numba.typed.List(x_list)

@njit
def sum_list(x):
    # Sum all elements of the list/array x.
    s = 0
    for i in range(len(x)):
        s += x[i]
    return s

# Ensure the Jit function has been compiled before timing tests.
sum_list(x=x_list_numba)

%timeit numba.typed.List(x_list)
# 1.37 s ± 49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit sum_list(x=x_list_numba)
# 2.8 ms ± 61.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Thanks!
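
For the flat list of floats in the example above, one possible workaround (not proposed in the issue text itself) is to copy the list into a NumPy array with np.asarray and pass that array to the jitted function. A single bulk conversion is usually far cheaper than building a numba.typed.List, though actual timings will vary. The sketch below is only an illustration: sum_values and sum_python_list are hypothetical names, and the try/except fallback exists only so the code still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (slowly) without Numba installed.
    def njit(func):
        return func


@njit
def sum_values(x):
    # Sum all elements of a 1-D float array.
    s = 0.0
    for i in range(len(x)):
        s += x[i]
    return s


def sum_python_list(values):
    # Hypothetical helper: copy the list into a float64 array, then call the
    # jitted function. Because of the copy, the original list is never modified.
    arr = np.asarray(values, dtype=np.float64)
    return sum_values(arr)


x_list = np.random.normal(size=100_000).tolist()
total = sum_python_list(x_list)
```

This only helps when the list is flat and homogeneous. It does not cover lists-of-lists with different lengths, which is the harder case raised in the Background section.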

Top GitHub Comments

esc commented, Jan 14, 2022

Thanks for the extremely fast response! (Are you wearing a cape by any chance? 😃)

No, and others have previously hypothesized that I have a bot in a sidecar on this account. This is not true, yet. Pieter Hintjens (R.I.P.) once convinced me that, in order to drive engagement on an open source project, you should decrease the response latency, so here I am!

My original use-case was a list of tuples used for specifying a sparse matrix, something like this [(1, 2, 0.5), (3, 4, 0.7), …] where each tuple is (row, col, value) of the matrix. These are most naturally specified as lists of tuples in my use-case. But I ended up making them as 3 separate Numpy arrays instead, so they would run fast with Numba, as the current version of typedlist was too slow for this format.

I see. Glad you have a workaround. Maybe for sparse arrays you could use: https://sparse.pydata.org/en/stable/ – it is also based on Numba and should provide adequate to optimal performance, just mentioning this in case you are not aware yet.
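
The workaround mentioned above, storing the matrix as three separate NumPy arrays instead of a list of (row, col, value) tuples, could look roughly like the sketch below. The helper name split_entries, the sample entries, and the chosen dtypes are assumptions for illustration and do not come from the thread.

```python
import numpy as np

# Hypothetical (row, col, value) entries of a sparse matrix.
entries = [(1, 2, 0.5), (3, 4, 0.7), (0, 1, 1.25)]


def split_entries(entries):
    # zip(*entries) transposes the list of tuples into three sequences.
    # This assumes the list is non-empty; an empty list would fail to unpack.
    rows, cols, values = zip(*entries)
    return (
        np.asarray(rows, dtype=np.int64),
        np.asarray(cols, dtype=np.int64),
        np.asarray(values, dtype=np.float64),
    )


rows, cols, values = split_entries(entries)
```

The three arrays can then be passed to a jitted function as ordinary array arguments, which avoids building a typed list altogether.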

esc commented, Mar 1, 2022

Perhaps it would be useful to add something like the convert2 function to Numba?

Perhaps. It could be part of a special code path, perhaps as a factory method on numba.typed.List, for example from_nested_list() or so. It would take sane defaults and produce something useful, or users could override the arguments and be very explicit about the parameters. Eventually this could be wired into the constructor.
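
Neither convert2 nor from_nested_list() is shown in this excerpt, so neither is reproduced here. As a separate illustration of how nested lists with different lengths can be handed to array-based code, the sketch below packs a list-of-lists into one flat values array plus an offsets array. The names flatten_ragged and row_sums are hypothetical, and this is an assumption about one workable layout, not something Numba provides.

```python
import numpy as np


def flatten_ragged(nested):
    # Hypothetical helper, not part of Numba: pack a list of float lists
    # into a flat values array and an offsets array of length len(nested) + 1.
    lengths = np.fromiter((len(row) for row in nested),
                          dtype=np.int64, count=len(nested))
    offsets = np.zeros(len(nested) + 1, dtype=np.int64)
    np.cumsum(lengths, out=offsets[1:])
    values = np.fromiter((v for row in nested for v in row),
                         dtype=np.float64, count=int(offsets[-1]))
    return values, offsets


def row_sums(values, offsets):
    # Row i occupies values[offsets[i]:offsets[i + 1]].
    # Plain loops over arrays like these should be compatible with numba.njit.
    out = np.empty(len(offsets) - 1, dtype=np.float64)
    for i in range(len(offsets) - 1):
        s = 0.0
        for j in range(offsets[i], offsets[i + 1]):
            s += values[j]
        out[i] = s
    return out


nested = [[1.0, 2.0], [3.0], [], [4.0, 5.0, 6.0]]
values, offsets = flatten_ragged(nested)
sums = row_sums(values, offsets)
```

The trade-off is that the flattening step still walks the Python lists once, but it produces two plain arrays rather than a nested typed list.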