improve performance of numba.typed.List constructor with Python list as arg

Thanks for making Numba, it is a fantastic tool!

Background

I have several functions where it is most natural to take Python lists as arguments, as opposed to Numpy arrays. In some cases it is not even possible to use Numpy arrays, because the arguments are lists-of-lists with different lengths.

In recent Numba versions, a warning is generated when calling jitted functions with Python lists as arguments: NumbaPendingDeprecationWarning: Encountered the use of a type that is scheduled for deprecation: type 'reflected list' found for argument

Instead it is recommended to use numba.typed.List, but that is very slow as shown below.

Feature Suggestion

My Jitted functions are typically read-only, so it really isn’t necessary for the original Python list contents to be updated once the Jitted function returns. I wonder if perhaps numba.typed.List could be made to run much faster, if it was somehow informed that the list contents will not be modified? Or maybe there is just a bug in numba.typed.List that makes it run so slowly?

Related Issues

We have also had a discussion about this on the Numba discourse site here, and I think it now merits a proper issue here on GitHub. Issues #5909 and #5822 seem to be related to this.

Versions

  • Numba: 0.54.1
  • Numpy: 1.20.3
  • Python: 3.8.12

Example

This example shows that the function sum_list takes only 2.8 ms, but converting the argument from a Python list to a Numba list takes 1.37 s, which is roughly 500 times slower than the actual computation!

import numpy as np
import numba
from numba import njit

# Number of elements in the list.
n = 1000000

# Numpy array with random floats.
x_np = np.random.normal(size=n)

# Convert Numpy array to Python list.
x_list = x_np.tolist()

# Convert Python list to Numba list.
x_list_numba = numba.typed.List(x_list)

@njit
def sum_list(x):
    # Sum all elements of the list/array x.
    s = 0
    for i in range(len(x)):
        s += x[i]
    return s

# Ensure the Jit function has been compiled before timing tests.
sum_list(x=x_list_numba)

%timeit numba.typed.List(x_list)
# 1.37 s ± 49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit sum_list(x=x_list_numba)
# 2.8 ms ± 61.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Thanks!

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 25 (15 by maintainers)

Top GitHub Comments

esc commented, Jan 14, 2022

Thanks for the extremely fast response! (Are you wearing a cape by any chance? 😃

No, and others have previously hypothesized that I have a bot in a sidecar on this account. This is not true, yet. Pieter Hintjens (R.I.P.) once convinced me that to drive engagement on an open source project, you should decrease the response latency, so here I am!

My original use-case was a list of tuples used for specifying a sparse matrix, something like this [(1, 2, 0.5), (3, 4, 0.7), …] where each tuple is (row, col, value) of the matrix. These are most naturally specified as lists of tuples in my use-case. But I ended up making them as 3 separate Numpy arrays instead, so they would run fast with Numba, as the current version of typedlist was too slow for this format.

I see. Glad you have a workaround. Maybe for sparse arrays you could use: https://sparse.pydata.org/en/stable/ – it is also based on Numba and should provide adequate to optimal performance, just mentioning this in case you are not aware yet.
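For readers wondering what the three-array workaround described above might look like, here is a minimal sketch. The helper name triplets_to_arrays and the chosen dtypes (int64 for indices, float64 for values) are assumptions made for illustration; they are not taken from the original discussion.

```python
import numpy as np


def triplets_to_arrays(triplets):
    """Split a list of (row, col, value) tuples into three NumPy arrays.

    Hypothetical helper: the name and dtypes are illustrative assumptions.
    """
    if not triplets:
        # zip(*[]) yields nothing to unpack, so handle the empty case explicitly.
        return (
            np.empty(0, dtype=np.int64),
            np.empty(0, dtype=np.int64),
            np.empty(0, dtype=np.float64),
        )
    # Transpose the list of tuples into three parallel sequences.
    rows, cols, vals = zip(*triplets)
    return (
        np.asarray(rows, dtype=np.int64),
        np.asarray(cols, dtype=np.int64),
        np.asarray(vals, dtype=np.float64),
    )


entries = [(1, 2, 0.5), (3, 4, 0.7)]
rows, cols, vals = triplets_to_arrays(entries)
```

The three arrays can then be passed as separate arguments to a jitted function, which avoids building a typed list at all.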

esc commented, Mar 1, 2022

Perhaps it would be useful to add something like the convert2 function to Numba?

Perhaps. It could be part of a special code path, perhaps as a factory method on numba.typed.List, for example from_nested_list() or so. It would take sane defaults and produce something useful, or users could override the arguments and be very explicit about the parameters. Eventually this could be wired into the constructor.
