Creating coo_sparse matrix from numba typed List is significantly slower than Python list or numpy arrays

Reporting a bug

I have tried using the latest released version of Numba (the most recent is visible in the change log: https://github.com/numba/numba/blob/master/CHANGE_LOG).
I have included below a minimal working reproducer (if you are unsure how to write one see http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports).

import numpy as np
from numba.typed import List
from time import time
from scipy.sparse import coo_matrix


N = 1_000_000
M = 1_000_000
i = np.random.randint(0, N, size=M)
j = np.random.randint(0, N, size=M)
v = np.random.rand(M)

# Numpy arrays
t0 = time()
A = coo_matrix((v, (i, j)))
print(f"Numpy array: {time() - t0:.3f}")

# Numba typed List
i, j, v = [List(x) for x in [i, j, v]]

t0 = time()
A = coo_matrix((v, (i, j)))
print(f"Numba typed List: {time() - t0:.3f}")

# Python list
i, j, v = [list(x) for x in [i, j, v]]

t0 = time()
A = coo_matrix((v, (i, j)))
print(f"Python list: {time() - t0:.3f}")

# Numba typed List + conversion time
i, j, v = [List(x) for x in [i, j, v]]

t0 = time()
i, j, v = [np.array(x) for x in [i, j, v]]
A = coo_matrix((v, (i, j)))
print(f"Numba typed List + conversion time: {time() - t0:.3f}")

# Python list + conversion time
i, j, v = [list(x) for x in [i, j, v]]

t0 = time()
i, j, v = [np.array(x) for x in [i, j, v]]
A = coo_matrix((v, (i, j)))
print(f"Python list + conversion time: {time() - t0:.3f}")

Numpy array: 0.001
Numba typed List: 30.124
Python list: 0.305
Numba typed List + conversion time: 14.727
Python list + conversion time: 0.411

Issue Analytics

Created 3 years ago
Comments: 6 (6 by maintainers)

Top GitHub Comments

stuartarchibald commented, Oct 30, 2020

> Thanks for your quick reply. Actually, I generate the i, j, and v lists using a numba-jitted function that contains a for loop. That works perfectly fine and fast. The issue is that I eventually want to make this coo_matrix out of these three lists, which turned out to be the bottleneck.

No problem 😃

I see. If you really need the coo_matrix form from SciPy and want to construct it from typed.List instances, then I think you’ll be hitting the general issues with iteration speed of the typed containers from the Python interpreter. I'm wondering if we ought to just write some converter methods on typed.List to get back to a CPython list or even an array (as I think coo_matrix would be happy with that). CC @esc, have you looked into this already?

0.52.0RC2, released today (https://numba.discourse.group/t/numba-0-52-0-and-llvmlite-0-35-0-release-candidates/284/4), has a load of improvements to typed list performance in JIT code; expect more to be made incrementally.

stuartarchibald commented, Oct 30, 2020

Also, for more general questions, I can recommend asking on the Numba discourse forum https://numba.discourse.group 😃
