Creating coo_sparse matrix from numba typed List is significantly slower than Python list or numpy arrays

Reporting a bug

import numpy as np
from numba.typed import List
from time import time
from scipy.sparse import coo_matrix


N = 1_000_000
M = 1_000_000
i = np.random.randint(0, N, size=M)
j = np.random.randint(0, N, size=M)
v = np.random.rand(M)

# Numpy arrays
t0 = time()
A = coo_matrix((v, (i, j)))
print(f"Numpy array: {time() - t0:.3f}")

# Numba typed List
i, j, v = [List(x) for x in [i, j, v]]

t0 = time()
A = coo_matrix((v, (i, j)))
print(f"Numba typed List: {time() - t0:.3f}")

# Python list
i, j, v = [list(x) for x in [i, j, v]]

t0 = time()
A = coo_matrix((v, (i, j)))
print(f"Python list: {time() - t0:.3f}")

# Numba typed List + conversion time
i, j, v = [List(x) for x in [i, j, v]]

t0 = time()
i, j, v = [np.array(x) for x in [i, j, v]]
A = coo_matrix((v, (i, j)))
print(f"Numba typed List + conversion time: {time() - t0:.3f}")

# Python list + conversion time
i, j, v = [list(x) for x in [i, j, v]]

t0 = time()
i, j, v = [np.array(x) for x in [i, j, v]]
A = coo_matrix((v, (i, j)))
print(f"Python list + conversion time: {time() - t0:.3f}")

Numpy array: 0.001
Numba typed List: 30.124
Python list: 0.305
Numba typed List + conversion time: 14.727
Python list + conversion time: 0.411

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

stuartarchibald commented, Oct 30, 2020

> Thanks for your quick reply. Actually, I generate i, j, and v lists using a numba-jitted function, in which there’s a for loop. That works perfectly fine and fast. The issue is that I eventually want to make this coo_matrix out of these three lists, which turned out to be the bottleneck.

No problem 😃

I see. If you really need the coo_matrix form from SciPy and want to construct it from typed.List instances, then I think you’ll be hitting the general issue of slow iteration over the typed containers from the Python interpreter. I’m wondering if we ought to just write some converter methods on typed.List to get back to a CPython list or even an array (as I think coo_matrix would be happy with that). CC @esc, have you looked into this already?

Numba 0.52.0RC2, released today (https://numba.discourse.group/t/numba-0-52-0-and-llvmlite-0-35-0-release-candidates/284/4), has a load of improvements to typed list performance in JIT code; expect more to be made incrementally.

stuartarchibald commented, Oct 30, 2020

Also, for more general questions, I can recommend asking on the Numba discourse forum https://numba.discourse.group 😃
