Creating coo_sparse matrix from numba typed List is significantly slower than Python list or numpy arrays
Reporting a bug
- I have tried using the latest released version of Numba (the most recent is visible in the change log: https://github.com/numba/numba/blob/master/CHANGE_LOG).
- I have included below a minimal working reproducer (if you are unsure how to write one see http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports).
import numpy as np
from numba.typed import List
from time import time
from scipy.sparse import coo_matrix
N = 1_000_000
M = 1_000_000
i = np.random.randint(0, N, size=M)
j = np.random.randint(0, N, size=M)
v = np.random.rand(M)
# Numpy arrays
t0 = time()
A = coo_matrix((v, (i, j)))
print(f"Numpy array: {time() - t0:.3f}")
# Numba typed List
i, j, v = [List(x) for x in [i, j, v]]
t0 = time()
A = coo_matrix((v, (i, j)))
print(f"Numba typed List: {time() - t0:.3f}")
# Python list
i, j, v = [list(x) for x in [i, j, v]]
t0 = time()
A = coo_matrix((v, (i, j)))
print(f"Python list: {time() - t0:.3f}")
# Numba typed List + conversion time
i, j, v = [List(x) for x in [i, j, v]]
t0 = time()
i, j, v = [np.array(x) for x in [i, j, v]]
A = coo_matrix((v, (i, j)))
print(f"Numba typed List + conversion time: {time() - t0:.3f}")
# Python list + conversion time
i, j, v = [list(x) for x in [i, j, v]]
t0 = time()
i, j, v = [np.array(x) for x in [i, j, v]]
A = coo_matrix((v, (i, j)))
print(f"Python list + conversion time: {time() - t0:.3f}")
Output (wall-clock seconds):
Numpy array: 0.001
Numba typed List: 30.124
Python list: 0.305
Numba typed List + conversion time: 14.727
Python list + conversion time: 0.411
Issue Analytics
- Created 3 years ago
- Comments:6 (6 by maintainers)
No problem 😃
I see. If you really need the coo_matrix form from SciPy and want to construct it from typed.List instances, then I think you'll be hitting the general issues with iteration speed of the typed containers from the Python interpreter. I'm wondering if we ought to just write some converter methods on typed.List to get back to a CPython list or even an array (as I think coo_matrix would be happy with that). CC @esc, have you looked into this already?
The 0.52.0RC2 release candidate, released just today (https://numba.discourse.group/t/numba-0-52-0-and-llvmlite-0-35-0-release-candidates/284/4), has a load of improvements to typed list performance in JIT code; expect more to be made incrementally.
Also, for more general questions, I can recommend asking on the Numba discourse forum https://numba.discourse.group 😃