Poor np.full performance with jit
See original GitHub issue.
numba version: 0.54.1
This is the testing code:

import numpy as np
from numba import jit, prange, set_num_threads
import time

set_num_threads(4)

@jit
def f1(shape):
    # np.full inside a jitted function
    return np.full(shape, np.nan, np.float64)

@jit
def f2(shape):
    # fill element by element with an explicit nested loop
    ans = np.empty(shape, np.float64)
    for i in range(ans.shape[0]):
        for j in range(ans.shape[1]):
            ans[i, j] = np.nan
    return ans

@jit(parallel=True)
def f3(shape):
    # fill with a parallel outer loop
    ans = np.empty(shape, np.float64)
    for i in prange(ans.shape[0]):
        for j in range(ans.shape[1]):
            ans[i, j] = np.nan
    return ans

# Warm up (trigger compilation on a small shape)
shape = (12, 10)
f1(shape)
f2(shape)
f3(shape)

shape = (128000, 1000)

t0 = time.time()
np.full(shape, np.nan, np.float64)
print(time.time() - t0)

t0 = time.time()
f1(shape)
print(time.time() - t0)

t0 = time.time()
f2(shape)
print(time.time() - t0)

t0 = time.time()
f3(shape)
print(time.time() - t0)
This is the output:
0.24103260040283203 # numpy
0.5575239658355713 # np.full in jit
0.45681166648864746 # fill with for-loop
0.18029499053955078 # fill with parallel for-loop
From my testing, np.full inside a jitted function runs at less than half the speed of the plain NumPy call.
Issue Analytics
- Created 2 years ago
- Comments: 7 (6 by maintainers)

I can’t exactly reproduce this, but the problem is that numba’s np.full uses np.ndindex as its iterator, which is quite slow. Some of the other numba array creation routines that fill element-by-element use array.flat and iterate over it with enumerate, which is a good bit faster (a sketch contrasting the two strategies follows below).
On my end, this outputs the following:
This at least matches the numpy implementation, which should be as good as it gets.
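To make the difference concrete, here is a minimal illustrative sketch of the two fill strategies described above. It is not numba's actual np.full source: the function names fill_via_ndindex and fill_via_flat are made up, and the second version walks a no-copy reshape view rather than array.flat for simplicity.

import numpy as np
from numba import njit

@njit
def fill_via_ndindex(shape, value):
    # Visit every element through the index tuples yielded by np.ndindex,
    # the strategy the comment above attributes to numba's np.full.
    arr = np.empty(shape, np.float64)
    for idx in np.ndindex(arr.shape):
        arr[idx] = value
    return arr

@njit
def fill_via_flat(shape, value):
    # Walk the storage linearly through a 1-D view, in the spirit of the
    # flat-iterator approach used by other creation routines.
    arr = np.empty(shape, np.float64)
    flat = arr.reshape(arr.size)  # no-copy view of the freshly allocated C-contiguous buffer
    for i in range(flat.size):
        flat[i] = value
    return arr

Timing these two on the benchmark shape above should reproduce the gap the comment describes, with the linear fill ahead of the ndindex version.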
But this does not look like just overhead. This is the output when I increase the shape 10 times (shape = (1280000, 1000)):
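As a stopgap, a workaround sketch based directly on the issue's f3 is to allocate with np.empty and fill in a parallel loop instead of calling np.full inside the jitted function; the helper name nan_full here is just illustrative.

import numpy as np
from numba import njit, prange

@njit(parallel=True)
def nan_full(shape):
    # Equivalent of np.full(shape, np.nan, np.float64) for 2-D shapes,
    # filled with a parallel outer loop as in f3 above.
    ans = np.empty(shape, np.float64)
    for i in prange(ans.shape[0]):
        for j in range(ans.shape[1]):
            ans[i, j] = np.nan
    return ans

In the benchmark above this pattern (f3) was the fastest of the four variants, so calling such a helper from other jitted code is a reasonable interim measure.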