Poor np.full performance with jit
See original GitHub issue.
numba version: 0.54.1
This is the testing code:

import numpy as np
from numba import jit, prange, set_num_threads
import time

set_num_threads(4)

@jit
def f1(shape):
    # np.full inside a jitted function
    return np.full(shape, np.nan, np.float64)

@jit
def f2(shape):
    # fill element by element with an explicit nested loop
    ans = np.empty(shape, np.float64)
    for i in range(ans.shape[0]):
        for j in range(ans.shape[1]):
            ans[i, j] = np.nan
    return ans

@jit(parallel=True)
def f3(shape):
    # fill with a parallel outer loop
    ans = np.empty(shape, np.float64)
    for i in prange(ans.shape[0]):
        for j in range(ans.shape[1]):
            ans[i, j] = np.nan
    return ans

# Warm up (trigger compilation on a small shape)
shape = (12, 10)
f1(shape)
f2(shape)
f3(shape)

shape = (128000, 1000)

t0 = time.time()
np.full(shape, np.nan, np.float64)
print(time.time() - t0)

t0 = time.time()
f1(shape)
print(time.time() - t0)

t0 = time.time()
f2(shape)
print(time.time() - t0)

t0 = time.time()
f3(shape)
print(time.time() - t0)
This is the output:
0.24103260040283203 # numpy
0.5575239658355713 # np.full in jit
0.45681166648864746 # fill with for-loop
0.18029499053955078 # fill with parallel for-loop
From my testing, np.full inside a jitted function runs at less than half the speed of the plain NumPy call.
Issue Analytics
- Created 2 years ago
- Comments: 7 (6 by maintainers)

I can’t exactly reproduce this, but the problem is that numba’s np.full uses np.ndindex as its iterator, which is quite slow. Some of the other numba array creation routines that fill element-by-element use array.flat and iterate over it with enumerate, which is a good bit faster (a sketch contrasting the two strategies follows below).
On my end, this outputs the following:
This at least matches the numpy implementation, which should be as good as it gets.
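To make the difference concrete, here is a minimal illustrative sketch of the two fill strategies described above. It is not numba's actual np.full source: the function names fill_via_ndindex and fill_via_flat are made up, and the second version walks a no-copy reshape view rather than array.flat for simplicity.

import numpy as np
from numba import njit

@njit
def fill_via_ndindex(shape, value):
    # Visit every element through the index tuples yielded by np.ndindex,
    # the strategy the comment above attributes to numba's np.full.
    arr = np.empty(shape, np.float64)
    for idx in np.ndindex(arr.shape):
        arr[idx] = value
    return arr

@njit
def fill_via_flat(shape, value):
    # Walk the storage linearly through a 1-D view, in the spirit of the
    # flat-iterator approach used by other creation routines.
    arr = np.empty(shape, np.float64)
    flat = arr.reshape(arr.size)  # no-copy view of the freshly allocated C-contiguous buffer
    for i in range(flat.size):
        flat[i] = value
    return arr

Timing these two on the benchmark shape above should reproduce the gap the comment describes, with the linear fill ahead of the ndindex version.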
But this does not look like just overhead. This is the output when I increase the shape 10 times (shape = (1280000, 1000)):
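As a stopgap, a workaround sketch based directly on the issue's f3 is to allocate with np.empty and fill in a parallel loop instead of calling np.full inside the jitted function; the helper name nan_full here is just illustrative.

import numpy as np
from numba import njit, prange

@njit(parallel=True)
def nan_full(shape):
    # Equivalent of np.full(shape, np.nan, np.float64) for 2-D shapes,
    # filled with a parallel outer loop as in f3 above.
    ans = np.empty(shape, np.float64)
    for i in prange(ans.shape[0]):
        for j in range(ans.shape[1]):
            ans[i, j] = np.nan
    return ans

In the benchmark above this pattern (f3) was the fastest of the four variants, so calling such a helper from other jitted code is a reasonable interim measure.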