
Wrong results when using jit(parallel=True)

See original GitHub issue

This bug report is very similar to issue #7984. Using Python 3.8.7, numba-0.55.1 with llvmlite-0.38.0, under Windows 10:

from numpy import ndarray
import numpy as np
import numba
from numba import jit


def test_par(img: ndarray) -> ndarray:

    rad = 5
    w2 = rad + rad

    min_value = np.amin(img)
    # print("")
    img = img - min_value
    s = img.shape
    a_p = np.zeros((s[0] + w2, s[1] + w2), dtype=np.int32)
    a_p[rad:rad+s[0], rad:rad+s[1]] = img

    b = np.zeros(s, dtype=np.float32)

    for i in range(rad):
        for j in range(rad):
            a_1 = a_p[i:i+s[0], j:j+s[1]]
            b = b + a_1
    
    return b

test_np = test_par
test_nb = numba.jit(nopython=True, fastmath=True, parallel=False)(test_par)
test_nb_p = numba.jit(nopython=True, fastmath=True, parallel=True)(test_par)

if __name__ == '__main__':
    image = np.empty((256, 256), dtype=np.uint8)
    for k in range(256):
        for m in range(256):
            image[k, m] = k

    res0 = test_np(image)
    res1 = test_nb(image)
    res2 = test_nb_p(image)

    diff1 = np.max(np.abs(res1-res0))
    print(diff1)
    diff2 = np.max(np.abs(res2-res0))
    print(diff2)

The results are 0.0 (non-parallel) and 4250.0 (parallel).

There seems to be a race condition that causes different results with parallel=True. Strangely enough, when uncommenting the print("") command, the results are OK. It looks like the print somehow stops the race condition (like opening the box of Schrödinger’s cat :-)).
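
One way to check whether parfor fusion is behind this (a hedged aside, not part of the original report): Numba's dispatcher objects expose parallel_diagnostics(), which reports the loop nests that were created and whether they were fused. A minimal sketch, assuming a Numba version where parallel_diagnostics accepts a level argument:

from numba import njit
import numpy as np


@njit(parallel=True)
def demo(img):
    # Same pattern as the reproducer: a reduction over img followed by
    # an elementwise subtraction of its result.
    min_value = np.amin(img)
    return img - min_value


demo(np.arange(16, dtype=np.uint8).reshape(4, 4))  # compile first
demo.parallel_diagnostics(level=4)  # prints loop-nest and fusion details

Comparing the output with and without the print("") line should show whether the fusion decisions change.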

Tracking the problem back through versions: with numba-0.52.0 & llvmlite-0.35.0 the problem does not exist, but with numba-0.53.0 & llvmlite-0.36.0 it appears.

I hope that you can reproduce this bug.

Thanks,

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
stuartarchibald commented, Apr 29, 2022

Thanks for the report @AvinoamK. Thanks also for debugging this further @apmasell and @czgdp1807. The fact that inserting a print in that specific location “fixes” it suggests it’s probably an issue with fusion of parfor nodes, as the print statement would defeat the fusion.

Here’s a minimal working reproducer (MWR):

from numba import njit
import numpy as np

parallel_options = {
    'comprehension': True,  # parallel comprehension
    'prange':        True,  # parallel for-loop
    'numpy':         True,  # parallel numpy calls
    'reduction':     True,  # parallel reduce calls
    'setitem':       True,  # parallel setitem
    'stencil':       True,  # parallel stencils
    'fusion':        True,  # enable fusion or not
}


@njit(parallel=parallel_options)
def foo(img):
    min_value = np.amin(img)
    img = img - min_value
    return img


n = 2
image = np.arange(n * n, dtype=np.uint8).reshape((n, n))

expected = foo.py_func(image.copy())
got = foo(image.copy())

print(foo.parallel_diagnostics())

np.testing.assert_allclose(expected, got)

If the 'fusion' option in the parallel options dictionary is set to False, the code works as expected. Fusion probably shouldn’t occur in this case, as the fused code would be computing the amin of an array that’s being mutated. I’d guess the cross-iteration dependency checks need to be made aware of this sort of use case, but it needs a closer look.
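
Based on that observation, here is a hedged workaround sketch (reusing the option names from the MWR above; not an official recommendation): keep auto-parallelization but disable parfor fusion for the affected function.

from numba import njit
import numpy as np

# Same options as the MWR above, but with parfor fusion disabled.
no_fusion_options = {
    'comprehension': True,
    'prange':        True,
    'numpy':         True,
    'reduction':     True,
    'setitem':       True,
    'stencil':       True,
    'fusion':        False,  # the only change: do not fuse parfors
}


@njit(parallel=no_fusion_options)
def foo(img):
    min_value = np.amin(img)
    img = img - min_value
    return img


image = np.arange(4, dtype=np.uint8).reshape((2, 2))
np.testing.assert_allclose(foo.py_func(image.copy()), foo(image.copy()))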

0 reactions
czgdp1807 commented, May 2, 2022

I investigated a little bit more after your comment @stuartarchibald. I think has_cross_iter_dep is incorrectly reporting that there is no cross-iteration dependency when one actually exists. So I applied the following diff, which adds a conservative check: with the diff, if two successive statements in a loop body share a common variable, has_cross_iter_dep assumes there is a cross-iteration dependency. However, I think this is very restrictive. Maybe we can relax it to assume a cross-iteration dependency only when the LHS of the previous statement appears on the RHS of the current statement (a toy sketch of this relaxed rule follows the example below). Here is one more example that fails on main but works with the diff applied.

https://github.com/numba/numba/blob/87ab859d95aa6322a72d3d31027f39a6ddf4af10/numba/parfors/parfor.py#L4113

Diff

diff --git a/numba/parfors/parfor.py b/numba/parfors/parfor.py
index c24a389e1..f9cc1e4b4 100644
--- a/numba/parfors/parfor.py
+++ b/numba/parfors/parfor.py
@@ -4117,9 +4117,11 @@ def has_cross_iter_dep(parfor):
     # TODO: make it more accurate using ud-chains
     indices = {l.index_variable for l in parfor.loop_nests}
     for b in parfor.loop_body.values():
+        stmts_vars = []
         for stmt in b.body:
             # GetItem/SetItem nodes are fine since can't have expression inside
             # and only simple indices are possible
+            stmts_vars.append(stmt.list_vars())
             if isinstance(stmt, (ir.SetItem, ir.StaticSetItem)):
                 continue
             # tuples are immutable so no expression on parfor possible
@@ -4131,6 +4133,13 @@ def has_cross_iter_dep(parfor):
             if not indices.isdisjoint(stmt.list_vars()):
                 dprint("has_cross_iter_dep found", indices, stmt)
                 return True
+
+        for i in range(len(stmts_vars) - 1):
+            if (stmts_vars[i] is not None and
+                stmts_vars[i + 1] is not None and
+                not set(stmts_vars[i]).isdisjoint(set(stmts_vars[i + 1]))):
+                return True
+
     return False
 
 

Definition of foo which fails with the main branch but works with the above diff:

@njit(parallel=parallel_options)
def foo(img1, img2):
    min_value1 = np.amin(img1)
    img2 = img1 - min_value1
    return img1, img2
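
To illustrate the relaxed rule suggested above, here is a hypothetical toy model in plain Python (not Numba's IR or API): a cross-iteration dependency is assumed only when the variable written by one statement is read by the next. It would still catch the foo above, where min_value1 is produced by np.amin(img1) and then consumed by the subtraction.

from typing import NamedTuple, Set


class Stmt(NamedTuple):
    lhs: str        # variable written by the statement
    rhs: Set[str]   # variables read by the statement


def relaxed_cross_iter_dep(body):
    # Flag a dependency only when the variable written by the previous
    # statement is read by the current one.
    return any(prev.lhs in cur.rhs for prev, cur in zip(body, body[1:]))


# Mirrors foo above: min_value1 = np.amin(img1); img2 = img1 - min_value1
body = [Stmt('min_value1', {'img1'}), Stmt('img2', {'img1', 'min_value1'})]
assert relaxed_cross_iter_dep(body)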

Please feel free to correct me if I misinterpreted something. I am learning about Numba with the help of these investigations and your comments. Thanks. 😃
