BUG: np.dot is not thread-safe with OpenBLAS
See original GitHub issue

I'm using numpy (1.14.1) linked against OpenBLAS 0.2.18, and it looks like np.dot (which uses the dgemm routine from OpenBLAS) is not thread-safe:
import numpy as np
from multiprocessing.pool import ThreadPool

dim = 4  # for larger values of dim, there's no issue
a = np.arange(10**5 // dim) / 10.0**5
b = np.arange(10**5).reshape(-1, dim) / 10.0**5

# Run the same matrix-vector product concurrently on 4 threads.
pp = ThreadPool(4)
threaded_result = pp.map(a.dot, [b] * 4)
pp.close()
pp.terminate()

# Sequential reference result for comparison.
result = a.dot(b)
print([np.max(np.abs(x - result)) for x in threaded_result])
# prints something like:
# [1822.7068840452998, 1540.2636287421, 96.10628199050007, 0.0]
# or other rather random results, whereas it should print all zeros
I don't know if this kind of behavior is expected; is it a numpy bug or an OpenBLAS bug?
Note:
- numpy linked against MKL does not have this issue at all
- everything runs fine if OpenBLAS threading is turned off (export OPENBLAS_NUM_THREADS=1)
- I don't know how to test OpenBLAS 0.2.20, which may solve this
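The OPENBLAS_NUM_THREADS workaround can also be applied from inside the script itself, as long as the variable is set before numpy (and therefore OpenBLAS) is first imported. A minimal sketch of the same repro with OpenBLAS threading disabled; the variable name is the documented OpenBLAS knob, and everything else mirrors the snippet above:

```python
import os

# Must happen before the first `import numpy`: OpenBLAS reads the
# variable once, when the shared library is loaded.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import numpy as np
from multiprocessing.pool import ThreadPool

dim = 4
a = np.arange(10**5 // dim) / 10.0**5
b = np.arange(10**5).reshape(-1, dim) / 10.0**5

# Same concurrent np.dot calls as in the repro above.
with ThreadPool(4) as pp:
    threaded_result = pp.map(a.dot, [b] * 4)

result = a.dot(b)
diffs = [float(np.max(np.abs(x - result))) for x in threaded_result]
print(diffs)
```

With single-threaded BLAS every call computes the identical product, so all four diffs come out as exactly 0.0.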
Some extra info, if needed:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Stepping: 4
CPU MHz: 2500.060
BogoMIPS: 5000.12
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm retpoline kaiser fsgsbase smep erms xsaveopt
np.show_config()
lapack_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
language = c
blas_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
language = c
openblas_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
language = c
blis_info:
NOT AVAILABLE
openblas_lapack_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
language = c
lapack_mkl_info:
NOT AVAILABLE
blas_mkl_info:
NOT AVAILABLE
Issue Analytics
- State:
- Created 5 years ago
- Comments: 21 (15 by maintainers)
OpenBLAS fixed this in xianyi/OpenBLAS#1844.
Well, maybe we can add code such as the one you linked to change the number of threads. If numpy recognizes the BLAS implementation, it should release the GIL; if it sees one it does not recognize, it could refuse to release it. For OpenBLAS and the other typical implementations, though, the bug should of course rather be fixed upstream.
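For completeness, OpenBLAS exports a C-level openblas_set_num_threads() entry point that can be called at runtime, without restarting the process. A best-effort ctypes sketch; the lookup via ctypes.util.find_library is an assumption about how OpenBLAS is installed (on many systems the symbol lives inside a numpy-bundled copy instead of a standalone library), so the helper simply reports failure when it cannot locate one:

```python
import ctypes
import ctypes.util

def set_openblas_threads(n):
    """Best-effort: call openblas_set_num_threads(n) on a system-wide
    OpenBLAS, if one can be located. Returns True on success."""
    libname = ctypes.util.find_library("openblas")
    if libname is None:
        return False  # no standalone OpenBLAS found on this system
    try:
        lib = ctypes.CDLL(libname)
        lib.openblas_set_num_threads(ctypes.c_int(int(n)))
        return True
    except (OSError, AttributeError):
        return False  # library unloadable or symbol missing

print(set_openblas_threads(1))
```

This is only a sketch of the idea the comment above refers to; a robust version would also have to handle the 64-bit-integer (ILP64) builds, which suffix their symbols differently.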