BUG: np.dot is not thread-safe with OpenBLAS
See original GitHub issue

I'm using numpy (1.14.1) linked against OpenBLAS 0.2.18, and it looks like np.dot (which uses the dgemm routine from OpenBLAS) is not thread-safe:
import numpy as np
from multiprocessing.pool import ThreadPool

dim = 4  # for larger values of dim, there's no issue
a = np.arange(10**5 // dim) / 10.0**5
b = np.arange(10**5).reshape(-1, dim) / 10.0**5

# Run the same matrix-vector product concurrently on 4 threads.
pp = ThreadPool(4)
threaded_result = pp.map(a.dot, [b] * 4)
pp.close()
pp.terminate()

# Sequential reference result for comparison.
result = a.dot(b)
print([np.max(np.abs(x - result)) for x in threaded_result])
# prints something like:
# [1822.7068840452998, 1540.2636287421, 96.10628199050007, 0.0]
# or other rather random results, whereas it should print all zeros
I don't know if this kind of behavior is expected; is it a numpy bug or an OpenBLAS bug?
Note:
- numpy linked against MKL does not have this issue at all
- everything runs fine if OpenBLAS threading is turned off (export OPENBLAS_NUM_THREADS=1)
- I don't know how to test OpenBLAS 0.2.20, which may solve this
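The OPENBLAS_NUM_THREADS workaround can also be applied from inside the script itself, as long as the variable is set before numpy (and therefore OpenBLAS) is first imported. A minimal sketch of the same repro with OpenBLAS threading disabled; the variable name is the documented OpenBLAS knob, and everything else mirrors the snippet above:

```python
import os

# Must happen before the first `import numpy`: OpenBLAS reads the
# variable once, when the shared library is loaded.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import numpy as np
from multiprocessing.pool import ThreadPool

dim = 4
a = np.arange(10**5 // dim) / 10.0**5
b = np.arange(10**5).reshape(-1, dim) / 10.0**5

# Same concurrent np.dot calls as in the repro above.
with ThreadPool(4) as pp:
    threaded_result = pp.map(a.dot, [b] * 4)

result = a.dot(b)
diffs = [float(np.max(np.abs(x - result))) for x in threaded_result]
print(diffs)
```

With single-threaded BLAS every call computes the identical product, so all four diffs come out as exactly 0.0.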
Some extra info, if needed:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Stepping: 4
CPU MHz: 2500.060
BogoMIPS: 5000.12
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm retpoline kaiser fsgsbase smep erms xsaveopt
np.show_config()
lapack_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
language = c
blas_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
language = c
openblas_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
language = c
blis_info:
NOT AVAILABLE
openblas_lapack_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
language = c
lapack_mkl_info:
NOT AVAILABLE
blas_mkl_info:
NOT AVAILABLE
Issue Analytics
- State:
- Created 5 years ago
- Comments: 21 (15 by maintainers)
OpenBLAS fixed this in xianyi/OpenBLAS#1844.
Well, maybe we can add code such as the one you linked to change the number of threads. If numpy recognizes the BLAS implementation, it should release the GIL; if it sees one it does not recognize, it could refuse to release it. For OpenBLAS and the other typical implementations, though, the bug should of course rather be fixed upstream.
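For completeness, OpenBLAS exports a C-level openblas_set_num_threads() entry point that can be called at runtime, without restarting the process. A best-effort ctypes sketch; the lookup via ctypes.util.find_library is an assumption about how OpenBLAS is installed (on many systems the symbol lives inside a numpy-bundled copy instead of a standalone library), so the helper simply reports failure when it cannot locate one:

```python
import ctypes
import ctypes.util

def set_openblas_threads(n):
    """Best-effort: call openblas_set_num_threads(n) on a system-wide
    OpenBLAS, if one can be located. Returns True on success."""
    libname = ctypes.util.find_library("openblas")
    if libname is None:
        return False  # no standalone OpenBLAS found on this system
    try:
        lib = ctypes.CDLL(libname)
        lib.openblas_set_num_threads(ctypes.c_int(int(n)))
        return True
    except (OSError, AttributeError):
        return False  # library unloadable or symbol missing

print(set_openblas_threads(1))
```

This is only a sketch of the idea the comment above refers to; a robust version would also have to handle the 64-bit-integer (ILP64) builds, which suffix their symbols differently.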