Slower without `OMP_NUM_THREADS=1` than with `OMP_NUM_THREADS=1`
I tried `with threadpool_limits(1, user_api=None):` on a not-so-simple case: https://gitlab.com/paugier/tsp-pythran (branch threadpoolctl), on Debian.
The case uses Pythran (through Transonic, but I don’t see how that could change anything here) to get an extension accelerated with OpenMP. Pythran uses the system OpenBLAS library.
To reproduce (sorry, I don’t use Git):
hg clone https://gitlab.com/paugier/tsp-pythran.git
cd tsp-pythran
hg up threadpoolctl
# compile the extension with openmp
transonic tsp.py -pf "-march=native -DUSE_XSIMD -fopenmp"
# wait to get the extension ready
python run-test-omp.py
OMP_NUM_THREADS=1 python run-test-omp.py
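To compare the two invocations above from a single script, something like the following sketch could be used. It only relies on the standard library; `timed_run` is a hypothetical helper name, not part of the repository, and the wall-clock time it measures includes interpreter start-up and imports, so it will not match the "run time" lines printed by `run-test-omp.py` itself.

```python
import os
import subprocess
import sys
import time


def timed_run(command, omp_num_threads=None):
    """Run `command` in a child process; return (elapsed_seconds, stdout).

    `omp_num_threads=None` removes OMP_NUM_THREADS from the child's
    environment so the OpenMP runtime picks its own default.
    """
    env = dict(os.environ)
    if omp_num_threads is None:
        env.pop("OMP_NUM_THREADS", None)
    else:
        env["OMP_NUM_THREADS"] = str(omp_num_threads)
    start = time.perf_counter()
    result = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
    return time.perf_counter() - start, result.stdout


# Only runs when executed inside the repository, next to run-test-omp.py.
if __name__ == "__main__" and os.path.exists("run-test-omp.py"):
    for n_threads in (None, 1, 2):
        elapsed, _ = timed_run([sys.executable, "run-test-omp.py"], n_threads)
        print(f"OMP_NUM_THREADS={n_threads}: {elapsed:.2f} s (wall clock)")
```

The variable is set in the child's environment rather than in the current process, so it is already in place when the child process starts.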
The good news is that threadpoolctl manages to reduce the number of threads used with OpenMP. However, I get something strange that I don’t understand:
I’m not sure it’s an issue, but python run-test-omp.py (or OMP_NUM_THREADS=2 python run-test-omp.py) is slower than OMP_NUM_THREADS=1 python run-test-omp.py: about 0.58 s per search versus about 0.44 s.
I actually get the same behavior if the extension is built without OpenMP, i.e. just with transonic tsp.py.
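The threadpoolctl reports printed below are easier to read when grouped by API. The sketch here does that for the default run, with three fields per entry copied from the output in this issue. `pools_by_api` is a hypothetical helper, and `n_thread` is the key name as it appears in this output (other threadpoolctl versions may name it differently).

```python
import posixpath
from collections import defaultdict

# Entries copied from the report of the default run (no OMP_NUM_THREADS set).
REPORTED_INFO = [
    {"internal_api": "openblas", "n_thread": 4,
     "module_path": "/home/users/augier3pi/.pyenv/versions/3.7.2/lib/python3.7/"
                    "site-packages/numpy/.libs/libopenblasp-r0-382c8f3a.3.5.dev.so"},
    {"internal_api": "openmp", "n_thread": 4,
     "module_path": "/usr/lib/x86_64-linux-gnu/libgomp.so.1"},
    {"internal_api": "openblas", "n_thread": 4,
     "module_path": "/home/users/augier3pi/.pyenv/versions/3.7.2/lib/python3.7/"
                    "site-packages/scipy/.libs/libopenblasp-r0-8dca6697.3.0.dev.so"},
    {"internal_api": "openblas", "n_thread": 4,
     "module_path": "/usr/lib/libopenblas.so.0"},
]


def pools_by_api(info):
    """Group (library file name, thread count) pairs by internal API."""
    grouped = defaultdict(list)
    for entry in info:
        name = posixpath.basename(entry["module_path"])
        grouped[entry["internal_api"]].append((name, entry["n_thread"]))
    return dict(grouped)


for api, pools in pools_by_api(REPORTED_INFO).items():
    print(api, pools)
```

Grouped this way, the report shows three separate OpenBLAS copies loaded in the same process (NumPy's, SciPy's, and the system one) next to a single libgomp.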
OMP_NUM_THREADS=1 python run-test-omp.py
[{'filename_prefixes': ('libopenblas',),
'internal_api': 'openblas',
'module_path': '/home/users/augier3pi/.pyenv/versions/3.7.2/lib/python3.7/site-packages/numpy/.libs/libopenblasp-r0-382c8f3a.3.5.dev.so',
'n_thread': 1,
'prefix': 'libopenblas',
'user_api': 'blas',
'version': '0.3.5.dev'},
{'filename_prefixes': ('libiomp', 'libgomp', 'libomp', 'vcomp'),
'internal_api': 'openmp',
'module_path': '/usr/lib/x86_64-linux-gnu/libgomp.so.1',
'n_thread': 1,
'prefix': 'libgomp',
'user_api': 'openmp',
'version': None},
{'filename_prefixes': ('libopenblas',),
'internal_api': 'openblas',
'module_path': '/home/users/augier3pi/.pyenv/versions/3.7.2/lib/python3.7/site-packages/scipy/.libs/libopenblasp-r0-8dca6697.3.0.dev.so',
'n_thread': 1,
'prefix': 'libopenblas',
'user_api': 'blas',
'version': None},
{'filename_prefixes': ('libopenblas',),
'internal_api': 'openblas',
'module_path': '/usr/lib/libopenblas.so.0',
'n_thread': 1,
'prefix': 'libopenblas',
'user_api': 'blas',
'version': None}]
start search
run time = 0.43 s
start search
run time = 0.43 s
start search
run time = 0.44 s
start search
run time = 0.46 s
python run-test-omp.py
[{'filename_prefixes': ('libopenblas',),
'internal_api': 'openblas',
'module_path': '/home/users/augier3pi/.pyenv/versions/3.7.2/lib/python3.7/site-packages/numpy/.libs/libopenblasp-r0-382c8f3a.3.5.dev.so',
'n_thread': 4,
'prefix': 'libopenblas',
'user_api': 'blas',
'version': '0.3.5.dev'},
{'filename_prefixes': ('libiomp', 'libgomp', 'libomp', 'vcomp'),
'internal_api': 'openmp',
'module_path': '/usr/lib/x86_64-linux-gnu/libgomp.so.1',
'n_thread': 4,
'prefix': 'libgomp',
'user_api': 'openmp',
'version': None},
{'filename_prefixes': ('libopenblas',),
'internal_api': 'openblas',
'module_path': '/home/users/augier3pi/.pyenv/versions/3.7.2/lib/python3.7/site-packages/scipy/.libs/libopenblasp-r0-8dca6697.3.0.dev.so',
'n_thread': 4,
'prefix': 'libopenblas',
'user_api': 'blas',
'version': None},
{'filename_prefixes': ('libopenblas',),
'internal_api': 'openblas',
'module_path': '/usr/lib/libopenblas.so.0',
'n_thread': 4,
'prefix': 'libopenblas',
'user_api': 'blas',
'version': None}]
start search
run time = 0.57 s
start search
run time = 0.59 s
start search
run time = 0.58 s
start search
run time = 0.58 s
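To put a number on the difference, the run times reported above can be averaged. This is plain arithmetic on the eight values printed in this issue, not a new measurement.

```python
from statistics import mean

# Run times in seconds, copied from the two outputs above.
with_one_thread = [0.43, 0.43, 0.44, 0.46]   # OMP_NUM_THREADS=1
default_threads = [0.57, 0.59, 0.58, 0.58]   # no OMP_NUM_THREADS (4 threads reported)

slowdown = mean(default_threads) / mean(with_one_thread)
print(f"mean with OMP_NUM_THREADS=1: {mean(with_one_thread):.2f} s")
print(f"mean with default threads:   {mean(default_threads):.2f} s")
print(f"slowdown factor:             {slowdown:.2f}")
```

The default run comes out roughly 30% slower per search than the single-threaded one.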

I tend to agree but I am not sure. It looks like OpenBLAS relies on `OMP_NUM_THREADS=1` here but they don’t seem to be checking `omp_get_max_threads` to disable the mapping. I did not investigate enough to see if it could be set programmatically.
Thus, this seems to be an issue not related to this library, no? Note that we just reduced the overhead of the context manager, so the results should be even closer now whether or not `threadpool_limits` is used.
Let us know if you feel like there is still some issue.