Running tests in parallel is not any faster than without pytest-xdist anymore

I have no idea what is going on here, but no matter how I run the tests, I’m not getting them to run any faster in parallel. These commands all give me the same timings (about 21 sec):

python -c "import scipy; scipy.test()"
python -c "import scipy; scipy.test(parallel=4)"
pytest --pyargs scipy.linalg
pytest -n 4 --pyargs scipy.linalg

And the same if I try it on other submodules or on scipy as a whole. I even uninstalled pytest-xdist to make sure - no difference whatsoever.

This used to work after gh-10172 two years ago. Does this still work for anyone else?

Issue Analytics

Created 2 years ago
Comments:9 (9 by maintainers)

Top GitHub Comments

rgommers commented, Jul 17, 2021

Okay you do have a good point with the influence of parallelism @tylerjereddy. The oversubscription issue matters more on a larger machine.

# spatial
$ export OMP_NUM_THREADS=1
$ pytest -n1 --pyargs scipy.spatial
43.68s
$ pytest -n2 --pyargs scipy.spatial
23.44s
$ pytest -n4 --pyargs scipy.spatial
13.35s
$ pytest -n12 --pyargs scipy.spatial
13.28s

$ export OMP_NUM_THREADS=4
$ pytest -n1 --pyargs scipy.spatial
45.67s 
$ pytest -n2 --pyargs scipy.spatial
25.06s
$ pytest -n3 --pyargs scipy.spatial
17.93s
$ pytest -n4 --pyargs scipy.spatial
14.43s

# linalg
$ export OMP_NUM_THREADS=4
$ pytest -n1 --pyargs scipy.linalg
19.06s
$ pytest -n4 --pyargs scipy.linalg
9.83s


$ export OMP_NUM_THREADS=1
$ pytest -n1 --pyargs scipy.linalg
19.85s
$ pytest -n4 --pyargs scipy.linalg
9.90s
$ pytest -n8 --pyargs scipy.linalg
9.25s
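
To make the scaling in these numbers easier to read, here is a minimal sketch that turns the scipy.spatial timings above (the OMP_NUM_THREADS=1 runs) into speedup and parallel efficiency relative to the -n1 run. The names `spatial_timings` and `scaling_table` are made up for this illustration; only the timings come from the runs above.

```python
# Wall-clock timings (seconds) copied from the scipy.spatial runs above
# with OMP_NUM_THREADS=1, keyed by the number of xdist workers (-n).
spatial_timings = {1: 43.68, 2: 23.44, 4: 13.35, 12: 13.28}


def scaling_table(timings):
    """Return (workers, speedup, efficiency) rows relative to the 1-worker run."""
    baseline = timings[1]
    rows = []
    for workers in sorted(timings):
        speedup = baseline / timings[workers]
        efficiency = speedup / workers  # 1.0 would be perfect linear scaling
        rows.append((workers, round(speedup, 2), round(efficiency, 2)))
    return rows


for workers, speedup, efficiency in scaling_table(spatial_timings):
    print(f"-n{workers:<3} speedup {speedup:5.2f}x  efficiency {efficiency:4.0%}")
```

On these numbers, -n4 gives roughly a 3.3x speedup, and -n12 gives essentially the same speedup with three times the workers, so the efficiency drops to under 30%.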

To verify that OMP_NUM_THREADS actually has an effect (also shows up in top with the process using >100% CPU):

$ python -c "import scipy; scipy.show_config()"
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/home/rgommers/anaconda3/envs/scipy-meson/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/home/rgommers/anaconda3/envs/scipy-meson/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/home/rgommers/anaconda3/envs/scipy-meson/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/home/rgommers/anaconda3/envs/scipy-meson/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]

$ export OMP_NUM_THREADS=4
$ python -m threadpoolctl -i numpy scipy.linalg
[
  {
    "filepath": "/home/rgommers/anaconda3/envs/scipy-meson/lib/libopenblasp-r0.3.12.so",
    "prefix": "libopenblas",
    "user_api": "blas",
    "internal_api": "openblas",
    "version": "0.3.12",
    "num_threads": 4,
    "threading_layer": "pthreads",
    "architecture": "SkylakeX"
  }
]
$ unset OMP_NUM_THREADS
$ python -m threadpoolctl -i numpy scipy.linalg
WARNING: could not import scipy.linalg
[
  {
    "filepath": "/home/rgommers/anaconda3/envs/scipy-meson/lib/libopenblasp-r0.3.12.so",
    "prefix": "libopenblas",
    "user_api": "blas",
    "internal_api": "openblas",
    "version": "0.3.12",
    "num_threads": 24,
    "threading_layer": "pthreads",
    "architecture": "SkylakeX"
  }
]

$ export OMP_NUM_THREADS=1
$ pytest -n8 --pyargs scipy
198.57s
$ export OMP_NUM_THREADS=4
$ pytest -n8 --pyargs scipy
198.28s
$ export OMP_NUM_THREADS=6
$ pytest -n4 --pyargs scipy
272.37s

$ export OMP_NUM_THREADS=1
$ pytest -n1 --pyargs scipy
714.68s
$ unset OMP_NUM_THREADS  # will use 24 threads
$ pytest -n1 --pyargs scipy
724.20s
$ pytest -nauto --pyargs scipy  # chooses 12 processes
579.97s
$ pytest -n3 --pyargs scipy
435.93s
$ export OMP_NUM_THREADS=2
$ pytest -n6 --pyargs scipy
247.05s

Conclusions:

pytest-xdist scales really badly: it doesn’t help beyond -n 4 and may even have a negative effect on machines with a lot of cores.
Limiting the number of threads used by OpenBLAS/MKL (via OMP_NUM_THREADS) doesn’t hurt, and can help significantly.
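
One way to act on the second point is to cap the BLAS thread pool so that workers times threads never exceeds the number of logical CPUs. This is a rough sketch under that assumption; the heuristic and the helper names `blas_threads_per_worker` and `pytest_invocation` are not from the thread, and the timings above suggest that plain OMP_NUM_THREADS=1 is already a reasonable choice.

```python
import os


def blas_threads_per_worker(workers, logical_cpus):
    """Largest thread count such that workers * threads <= logical_cpus (min 1)."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return max(1, logical_cpus // workers)


def pytest_invocation(workers, package, logical_cpus=None):
    """Build (command, environment) for an xdist run with a capped BLAS pool.

    Nothing is executed here; pass the result to subprocess.run yourself.
    """
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    # Copy the current environment and override only the thread cap.
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(blas_threads_per_worker(workers, logical_cpus))
    cmd = ["pytest", "-n", str(workers), "--pyargs", package]
    return cmd, env
```

For example, `pytest_invocation(8, "scipy", logical_cpus=24)` returns the command for `pytest -n 8 --pyargs scipy` together with an environment where OMP_NUM_THREADS is 3, which can then be handed to `subprocess.run(cmd, env=env)`.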

I briefly played with pytest-split as well, and it looks much more capable of scaling linearly, because there’s no communication overhead. If you do a 10-fold split, it just runs 10% of the tests (balanced so the total time for each split is about equal) in a process and reports the 10 results separately. The price you pay is doing test collection separately in all 10 processes.
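
The "balanced so total time for each split is about equal" idea can be sketched with a simple greedy partition: sort tests by recorded duration and always hand the next one to the currently lightest group. This is only an illustration of the idea, not necessarily the algorithm pytest-split uses; the function name and the test ids and runtimes below are hypothetical.

```python
import heapq


def split_by_duration(durations, n_splits):
    """Greedily partition tests into n_splits groups of similar total duration.

    durations maps a test id to its recorded runtime in seconds. Longest tests
    are placed first, each into the currently lightest group.
    """
    if n_splits < 1:
        raise ValueError("n_splits must be >= 1")
    groups = [[] for _ in range(n_splits)]
    # Heap of (total_seconds, group_index); the lightest group is always on top.
    heap = [(0.0, index) for index in range(n_splits)]
    longest_first = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    for test_id, seconds in longest_first:
        total, index = heapq.heappop(heap)
        groups[index].append(test_id)
        heapq.heappush(heap, (total + seconds, index))
    return groups


# Hypothetical test ids and runtimes, for illustration only.
example = {"test_a": 8.0, "test_b": 5.0, "test_c": 4.0,
           "test_d": 3.0, "test_e": 2.0, "test_f": 2.0}
for number, group in enumerate(split_by_duration(example, 2), start=1):
    print(number, group, sum(example[t] for t in group))
```

Each group would then run in its own process with its own collection step, which is the cost mentioned above. The greedy rule is cheap and usually close to balanced, but it is not guaranteed to find the best possible split.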

rgommers commented, Jul 19, 2021

Thanks Thomas!