Performance idea: `--sf`/`--slow-first` option to improve resource utilization
Reading this blog post about Stripe’s test runner made me think we should have a `--slow-first` option for xdist, and it seems that we don’t yet 😅 The motivation for `--slow-first` is that fastest-tests-last is a great heuristic to reduce the duration at the end of a test run when some processes are done but others are still running - which can range from negligible to “several times longer than the rest of the run” (when e.g. I select mostly fast unit tests, plus a few slow integration tests which happen to run last).

IMO this should be lower priority than `--last-failed`, only reorder passing tests for `--failed-first`, and be incompatible with `--new-first` (existing flag docs). The main trick is to cache durations from the last run, and then order by the aggregate time for each loadscope (i.e. method, class, or file, depending on what we’ll distribute - pytest-randomly is useful prior art).
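A minimal sketch of that trick as a standalone `conftest.py`, rather than as an actual xdist patch: the hooks (`pytest_collection_modifyitems`, `pytest_runtest_logreport`, `pytest_sessionfinish`) and the `config.cache` API are real pytest features, but the cache key, the per-file aggregation scope (mirroring `--dist=loadfile`), and all other names here are illustrative assumptions.

```python
# conftest.py -- a sketch of slow-first ordering, not the proposed implementation.
from collections import defaultdict

DURATIONS_KEY = "cache/slow_first_durations"  # hypothetical cache key
_durations = {}  # nodeid -> seconds, collected during this run


def pytest_collection_modifyitems(session, config, items):
    """Reorder tests so the slowest scopes (by last run's timings) go first."""
    cached = config.cache.get(DURATIONS_KEY, {})
    # Aggregate cached durations per file, i.e. the --dist=loadfile scope;
    # per-class or per-test keys would work the same way for other modes.
    per_file = defaultdict(float)
    for nodeid, seconds in cached.items():
        per_file[nodeid.split("::")[0]] += seconds
    # Stable sort: tests within a file keep their collection order, and
    # tests with no recorded duration sort last.
    items.sort(key=lambda item: per_file[item.nodeid.split("::")[0]], reverse=True)


def pytest_runtest_logreport(report):
    # Sum setup + call + teardown time for each test.
    _durations[report.nodeid] = _durations.get(report.nodeid, 0.0) + report.duration


def pytest_sessionfinish(session):
    # Persist timings for the next run.
    session.config.cache.set(DURATIONS_KEY, _durations)
```

(Note that tests with no cached duration sort last here, which is exactly why this interacts badly with `--new-first`.)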
Top GitHub Comments
It could be implemented elsewhere, but the `--slow-first` ordering only improves performance if you’re running tests in parallel, so I think xdist is the most sensible place for it. It could even make single-core performance worse, e.g. in combination with `-x`/`--exit-first`. For best results `--slow-first` also needs to know the current value of xdist’s `--dist` argument.

For example, take a test suite with five 1s tests in file A, a single 3s test in file B, and two 3s tests in file C; and assume that we have two cores.
- `--dist=load` would have core1 run `A1 A3 A5 C1` = 6s and core2 run `A2 A4 B C2` = 8s; `--slow-first` would have core1 run `B C2 A4` = 7s and core2 run `C1 A1 A2 A3 A5` = 7s (speedup!)
- `--dist=loadfile` would have core1 run `A` = 5s and core2 run `B C` = 9s; `--slow-first` would have core1 run `C` = 6s and core2 run `A B` = 8s (speedup!)
- `--dist=each` is of course equivalent to single-core, so no benefit from `--slow-first`
So on this toy model, better task ordering alone cuts the wall-clock time by 12.5% under `--dist=load` (8s down to 7s) and by about 11% under `--dist=loadfile` (9s down to 8s)!
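To double-check those numbers, here is a toy greedy-scheduler simulation: two workers, each taking the next queued test as soon as it is free. This approximates, but is not identical to, xdist’s real chunked scheduling, and all names here are illustrative.

```python
import heapq

# The toy suite from above: five 1s tests in A, one 3s test in B, two 3s tests in C.
TESTS = {"A1": 1, "A2": 1, "A3": 1, "A4": 1, "A5": 1, "B": 3, "C1": 3, "C2": 3}


def makespan(order, workers=2):
    """Wall-clock time if each free worker greedily takes the next queued test."""
    free_at = [(0, w) for w in range(workers)]  # (time this worker is free, id)
    heapq.heapify(free_at)
    for name in order:
        t, w = heapq.heappop(free_at)                  # earliest-free worker...
        heapq.heappush(free_at, (t + TESTS[name], w))  # ...runs the next test
    return max(t for t, _ in free_at)


collection_order = ["A1", "A2", "A3", "A4", "A5", "B", "C1", "C2"]
slow_first = sorted(TESTS, key=TESTS.get, reverse=True)  # B, C1, C2, then the A's

print(makespan(collection_order))  # 8 -- matches the --dist=load case above
print(makespan(slow_first))        # 7 -- matches the --slow-first case
```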
In the real world, I have twelve cores and Hypothesis’ 2500 `cover` tests take ~70s with the slowest ten tests taking 5-15s each; the 500 `nocover` tests take ~35s with the slowest ten taking 5-19s each (and yes we’ve taken the low-hanging perf fruit). Anecdotally, it’s pretty obvious towards the end that things are slowing down and a few cores are idling, and I’d expect a similar 10%-20% wall-clock improvement.

Awesome @klimkin, thanks for sharing!