New resolver: Build automated testing to check for acceptable performance
Our new dependency resolver may make pip a bit slower than it used to be.
Therefore I believe we need to pull together some extremely rough speed tests and decide what level of speed is acceptable, then build some automated testing to check whether we are meeting those marks.
I just ran a few local tests (on a not-particularly-souped-up laptop) to do a side-by-side comparison:
$ time pip install --upgrade pip
Requirement already up-to-date: pip in [path]/python3.7/site-packages (20.2)
real 0m0.867s
user 0m0.680s
sys 0m0.076s
$ time pip install --upgrade pip --use-feature=2020-resolver
Requirement already satisfied: pip in [path]/python3.7/site-packages (20.2)
real 0m1.243s
user 0m0.897s
sys 0m0.060s
Or, in 2 different virtualenvs:
$ time pip install --upgrade chardet
Requirement already up-to-date: chardet in [path].virtualenvs/form990/lib/python3.7/site-packages (3.0.4)
real 0m0.616s
user 0m0.412s
sys 0m0.053s
$ time pip install --upgrade chardet --use-feature=2020-resolver
Requirement already satisfied: chardet in [path].virtualenvs/ical3/lib/python3.7/site-packages (3.0.4)
real 0m1.137s
user 0m0.404s
sys 0m0.053s
These numbers will add up with more complicated processes, dealing with lots of packages at a time.
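To put a number on the gap, the `real` lines from the pip upgrade timings above can be turned into a ratio. This is only a rough sketch: the helper name `parse_real_seconds` is made up for illustration, and a single run on one laptop is far too noisy to treat the result as a benchmark.

```python
import re


def parse_real_seconds(time_output: str) -> float:
    """Extract the wall-clock ('real') time in seconds from bash `time` output."""
    match = re.search(r"real\s+(\d+)m([\d.]+)s", time_output)
    if match is None:
        raise ValueError("no 'real' line found in the given output")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Timings copied from the two `pip install --upgrade pip` runs above.
legacy = parse_real_seconds("real 0m0.867s\nuser 0m0.680s\nsys 0m0.076s")
new = parse_real_seconds("real 0m1.243s\nuser 0m0.897s\nsys 0m0.060s")

slowdown = new / legacy
print(f"new resolver took {slowdown:.2f}x the legacy wall-clock time")
```

The chardet pair could be compared the same way, though those two runs came from different virtualenvs, so that ratio is even less reliable.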
Edit by @brainwane: As of November 2020 we have defined some speed goals and the new resolver has acceptable performance, so I’ve switched this issue to be about building automated testing to ensure that we continue to meet our goals in the future.
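As a rough idea of what such an automated check might look like, here is a minimal sketch that times a command and compares it against a budget. The function names and the budget value are assumptions for illustration only; they are not the speed goals we defined, and this is not code from pip's test suite.

```python
import subprocess
import sys
import time


def run_timed(command, repeats=3):
    """Run `command` several times; return the best wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            command,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    # Taking the minimum reduces noise from other processes on the machine.
    return min(timings)


def check_budget(command, budget_seconds, repeats=3):
    """Return (within_budget, elapsed_seconds) for the given command."""
    elapsed = run_timed(command, repeats)
    return elapsed <= budget_seconds, elapsed


# Stand-in command so the sketch runs anywhere; a real check would time a
# pip invocation instead. The 60-second budget is a placeholder.
ok, elapsed = check_budget([sys.executable, "-c", "pass"], budget_seconds=60.0)
print(f"within budget: {ok} ({elapsed:.3f}s)")
```

In a real test the command would be a pip invocation, for example a `pip install -r ...` run against a fixed requirements file, with the budget taken from whatever goals have been agreed on.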
Edit by @uranusjr: Some explanation for people landing here. The new resolver is generally slower because it checks the dependency between packages more rigorously, and tries to find alternative solutions when dependency specifications do not meet. The legacy resolver, on the other hand, just picks the one specification it likes best without verifying, which of course is faster but also irresponsible.
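A toy illustration of that difference, with heavy caveats: this is not pip's code, versions are plain tuples, specifications are plain predicates, and "use the first specification seen" is an assumption standing in for "picks the one it likes best". It only shows why verifying every specification costs more work and can give a different answer.

```python
from typing import Callable, List, Tuple

Version = Tuple[int, ...]
Spec = Callable[[Version], bool]


def pick_first_spec_only(available: List[Version], specs: List[Spec]) -> Version:
    """Toy 'legacy' behaviour: honour one specification, never verify the rest."""
    return max(v for v in available if specs[0](v))


def pick_verified(available: List[Version], specs: List[Spec]) -> Version:
    """Toy 'strict' behaviour: a version must satisfy every specification."""
    candidates = [v for v in available if all(spec(v) for spec in specs)]
    if not candidates:
        raise LookupError("no version satisfies every specification")
    return max(candidates)


available = [(1, 0), (2, 0), (3, 0)]
# Two dependents constrain the same package: ">= 1.0" and "< 3.0".
specs = [lambda v: v >= (1, 0), lambda v: v < (3, 0)]

print(pick_first_spec_only(available, specs))  # (3, 0), which violates "< 3.0"
print(pick_verified(available, specs))         # (2, 0)
```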
Feel free to post examples here if the new resolver runs slowly for your project. We are very interested in reviewing all of them to identify possible improvements. When doing so, however, please make sure to also include the pip install output, not just your requirements.txt. The output is important for us to identify what pip is spending time on, and suggest workarounds if possible.
Issue Analytics
- Created 3 years ago
- Reactions: 5
- Comments: 51 (28 by maintainers)
Thanks to @pfmoore and @uranusjr’s amazingness, we have #8912 and #8932 which should significantly reduce how often we hit the network.
I took wayyy too long to finish doing this, but… here are numbers for a few runs, comparing the legacy resolver vs the 2020 resolver in 20.2.3 vs after-#8932. All the requirements used for these tests are from reports we’ve seen from our users (yay feedback!) and are included below.
In the listing below, “cold” is a pip install -r ... run in a clean virtualenv. “warm” is a pip install -r ... run in an already populated virtualenv (basically the state after the “cold” run). All runs are after populating the cache with all the relevant files, to reduce the overhead of downloading the distribution files.
It’s relatively straightforward to pull together more numbers here, but I think these paint a fairly reasonable “broad strokes” picture. Let me know if someone thinks we need more information here. 😃
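For anyone wanting to reproduce this kind of measurement, the cold/warm procedure described above could be scripted roughly as follows. This is a sketch under assumptions, not the run.sh used for these numbers: the helper names are invented, it uses the standard-library `venv` module, and it assumes pip's cache has already been populated by an earlier throwaway run.

```python
import subprocess
import sys
import time
import venv
from pathlib import Path
from typing import Callable, Dict


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a virtualenv (POSIX or Windows layout)."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def make_installer(env_dir: Path, requirements: Path) -> Callable[[], None]:
    """Create a clean virtualenv and return a callable that installs into it."""
    venv.create(env_dir, with_pip=True, clear=True)
    command = [str(venv_python(env_dir)), "-m", "pip", "install", "-r", str(requirements)]

    def install() -> None:
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)

    return install


def cold_and_warm(install: Callable[[], None]) -> Dict[str, float]:
    """Time the same install twice: first run is 'cold', the repeat is 'warm'."""
    timings = {}
    for label in ("cold", "warm"):
        start = time.perf_counter()
        install()
        timings[label] = time.perf_counter() - start
    return timings
```

Usage would be something like `cold_and_warm(make_installer(Path("env"), Path("one.txt")))`, repeated once per resolver and per requirements file.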
With #8932, things improve substantially for “warm” states, with a minor degradation in some situations for the “cold” case. Looking at the way the resolver is exploring the graph, I think we’re doing OK. The part of this change that we had most feedback on – the “warm” case – should be fixed pretty soon. It’s worth investigating why #8932 makes the “cold” cases slower though.
Once #8932 is merged, I think we’ll be at an acceptable point.
We are expecting some amount of degradation due to actually being strict, and looking at what the resolver is doing that seems to be the case here, so I think we’re fine. FWIW, running with --no-deps isn’t going to result in any speedups (since we’re still verifying the choices made).
Beyond that, the only thing that may be worth exploring is why #8932 right now isn’t as fast as the 20.2.3 resolver in the cold case. I’ll also point out that #8932 is open right now, so it’s likely @uranusjr or I or @pfmoore would look into this before that merges. I don’t think it’s a big enough deviation to block the release but I’m all ears for differing opinions. 😃
The input files, scripts and intermediate-output involved
Manually formatted text, w/ a text editor and a throw-away script:
run.sh:
one.txt:
two.txt:
three.txt:
Spending some more time to debug this… pip’s new resolver is hitting the network even when the currently installed version does satisfy the version requested. Further, it’s also hitting the same index page (i.e. https://pypi.org/simple/{project}) each time we see it during the graph exploration, which is obviously the wrong thing to do.
That’s 100% a genuine bug, and I’ll file a new issue for tracking that.
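To make the second half of that concrete, here is a minimal sketch of fetching each index page at most once per run. It is not how pip implements it (or will implement the fix); the class name, the injected `fetch` callable, and the hit counter are all assumptions made so the idea can be shown without touching the network.

```python
from typing import Callable, Dict


class IndexPageCache:
    """Remember index pages so each project is fetched once per resolution."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch              # hypothetical function that downloads a URL
        self._pages: Dict[str, str] = {}
        self.network_hits = 0            # counts real fetches, for illustration

    def get(self, project: str) -> str:
        url = f"https://pypi.org/simple/{project}"
        if url not in self._pages:
            self.network_hits += 1
            self._pages[url] = self._fetch(url)
        return self._pages[url]


# Fake fetcher standing in for a real HTTP request.
cache = IndexPageCache(lambda url: f"<html>links for {url}</html>")

# The same project is seen three times during graph exploration.
for _ in range(3):
    cache.get("chardet")
cache.get("pip")

print(cache.network_hits)  # 2: one fetch per distinct project
```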