Pip resolver should prefer cause of conflicts when backtracking


What’s the problem this feature will solve?

This can drastically improve the performance of real-world dependency conflicts where pip needs to backtrack. Specifically, it fixes https://github.com/pypa/pip/issues/10201

Describe the solution you’d like

Suppose you depend on packages A, B, and C, where both A and B depend on X, but the latest versions of A and B depend on mutually exclusive versions of X. When backtracking, pip should prefer resolving A and B rather than trying to resolve C. This is for 2 reasons:

  1. In the real world, if some package Foo version n depends on some package Bar, then Foo version n-1 is likely to depend on Bar as well. Therefore, in the above example it makes sense to focus on A and B, as they need to settle on a version of X they both accept
  2. It is intuitive to end users that the packages causing the conflict are the ones pip should try to resolve, not some package which is not part of the current conflict
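The ordering idea above can be sketched as a sort key. This is only an illustration under stated assumptions: `preference_key`, `order_candidates`, `backtrack_causes`, and `user_order` are hypothetical names invented for this sketch, and the code is not pip's actual get_preference.

```python
from typing import Dict, Iterable, List, Set, Tuple


def preference_key(
    name: str, backtrack_causes: Set[str], user_order: Dict[str, int]
) -> Tuple[bool, int, str]:
    """Hypothetical sort key: lower values are resolved first."""
    is_cause = name in backtrack_causes
    # False sorts before True, so packages involved in the conflict come first.
    # Ties fall back to the user's ordering, then to the name for determinism.
    return (not is_cause, user_order.get(name, len(user_order)), name)


def order_candidates(
    names: Iterable[str], backtrack_causes: Set[str], user_order: Dict[str, int]
) -> List[str]:
    """Sort package names so that conflict causes are tried before the rest."""
    return sorted(
        names, key=lambda n: preference_key(n, backtrack_causes, user_order)
    )


# The user listed C first, but A and B are the packages that disagree about X.
user_order = {"C": 0, "A": 1, "B": 2}
ordered = order_candidates(["C", "A", "B", "X"], {"A", "B"}, user_order)
print(ordered)
```

With A and B marked as causes, they move ahead of C even though the user listed C first. Packages outside the conflict keep their relative user order.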

Alternative Solutions

There are probably clever graph theory / dependency tree techniques that can improve general performance here. This however is a very small change that only slightly alters the behavior of get_preference.

Additional context

I will submit PRs based on the following diff: https://github.com/notatallshaw/pip/compare/21.2.3...notatallshaw:third_attempt_at_prefer_non_conflicts

However, I created this issue to convince pip maintainers first, as there is only a limited amount of evidence I can give, and it is based on real-world reports (i.e. anecdotal reports; I have about 9 reproducible examples from people reporting issues to pip’s GitHub. If you have more or know where I can find more examples, please let me know). In particular, I do not have any test cases because:

  1. There are no existing unit tests for get_preference
  2. As best as I can tell there are no existing functional tests which infer the behavior of get_preference
  3. As best as I can tell there are no existing performance tests along the lines of “given this dependency tree how many times does pip have to backtrack”
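A performance test of the kind described in item 3 might look roughly like the toy below. It is a sketch under heavy assumptions: the `INDEX` data is made up, `count_dead_ends` is a hypothetical helper, and the search is a plain depth-first search over a fixed package order rather than pip's resolver. It only shows how a "how many dead ends for this dependency tree" measurement could be expressed.

```python
from typing import Dict, List, Optional, Set, Tuple

# Toy index (made-up data): package -> [(version, deps)], newest first.
# deps maps a dependency name to the set of versions it accepts.
Index = Dict[str, List[Tuple[int, Dict[str, Set[int]]]]]

INDEX: Index = {
    "A": [(2, {"X": {2}}), (1, {"X": {1}})],
    "B": [(2, {"X": {1}}), (1, {"X": {1}})],
    "C": [(3, {}), (2, {}), (1, {})],
    "X": [(2, {}), (1, {})],
}


def count_dead_ends(
    index: Index, order: List[str]
) -> Tuple[Optional[Dict[str, int]], int]:
    """Pin packages in a fixed order; return (solution, number of dead ends)."""
    dead_ends = 0

    def deps_of(name: str, version: int) -> Dict[str, Set[int]]:
        return dict(index[name])[version]

    def compatible(name: str, version: int, pins: Dict[str, int]) -> bool:
        # Every pinned package that depends on `name` must accept `version`.
        for pinned_name, pinned_version in pins.items():
            allowed = deps_of(pinned_name, pinned_version).get(name)
            if allowed is not None and version not in allowed:
                return False
        # The candidate's own dependencies must accept what is already pinned.
        for dep, allowed in deps_of(name, version).items():
            if dep in pins and pins[dep] not in allowed:
                return False
        return True

    def search(position: int, pins: Dict[str, int]) -> Optional[Dict[str, int]]:
        nonlocal dead_ends
        if position == len(order):
            return dict(pins)
        name = order[position]
        for version, _ in index[name]:
            if compatible(name, version, pins):
                pins[name] = version
                result = search(position + 1, pins)
                if result is not None:
                    return result
                del pins[name]
        dead_ends += 1  # no version of `name` works with the current pins
        return None

    return search(0, {}), dead_ends


# C is pinned before the conflict over X is discovered ...
solution_late, dead_ends_late = count_dead_ends(INDEX, ["A", "B", "C", "X"])
# ... versus settling the conflicting packages first.
solution_early, dead_ends_early = count_dead_ends(INDEX, ["A", "B", "X", "C"])
print(dead_ends_late, dead_ends_early)
```

Both orders reach the same pins, but on this toy data the order that handles X before C hits fewer dead ends, because every version of C is no longer retried for each failed combination of A and B.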

So given that let me explain what limitations I think there are to this approach:

Real World Limitations

In my testing, the biggest limitation I found was when the pip resolver has already pinned one of the failing causes long before the failure happens; this results in the resolver backtracking for a long period of time.

This can be shown with the requirement apache-airflow[all]==1.10.13, where one of the causes of the failures is moto. However, moto is pinned by the pip resolver very early on and therefore stays pinned for a long time before it gets unpinned, so alternative versions of moto cannot be explored until pip has spent a long time backtracking.

This situation is no worse than with the current resolver, and I actually think this modification will make it orders of magnitude faster than the current resolver (though that might still mean billions of years to resolve rather than a heat-death-of-the-universe time frame).

Theoretical limitations

Fundamentally, this change is just to get_preference, and therefore to the order in which packages are chosen when backtracking, so it will be possible to construct a dependency tree that is slower to resolve under this change than under the current resolver.

I have thought of a possible real world scenario where this might happen: You require packages A and B. A has a complex dependency tree, and between A version n-1 and A version n that dependency tree gets completely changed. B is completely incompatible with the deeper dependencies of A version n and we must backtrack to A version n-1 to find a solution.

Through luck, the current resolver might backtrack on the right path and resolve quickly, whereas in this case focusing on the failures between B and A’s dependencies might cause a long backtracking, because those failures are a red herring and you need to backtrack all the way to A version n-1.

Though I have not found any real world examples of this scenario where focusing on the failures is a red herring, I am sure with enough time and Python projects someone will eventually find an example.


Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 10
  • Comments: 12 (12 by maintainers)

Top GitHub Comments

2 reactions
notatallshaw commented, Sep 16, 2021

A further limitation of this change is it takes away some of the power of user ordering when backtracking. If users perfectly construct the order of their requirements file this approach will partially disrupt that when backtracking.

That said, the user ordering feature is largely undocumented, and I suspect you would be hard pressed to 1) find anyone actually using this feature and, even if you did, 2) find a situation where there is a complex backtracking problem that a user could solve themselves by changing the order.

2 reactions
notatallshaw commented, Sep 16, 2021

Okay here are the 3 pull requests that make up the version of the resolver I have been testing:

Please let me know what you think and if there’s anything more I can do to convince you that this is a good solution.
