Consider using BFS instead of DFS for backtracking
What’s the problem this feature will solve?
Currently pip will drive the version of one of the packages to the ground looking for a compatible one, then reduce the version of a second package a little and drive the first package to the ground again, i.e. it performs a kind of depth-first search (DFS). This approach has a few issues:
- It can be very slow if one of the packages has a lot of versions available.
- If the dependencies of a very old version of a package are somehow not set correctly, pip will choose and install that very old version, even though it will not actually work with the dependent package, of which a fairly new version is installed.
- Versions of installed packages may vary a lot depending on the order these packages are listed in the `pip install` command.
An example of the behavior:

```
# pip3 install flake8 hacking
Successfully installed flake8-4.0.1 hacking-0.5.4
```
Here pip downloads a huge pile of different versions of `hacking`, from 4.1 down to 0.5, taking a lot of time and also ending up with a broken combination. To be clear, I’m not blaming pip for the incompatibility of the result; this is probably an issue with the packaging of that very old version of `hacking`. However, time is wasted and the result is a broken `flake8` command.
While:

```
# pip3 install hacking flake8
Successfully installed flake8-3.8.4 hacking-4.1.0
```
Here we can see that pip started downgrading from the `flake8` package and found a working combination almost immediately, with reasonably fresh versions of both packages.
Describe the solution you’d like
A solution might be to use breadth-first search (BFS) instead, i.e. not driving one package to the ground, but gradually lowering the versions of all of them, one at a time, until a good combination is found.
So, for the `flake8` and `hacking` problem, a solution can be found quickly regardless of the order in which these packages appear on the command line. This search order should also provide better compatibility between the installed packages, because we should end up with more or less fresh versions of all of them. And this should help avoid badly packaged, very old versions of some software.
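To illustrate the difference, here is a toy sketch; this is not pip’s resolver, and the version lists and the compatibility rule below are invented for the `flake8`/`hacking` example:

```python
from itertools import product

versions = {
    "flake8": ["4.0.1", "3.9.2", "3.8.4"],
    "hacking": ["4.1.0", "4.0.0", "3.0.0", "0.5.4"],
}

def dfs_order(versions):
    # product() varies the last package fastest, so every version of
    # hacking is tried against flake8 4.0.1 before flake8 is downgraded:
    # "driving one package to the ground".
    names = list(versions)
    return (dict(zip(names, combo)) for combo in product(*versions.values()))

def bfs_order(versions):
    # Rank combinations by total depth (how many steps back from the
    # newest version, summed over all packages), so all packages are
    # downgraded gradually and roughly in step.
    names = list(versions)
    index_combos = product(*(range(len(v)) for v in versions.values()))
    for combo in sorted(index_combos, key=sum):
        yield {name: versions[name][i] for name, i in zip(names, combo)}

def is_compatible(candidate):
    # Invented metadata: recent hacking releases require flake8 3.x,
    # while hacking 0.5.4 declares no constraint at all (the badly
    # packaged ancient release), so it *looks* compatible with anything.
    if candidate["hacking"] == "0.5.4":
        return True
    return candidate["flake8"].startswith("3.")

def resolve(order):
    return next(c for c in order if is_compatible(c))

print(resolve(dfs_order(versions)))  # {'flake8': '4.0.1', 'hacking': '0.5.4'}
print(resolve(bfs_order(versions)))  # {'flake8': '3.9.2', 'hacking': '4.1.0'}
```

In this toy setup the DFS-style order lands on the badly packaged `hacking` release because it exhausts every `hacking` version against the newest `flake8` first, while the depth-ranked order reaches a fresh pair after two steps.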
Alternative Solutions
A workaround is to manually limit the versions of some packages, or shuffle them around on the command line until `pip` installs what you need. But that’s no fun to do if you’re not a developer of these packages and hardly know what the version requirements should be. And this gets even worse if you need a large number of packages installed and it’s unclear which of them is actually causing the backtracking problem.
Additional context
Along with the aborted-backtracking problem, we had to spend a significant amount of time, while trying to fix CI for the openvswitch project, figuring out why `flake8` checks were not performed during the build. We ended up with a `>=3.0` limit for the `hacking` package; otherwise pip installs `hacking-0.5.4` and `flake8` just throws an exception on startup. At that point our configuration script decides that `flake8` is broken and unavailable: https://github.com/openvswitch/ovs/commit/d5453008c419512ba5a31dade5d394984b6161a1
With the version limit, pip does this:

```
# pip3 install flake8 'hacking>=3'
Successfully installed flake8-3.9.2 hacking-3.0.0
```
Top GitHub Comments
I think it’s fairly unlikely that we’d make such a change in pip, especially given that we don’t have the same amount of time available for change management and development as we did when we rolled out the resolver.
As noted already, there are specific advantages to DFS (notably, that it’s easier for an end user to control and to reason about). Switching to BFS does not solve all the issues; it just changes the set of tradeoffs involved, and it’s not universally better either.
Further, pip already does a first, breadth-first-style sweep of the top-level requirements, to give users the control they wish to have over the resolution process, so we do already have some of the top-level benefits.
If folks wish to implement alternative dependency resolution algorithms and feed the result into pip, that can be done today, by writing out a requirements.txt file that is then installed with pip’s `--no-deps` option. That will not use pip’s dependency resolver and will install the packages exactly as requested.
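For illustration, a minimal sketch of that workflow, assuming some external resolver has already produced a set of exact pins (the versions below are just placeholders):

```python
import subprocess
import sys

# Exact pins produced by a hypothetical external resolver.
pins = ["flake8==3.9.2", "hacking==4.1.0"]

with open("requirements.txt", "w") as f:
    f.write("\n".join(pins) + "\n")

# --no-deps makes pip install exactly these versions without running
# its own dependency resolution.
subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "--no-deps", "-r", "requirements.txt"]
)
```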
BFS definitely feels intuitive; however, in real-world dependency trees I am not convinced this would hold, because the version depth of one package cannot be compared to the version depth of another package.
Take for example `boto3`, which releases every day: you may quite reasonably want to go back 10 to 100 versions because of some requirement in your dependency tree, and that is still just a few weeks or months old. Now compare to `pymssql`, where 10 versions back would take you ~6 years and 20 versions back would take you ~18 years. In a hypothetical conflict between the two, a BFS would quickly go to an ancient version of `pymssql`.

I think the only real way to protect yourself against ancient versions of packages, for stability, is to put in lower bounds, either in your requirements file or in a constraints file. Pip can’t know ahead of time whether the metadata will fail to build on your machine or not. I believe one intention of erroring out when a package fails to build metadata, beyond the primary reason of catching missing system dependencies, is to reveal backtracking-too-far issues and to encourage you, and the ecosystem in general, to put in sensible lower bounds.
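To make the version-depth point concrete, here is a small sketch with invented release dates (neither timeline is `boto3`’s or `pymssql`’s real history): the same number of steps back corresponds to days for one package and decades for the other.

```python
from datetime import date

# Invented timelines: one package ships almost daily, the other roughly
# every two years.
fast_releases = [date(2022, 1, 1 + i) for i in range(10)]      # daily cadence
slow_releases = [date(2004 + 2 * i, 1, 1) for i in range(10)]  # every other year

def age_at_depth(releases, depth):
    """Calendar distance covered by backtracking `depth` versions below the newest."""
    ordered = sorted(releases, reverse=True)
    return ordered[0] - ordered[depth]

print(age_at_depth(fast_releases, 9))  # 9 days, 0:00:00
print(age_at_depth(slow_releases, 9))  # 6575 days, 0:00:00 (~18 years)
```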
But I am not a pip maintainer; I have just been working a little on improving pip’s backtracking performance. You should know that this project is volunteer-driven, and ultimately changes are made by people submitting PRs so they can be evaluated in practice.