Consider using BFS instead of DFS for backtracking

What’s the problem this feature will solve?

The current pip drives the version of one package all the way down while looking for a compatible one, then lowers the version of a second package slightly and drives the first package all the way down again. In other words, it performs a kind of depth-first search (DFS). This approach has a few issues:

  1. It can be very slow if one of the packages has many versions available.
  2. If the dependencies of a very old version of a package are not declared correctly, pip will choose and install that very old version, even though it will not actually work with the fairly new version of the other package that gets installed alongside it.
  3. The versions of the installed packages can vary a lot depending on the order in which the packages are listed in the pip install command.

An example of this behavior:

# pip3 install flake8 hacking
Successfully installed flake8-4.0.1 hacking-0.5.4

Here pip downloads a huge pile of hacking versions, from 4.1 down to 0.5, which takes a lot of time and ends in a broken combination. To be clear, I’m not blaming pip for the incompatibility of the result; that is probably a packaging issue in that very old version of hacking. However, time is wasted and the result is a broken flake8 command.

While:

# pip3 install hacking flake8
Successfully installed flake8-3.8.4 hacking-4.1.0

Here we can see that pip started downgrading from the flake8 package and found a working combination almost immediately, with reasonably fresh versions of both packages.

Describe the solution you’d like

A solution might be to use breadth-first search (BFS) instead, i.e. not driving one package to the ground, but gradually lowering the versions of all of them, one at a time, until a good combination is found.

So, for the flake8 and hacking problem, a solution can be found quickly regardless of the order in which the packages appear on the command line. This search order should also provide better compatibility between installed packages, because we should end up with more or less fresh versions of all of them. It should also help avoid badly packaged, very old versions of some software.
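The difference between the two search orders can be sketched with a toy model. This is not pip's resolver: the candidate lists are abbreviated, the `compatible` rules are invented for illustration (they are not the real package metadata), and `dfs_order`, `bfs_order` and `search` are hypothetical helper names.

```python
from itertools import product

# Illustrative candidate lists, newest first. NOT the real release histories.
FLAKE8 = ["4.0.1", "3.9.2", "3.8.4"]
HACKING = ["4.1.0", "4.0.0", "3.0.0", "2.0.0", "1.0.0", "0.5.4"]


def compatible(flake8, hacking):
    """Made-up compatibility rules standing in for real package metadata."""
    if hacking == "0.5.4":
        return True  # assumption: the ancient release declares no flake8 constraint
    if hacking.startswith("4."):
        return flake8.startswith("3.8.")  # assumption for this toy model
    return flake8.startswith("3.")  # assumption for this toy model


def dfs_order(first, second):
    """Hold `first` at each version and exhaust `second` before lowering `first`."""
    for a in first:
        for b in second:
            yield a, b


def bfs_order(first, second):
    """Visit combinations by total number of downgrade steps from the newest pair."""
    index_pairs = product(range(len(first)), range(len(second)))
    for i, j in sorted(index_pairs, key=lambda p: (p[0] + p[1], p[0])):
        yield first[i], second[j]


def search(order, check):
    """Return the first pair accepted by `check` and how many pairs were tried."""
    tried = 0
    for pair in order:
        tried += 1
        if check(*pair):
            return pair, tried
    return None, tried


def hacking_first(hacking, flake8):
    """Same rules as `compatible`, with the arguments in the other order."""
    return compatible(flake8, hacking)


print(search(dfs_order(FLAKE8, HACKING), compatible))     # (('4.0.1', '0.5.4'), 6)
print(search(dfs_order(HACKING, FLAKE8), hacking_first))  # (('4.1.0', '3.8.4'), 3)
print(search(bfs_order(FLAKE8, HACKING), compatible))     # (('3.8.4', '4.1.0'), 6)
print(search(bfs_order(HACKING, FLAKE8), hacking_first))  # (('4.1.0', '3.8.4'), 4)
```

In this toy model the depth-first order ends on a different combination depending on which package is listed first, while the breadth-first order reaches the same combination either way. It does not show that BFS tries fewer candidates in general.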

Alternative Solutions

A workaround is to manually limit the versions of some packages, or to shuffle them around on the command line until pip installs what you need. But that’s no fun if you’re not a developer of these packages and hardly know what the version requirements should be. It gets even worse if you need a large number of packages installed and it is unclear which of them is actually causing the backtracking problem.

Additional context

Along with the aborted backtracking problem, we had to spend a significant amount of time, while trying to fix CI for the openvswitch project, figuring out why flake8 checks were not performed during the build. We ended up with a >=3.0 limit for the hacking package; otherwise pip installs hacking-0.5.4 and flake8 just throws an exception on startup. At that point our configuration script decides that flake8 is broken and unavailable: https://github.com/openvswitch/ovs/commit/d5453008c419512ba5a31dade5d394984b6161a1

With the version limit, pip does this:

# pip3 install flake8 'hacking>=3'
Successfully installed flake8-3.9.2 hacking-3.0.0

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 12 (10 by maintainers)

Top GitHub Comments

1 reaction
pradyunsg commented, Feb 18, 2022

I think it’s fairly unlikely that we’d make such a change in pip, especially given that we don’t have the same amount of time available for change management and development as we did when we rolled out the resolver.

As noted already, there are specific advantages to DFS (notably, that it’s easier for an end user both to control and to reason about). Switching to BFS does not solve all the issues; it just changes the set of tradeoffs involved, and it’s not universally better either.

Further, pip also does a first round of breadth-first-style sweep of the top-level requirements, to allow users to have the control that they wish to have over the resolution process. So we do already have some of the top-level benefits.

If folks wish to implement alternative dependency resolution algorithms and feed the result into pip, that can be done today by generating a requirements.txt file and installing it with pip using --no-deps. That will not use pip’s dependency resolver and will install the packages exactly as requested.
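A minimal sketch of that hand-off, assuming an external resolver has already produced a name-to-version mapping. The `render_pins` helper and the `resolved` mapping are hypothetical. Because --no-deps installs nothing beyond what is listed, a real mapping would need to include every package, transitive dependencies included.

```python
def render_pins(resolved):
    """Render a {name: version} mapping as exact `name==version` lines, sorted by name."""
    return "".join(
        f"{name}=={version}\n" for name, version in sorted(resolved.items())
    )


# Hypothetical output of an external resolver (incomplete: no transitive dependencies).
resolved = {"hacking": "4.1.0", "flake8": "3.8.4"}

requirements_text = render_pins(resolved)
print(requirements_text, end="")
# flake8==3.8.4
# hacking==4.1.0

# After saving the text as requirements.txt, it could be installed with:
#   python -m pip install --no-deps -r requirements.txt
```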

1 reaction
notatallshaw commented, Feb 7, 2022

I’m able to install flake8 and hacking with 22.0.3 just fine, even though it takes a few seconds (I only had to install wheel beforehand). And since you mentioned that, it is another problem of DFS and, likely, backjumping too: there is a good chance of hitting a build failure or some other error while trying to build a very old version of a package. Current pip will abort backtracking at this point, leaving the user with no packages installed at all. BFS, I think, should be better protected from such failures, since it will not go that deep in most cases. Just my 2c.

That definitely feels intuitive; however, in real-world dependency trees I am not convinced this would be true, because the version depth of one package cannot be compared to the version depth of another.

Take for example boto3, which releases every day: you may quite reasonably want to go back 10 to 100 versions because of some requirement in your dependency tree, and that is only a few weeks or months old. Now compare that to pymssql, where 10 versions back would take you ~6 years and 20 versions back would take you ~18 years. In a hypothetical conflict between the two, a BFS would quickly reach an ancient version of pymssql.
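The arithmetic behind this comparison can be sketched as follows. The release intervals are rough figures taken from the comment (daily for boto3, about 6 years per 10 versions for pymssql), and a constant interval is a simplifying assumption; `days_back` is a hypothetical helper.

```python
def days_back(steps, days_between_releases):
    """Approximate age of the candidate reached after `steps` downgrades."""
    return steps * days_between_releases


BOTO3_INTERVAL = 1               # "releases every day", per the comment
PYMSSQL_INTERVAL = 6 * 365 / 10  # ~6 years per 10 versions, per the comment

# An equal number of downgrade steps lands on very different release ages.
for steps in (1, 5, 10):
    boto3_age = days_back(steps, BOTO3_INTERVAL)
    pymssql_age = round(days_back(steps, PYMSSQL_INTERVAL))
    print(steps, boto3_age, pymssql_age)
# 1 1 219
# 5 5 1095
# 10 10 2190
```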

I think the only real way to protect yourself against ancient versions of packages, for stability, is to put lower bounds in either your requirements file or your constraints file. Pip can’t know ahead of time whether the metadata will fail to build on your machine. I believe that one intention of erroring out when a package fails to build metadata, beyond the primary reason of catching missing system dependencies, is to reveal cases of backtracking too far and to encourage you, and the ecosystem in general, to put in sensible lower bounds.
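The effect of a lower bound can be sketched as a filter on the candidate list: whatever order the search uses, it can no longer reach versions below the bound. The version list is illustrative rather than the real release history, `parse` and `with_lower_bound` are hypothetical helpers, and the dotted-integer comparison is a simplification of real version ordering (PEP 440).

```python
def parse(version):
    """Split a simple dotted version such as '3.0.0' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def with_lower_bound(candidates, minimum):
    """Keep only the candidates at or above `minimum`, preserving their order."""
    floor = parse(minimum)
    return [v for v in candidates if parse(v) >= floor]


# Illustrative candidate list, newest first. NOT the real release history.
HACKING = ["4.1.0", "4.0.0", "3.0.0", "2.0.0", "1.0.0", "0.5.4"]

print(with_lower_bound(HACKING, "3"))  # ['4.1.0', '4.0.0', '3.0.0']
```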

But I am not a pip maintainer; I have just been working a little on improving pip’s backtracking performance. You should know this project is volunteer-driven, and ultimately changes are made by people submitting PRs so they can be evaluated in practice.
