Dependency resolution differences (wrong) when using custom (i.e. not pypi) repository
- I am on the latest Poetry version.
- I have searched the issues of this repo and believe that this is not a duplicate.
- If an exception occurs when executing a command, I executed it again in debug mode (-vvv option).
- OS version and name: Windows 10 and Ubuntu 20.04
- Poetry version: 1.1.8
- Link of a Gist with the contents of your pyproject.toml file: not needed to explain
Issue
Any source defined in pyproject.toml other than pypi is always handled internally as a LegacyRepository.
That means metadata is not collected from API calls, but always by downloading and parsing packages, usually sdists.
I probably don’t have to explain how this is bad for performance in terms of speed, but you can see people notice it because it is quite significant! See for example #4113
Sometimes, however, the metadata would have been available from an API endpoint but poetry can’t work it out by parsing the sdist, and this leads to problems in dependency resolution.
For example, scikit-image 0.17.2 sdist imports numpy in its setup.py, but it doesn’t specify any build requirements in pyproject.toml, so running setup.py fails. Poetry then just silently concludes scikit-image doesn’t have any dependencies, which is clearly wrong.
This is exactly what happens in #3464 and is also how I first encountered this bug.
If you install this package from pypi, however, everything goes smoothly because the metadata is collected from the API endpoint instead.
So in short, for the exact same dependencies, you may not get the same dependency resolution depending on whether the source repository is pypi or something else, even if the alternative source is a direct reverse proxy to pypi.
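To make the contrast concrete, here is a minimal sketch of reading dependencies from an API response rather than from an sdist. It assumes the response has the shape of PyPI's JSON API (https://pypi.org/pypi/<name>/<version>/json), where dependencies are listed under info.requires_dist. The helper name requires_from_json and the sample payload are made up for illustration and are not Poetry's internals.

```python
import json
from typing import List


def requires_from_json(payload: str) -> List[str]:
    """Return the declared dependencies from a PyPI-style JSON API response."""
    info = json.loads(payload)["info"]
    # `requires_dist` may be null (None) when no dependencies are reported.
    return list(info.get("requires_dist") or [])


# Hypothetical sample response, trimmed to the fields used here.
sample = json.dumps(
    {
        "info": {
            "name": "example-package",
            "version": "1.0.0",
            "requires_dist": ["numpy>=1.15.1", "scipy>=1.0.1"],
        }
    }
)

print(requires_from_json(sample))
```

No package code is executed on this path, so a setup.py that fails to run cannot silently turn into an empty dependency list.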
Suggested fix
Option 1 - Fully automated
This is the ideal option. Poetry becomes clever enough to figure out for any source if it can provide metadata via an API just like pypi can. A mechanism needs to be built that tests this per configured source.
You could look at hostnames to try and optimize this guessing game a little bit.
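A rough sketch of what such a per-source test could look like follows. Everything in it is an assumption for illustration: the capability names, the endpoint paths (which mirror the layout of pypi.org and may differ on other servers), the idea of probing with a package known to exist on the source, and the injected fetch_status callable that stands in for a real HTTP request.

```python
from typing import Callable, Dict
from urllib.parse import urljoin

# Hypothetical capability names mapped to URL path templates, relative to the
# source's base URL. The paths mirror pypi.org; other servers may differ.
PROBES: Dict[str, str] = {
    "json_api": "pypi/{package}/json",
    "simple_index": "simple/{package}/",
}


def detect_capabilities(
    base_url: str,
    fetch_status: Callable[[str], int],
    probe_package: str,
) -> Dict[str, bool]:
    """Probe each known endpoint once and record whether it answered HTTP 200."""
    if not base_url.endswith("/"):
        base_url += "/"
    return {
        name: fetch_status(urljoin(base_url, path.format(package=probe_package))) == 200
        for name, path in PROBES.items()
    }


def fake_fetch_status(url: str) -> int:
    # Stand-in for a real HTTP request: this pretend server only serves the simple index.
    return 200 if "/simple/" in url else 404


print(detect_capabilities("https://foo.bar", fake_fetch_status, "example-package"))
```

The result could be cached per source so the probing cost is paid once rather than on every resolution.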
Option 2 - User configurable
Allow users to configure the capabilities a source has available in pyproject.toml. This would basically put the responsibility with the user to tell poetry what APIs can be consumed.
[[tool.poetry.source]]
name = "foo"
url = "https://foo.bar/simple/"
capabilities = { foo = true, bar = true }
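As a sketch of how such a setting might be consumed, the function below picks a metadata strategy from an already-parsed source table. The capability name json_api and the strategy labels are hypothetical placeholders (the foo and bar keys above are placeholders too), and a source with no declared capabilities falls back to the current download-and-parse behaviour.

```python
from typing import Any, Dict


def metadata_strategy(source: Dict[str, Any]) -> str:
    """Choose how to collect metadata for one configured source."""
    capabilities = source.get("capabilities") or {}
    # "json_api" is a hypothetical capability name, not an existing Poetry setting.
    if capabilities.get("json_api"):
        return "json_api"
    # Default: today's behaviour of downloading and parsing packages.
    return "download_and_parse"


# Parsed form of a [[tool.poetry.source]] entry, written out as a plain dict.
source = {
    "name": "foo",
    "url": "https://foo.bar/simple/",
    "capabilities": {"json_api": True},
}

print(metadata_strategy(source))
```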
If you agree with one of the suggested improvements, I can do the work and open a PR. I’m pretty sure many users will reap the benefits in performance and correctness!
Issue Analytics
- Created 2 years ago
- Reactions: 5
- Comments: 6 (3 by maintainers)
Tip for others who are also affected by this: until this is resolved, we are moving to installing via git instead.
Using private pypi:
Installing with private pypi can take hours for us.
I had read through the code when I had created my issue and my understanding of the cause matches yours. I think, though, that assuming that all PyPI-like backends support the simple API would be brave. For example, I was using AWS CodeArtifact as one of my PyPI backends, and that supports the legacy API and not the simple one.
I know that it is far from ideal, but it’s probably best to just try every front door for custom repositories and see which APIs are available to use, rather than just resorting to sdist downloads for all custom repositories. It’s a sad state of affairs when the thing storing packages can’t be trusted to answer really basic questions about what packages need to be installed correctly, but these performance and correctness issues really undercut a huge amount of the value add that users get from using Poetry. I appreciate that Poetry is trying to do the “right thing”, but it’s tiring to advocate for using tools like this and have it take ages to do its calculations, especially when it’s not making use of available API endpoints to do so.