Only pull in files from --extra-index-url if the package exists on --index-url
What’s the problem this feature will solve?
I was wondering whether there’s a middle ground for #8606 between changing nothing (behaviourally, not considering user education issues) and removing --extra-index-url outright (which most seem to disagree with).
From what I can tell, there are two main usages of --extra-index-url:
- To serve files supplementing packages published on the main index. This is the intended usage (and what piwheels is doing).
- To serve packages not available on the main index. This is what we don’t want people to do, since it is susceptible to supply chain attacks (where the “attacker” may be the users themselves).
So I considered the difference between the two usages and tried to come up with a strategy that keeps the first use case working unchanged, but makes the second fail early so people are directed to other solutions before things become dangerous.
Describe the solution you’d like
Consider the most common scenario, where --index-url is set to pypi.org. For pip install mypackage --extra-index-url=https://myindex, files listed on https://myindex/mypackage/ are collected by pip only if https://pypi.org/mypackage/ returns a non-error response with at least one file on the page. In other words, the main index also acts as pip’s “canonical package name registry”: extra indexes may only add files under names provided by the registry, and cannot register new project names. I think this should cover most legitimate --extra-index-url usages.
This will also break most of the inappropriate usages, since those private packages are generally not available on PyPI and won’t be picked up by pip (pip will need to emit a warning saying the project is found but ignored, so users don’t think this is a bug). The “natural” response from these people would be to swap the indexes (erroneously thinking the index order is significant), and that would still break because their private index does not serve PyPI projects. The only way to correctly make their private project installable is to point --index-url to an index that contains both PyPI and their private projects, which is our recommended best practice.
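To make the rule concrete, here is a minimal sketch in Python. It is not pip's implementation: the Index type, the collect_files function and the file names are all hypothetical, and each index is modelled as a plain dict instead of an HTTP endpoint.

```python
import warnings
from typing import Dict, List

# Hypothetical stand-in for an index: project name -> file names on its page.
Index = Dict[str, List[str]]


def collect_files(project: str, main: Index, extras: List[Index]) -> List[str]:
    """Collect files for `project`, treating `main` as the name registry."""
    main_files = main.get(project, [])
    extra_files = [f for index in extras for f in index.get(project, [])]
    if not main_files:
        # The name is not registered on the main index, so extra indexes
        # may not contribute anything. Warn so this does not look like a bug.
        if extra_files:
            warnings.warn(
                f"{project} was found on an extra index but ignored because "
                "it does not exist on the main index"
            )
        return []
    # The name is registered: extra indexes may add files under it.
    return main_files + extra_files


pypi = {"numpy": ["numpy-1.0.tar.gz"]}
supplement = {"numpy": ["numpy-1.0-armv7l.whl"]}   # piwheels-style usage
private = {"mypackage": ["mypackage-1.0.tar.gz"]}  # private-only project

print(collect_files("numpy", pypi, [supplement]))  # both files are collected
print(collect_files("mypackage", pypi, [private]))  # [] plus a warning
print(collect_files("numpy", private, [pypi]))      # swapping indexes: [] plus a warning
```

The last call shows why swapping the indexes does not help: the private index does not list numpy, so nothing is collected for it.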
The only variant not solved by this design is when mypackage does exist on PyPI and someone relies on https://myindex to provide a newer version than PyPI. But people doing this are very likely already fully aware of the possibility that a newer mypackage on PyPI will break the workflow (and are willing to take the risk). They think they know what they’re doing, so I say whatever, let’s not stop them.
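The risk in that variant can be shown with a toy version picker. This is only a sketch: best_candidate is a hypothetical helper, versions are simplified to integer tuples, and it assumes the highest version wins regardless of which index it came from.

```python
from typing import List, Tuple

Version = Tuple[int, ...]


def best_candidate(
    main_versions: List[Version], extra_versions: List[Version]
) -> Tuple[Version, str]:
    """Pick the highest version across both indexes and report its origin."""
    candidates = [(v, "main") for v in main_versions]
    candidates += [(v, "extra") for v in extra_versions]
    # Assumption: index order plays no role, only the version number does.
    return max(candidates, key=lambda candidate: candidate[0])


# Today: the private index carries the newer build, so it is selected.
print(best_candidate([(1, 0)], [(1, 1)]))

# Later: someone uploads 2.0 to the main index, which silently takes over.
print(best_candidate([(1, 0), (2, 0)], [(1, 1)]))
```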
Additional context
Question for @bennuttall: Does piwheels currently serve projects that do not exist on PyPI? This design would keep everything working if it does not. If it does, would it be difficult to publish them to PyPI as well?
Issue Analytics
- State:
- Created 3 years ago
- Comments:16 (14 by maintainers)
As you know, this change will disrupt people who depend on private packages from internal indexes on the extra-index-url while also depending on packages from the public PyPI on the index-url. I was wondering what the timeline looks like for implementing this change? GitLab is planning a change which will mitigate the aforementioned problem (https://gitlab.com/gitlab-org/gitlab/-/issues/233413), and I’m hoping the change in pip will come about after GitLab gets their change in.
Yes. The public interface is organised well, but it’s pretty difficult to change what’s going on in the implementation, namely LinkCollector.collect_links(). Unfortunately this change must do that, since the current implementation throws away index information eagerly, making it impossible to track what links are obtained from the “main” index URL.
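As a rough illustration of what keeping the index information could look like, here is a sketch that tags every collected link with the index it came from. It is not pip's code: TaggedLink, collect_tagged_links and the pages mapping (a stand-in for fetching index pages) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TaggedLink:
    url: str               # file URL found on the project page
    index_url: str         # index the page was fetched from
    from_main_index: bool  # True only for links from --index-url


def collect_tagged_links(
    project: str,
    main_index_url: str,
    extra_index_urls: List[str],
    pages: Dict[str, List[str]],
) -> List[TaggedLink]:
    """Collect links for `project` without discarding their origin."""
    tagged = []
    for index_url in [main_index_url, *extra_index_urls]:
        page_url = f"{index_url.rstrip('/')}/{project}/"
        # `pages` replaces a real HTTP fetch: page URL -> file URLs on it.
        for url in pages.get(page_url, []):
            tagged.append(TaggedLink(url, index_url, index_url == main_index_url))
    return tagged


pages = {
    "https://pypi.org/mypackage/": ["https://pypi.org/files/mypackage-1.0.tar.gz"],
    "https://myindex/mypackage/": ["https://myindex/files/mypackage-1.0-armv7l.whl"],
}
links = collect_tagged_links("mypackage", "https://pypi.org", ["https://myindex"], pages)

# With the origin preserved, the registry check becomes a one-liner.
registered = any(link.from_main_index for link in links)
print(registered, len(links))
```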