Simplify input requirement parsing
What’s the problem this feature will solve?
Currently pip accepts several types of input as “requirements”:
- name-based requirements (PEP 508)
- direct references (PEP 440 - unsupported currently per #6202, but should be accepted)
- file paths (no PEP, just current behavior)
- URLs (no PEP, just current behavior)
The parsing for these is ad-hoc and pretty complicated, with lots of code paths (see here). This makes it hard to understand:
- the error that a user may see given some invalid input
- the possible initial states of InstallRequirement given a user input
It is also impossible to re-use the current code to initialize any type other than an InstallRequirement (so this is a prereq for some of the build refactoring).
Describe the solution you’d like
At a high level we need to map any arbitrary input to one of the 4 categories mentioned above. This is difficult to do unambiguously because we accept file paths, so I think we should make some assumptions and then users that want to use weird file paths can feel free to use an explicit file:// URL.
The primary standards-based constraints are:
- a PEP 440 direct reference contains a “@” followed by <scheme>://, followed optionally by “;” and markers which can have any content
- a name-based requirement will consist of non-“@” characters followed optionally by extras and specifiers and then by “;” and markers which can have any content
Simplifying assumptions:
- A file path provided by the user must have at least one of “.”, “/”, or “\” (on Windows), followed optionally by something that looks like extras and something that looks like markers
- The URI_reference part of a direct reference will contain “://”
- URLs passed by the user will contain “://”
That leads to the following rules for deciding how to process input:
1. If the input contains “@” followed eventually by “://” (with no preceding “;”) then we treat it like a direct reference - pass it to Requirement and derive all fields of RequirementInfo from that.
2. If the input contains “://” (with no preceding “;”) then we treat it like a URL - we manually extract markers and the optional package name and extras from an #egg= fragment, which are used to instantiate a Requirement if present. Any missing fields get derived from the Requirement if set.
3. If the input contains os.sep or os.altsep or starts with “.” then we treat it like a path, convert it to an absolute file URL and process the same as 2.
4. Otherwise, we treat it like a name-based requirement - pass it to Requirement and derive all fields of RequirementInfo from that.
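As a rough illustration of the decision order above, here is a minimal sketch that only classifies the input and does no parsing. The function name classify_requirement_text and the returned labels are hypothetical and not part of pip, and the path check assumes that the directory separators os.sep and os.altsep are what is meant.

```python
import os


def classify_requirement_text(text: str) -> str:
    """Return which of the four categories the input falls into.

    Hypothetical helper for illustration only; it is not pip's code.
    """
    # Only the part before the first ";" is inspected, because markers
    # after the ";" can have any content (including "@" and "://").
    head = text.strip().split(";", 1)[0]

    scheme_at = head.find("://")
    if scheme_at != -1:
        at = head.find("@")
        # Rule 1: "@" followed eventually by "://".
        if at != -1 and at < scheme_at:
            return "direct reference"
        # Rule 2: "://" without a preceding "@".
        return "url"

    # Rule 3: looks like a path (assumes os.sep / os.altsep are intended).
    separators = [os.sep] + ([os.altsep] if os.altsep else [])
    if head.startswith(".") or any(sep in head for sep in separators):
        return "path"

    # Rule 4: everything else is a name-based requirement.
    return "name-based"
```

Because the checks run in order, an input such as git+ssh://git@host/repo is treated as a URL: its “@” comes after the “://”, so rule 1 does not apply.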
Other details:
The module to be added is pip._internal.req.parsing with a function parse_requirement_text that takes a string as would be input by a user or in dependency metadata and returns a RequirementInfo. RequirementInfo would contain:
- markers: Set[Marker]
- link: Optional[Link] - if None then it’s a name-based requirement
- requirement: Optional[Requirement] - if None then it’s an “unnamed” requirement
- extras: Set[str]
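One possible shape for this container, sketched as a dataclass. The field names and types come from the list above; Marker, Link and Requirement are replaced here by plain string stand-ins so the snippet runs on its own, and the two properties are hypothetical conveniences that the proposal does not mention.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Stand-ins (assumption): the real types would be packaging's Marker and
# Requirement and pip's Link; plain strings keep the sketch self-contained.
Marker = str
Link = str
Requirement = str


@dataclass
class RequirementInfo:
    requirement: Optional[Requirement]
    link: Optional[Link]
    markers: Set[Marker] = field(default_factory=set)
    extras: Set[str] = field(default_factory=set)

    @property
    def is_name_based(self) -> bool:
        # No link means the requirement is resolved by name.
        return self.link is None

    @property
    def is_unnamed(self) -> bool:
        # No Requirement means only a link (URL or path) was given.
        return self.requirement is None
```

For example, RequirementInfo(requirement=None, link="file:///tmp/pkg") would describe an unnamed requirement pointing at a local path.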
parse_requirement_text would do the steps as described above. parse_requirement_text would not do any filesystem operations or logging and it should map any expected exceptions to RequirementParsingError with an indication of how we were trying to process the text (direct reference, url, path, or name-based).
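The exception mapping could look roughly like the sketch below. The constructor signature of RequirementParsingError, the helper parse_as, and the choice of ValueError as the "expected" exception are all assumptions made for illustration; the proposal only names the exception type and says it should carry the processing mode.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class RequirementParsingError(Exception):
    """Raised when input text cannot be parsed (signature is an assumption)."""

    def __init__(self, kind: str, text: str, cause: Exception) -> None:
        super().__init__(f"Could not parse {text!r} as a {kind} requirement: {cause}")
        self.kind = kind  # "direct reference", "url", "path" or "name-based"
        self.text = text


def parse_as(kind: str, text: str, parser: Callable[[str], T]) -> T:
    """Run a parser and re-raise expected failures with the processing mode.

    Hypothetical helper; assumes expected failures surface as ValueError.
    """
    try:
        return parser(text)
    except ValueError as exc:
        raise RequirementParsingError(kind, text, exc) from exc
```

Keeping the mode on the exception lets a caller report what pip was attempting when it failed, without the parsing function itself doing any logging.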
Once implemented, we should refactor req.constructors.install_req_from_* to delegate parsing to parse_requirement_text and just do operations on the returned RequirementInfo.
Alternative Solutions
- Refactor the existing code while preserving all possible existing behaviors. Having just tried that, it’s a big pain and the result doesn’t look very good.
Additional context
Top GitHub Comments
Also, in your write-up of the proposed rules, can you distinguish between choices that are forced by / follow logically from PEPs, and rules that are more heuristics of your choosing? It seems like PR #6203 uses different heuristics (though I’m not certain). I think it would be helpful for people to know if / where there might be any ambiguity in interpreting and applying any of the PEPs, and if we are making any choices here.
A couple of other things would help in the description of the proposed rules (the “leads to the following rules” part of the original issue comment). One is distinguishing the parts that encode pip’s current behavior from the new logic being introduced. In other words, how much of this is new versus describing what pip already does? The other is knowing whether what’s being proposed is backwards compatible and what, if anything, might break for people.