Simplify input requirement parsing

What’s the problem this feature will solve?

Currently pip accepts several types of input as “requirements”:

name-based requirements (PEP 508)
direct references (PEP 440 - unsupported currently per #6202, but should be accepted)
file paths (no PEP, just current behavior)
URLs (no PEP, just current behavior)

The parsing for these is ad-hoc and pretty complicated, with lots of code paths (see here). This makes it hard to understand:

the error that a user may see given some invalid input
the possible initial states of InstallRequirement given a user input

It is also impossible to reuse the current code to initialize any type other than an InstallRequirement (so this is a prereq for some of the build refactoring).

Describe the solution you’d like

At a high level we need to map any arbitrary input to one of the 4 categories mentioned above. This is difficult to do unambiguously because we accept file paths, so I think we should make some assumptions and then users that want to use weird file paths can feel free to use an explicit file:// URL.

The primary standards-based constraints are:

a PEP 440 direct reference contains a @ followed by <scheme>:// followed optionally by ; and markers which can have any content
a name-based requirement will consist of non-@ characters followed optionally by extras and specifiers and then by ; and markers which can have any content

Simplifying assumptions:

A file path provided by the user must have at least one of ., /, or \ (on Windows), followed optionally by something that looks like extras and something that looks like markers
The URI_reference part of a direct reference will contain ://
URLs passed by the user will contain ://

That leads to the following rules for deciding how to process input:

1. If the input contains “@” followed eventually by “://” (with no preceding “;”) then we treat it like a direct reference - pass it to Requirement and derive all fields of RequirementInfo from that.
2. If the input contains “://” (with no preceding “;”) then we treat it like a URL - we manually extract markers and optional package name and extras from an #egg= fragment, which are used to instantiate a Requirement if present. Any missing fields get derived from the Requirement if set.
3. If the input contains os.sep or os.altsep or starts with ‘.’ then we treat it like a path, convert it to an absolute file URL and process the same as 2.
4. Otherwise, we treat it like a name-based requirement - pass it to Requirement and derive all fields of RequirementInfo from that.
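
As a rough illustration of the decision order only, the rules could be sketched as below. This is not pip's code: InputKind and classify_requirement_text are hypothetical names, the function only reports which rule matched rather than building a RequirementInfo, and it reads the path rule as being about directory separators (os.sep and os.altsep), which is an interpretation.

```python
import enum
import os


class InputKind(enum.Enum):
    """Hypothetical labels for the four categories of input."""

    DIRECT_REFERENCE = "direct reference"
    URL = "url"
    PATH = "path"
    NAME_BASED = "name-based"


def classify_requirement_text(text: str) -> InputKind:
    """Apply the four rules in order and report which one matched."""
    # Only the part before the first ';' is inspected, so marker content
    # (which can be anything) never influences the decision.
    head = text.split(";", 1)[0]

    # Rule 1: '@' followed eventually by '://'.
    at_index = head.find("@")
    if at_index != -1 and "://" in head[at_index + 1:]:
        return InputKind.DIRECT_REFERENCE

    # Rule 2: any remaining '://' means a plain URL.
    if "://" in head:
        return InputKind.URL

    # Rule 3: a directory separator or a leading '.' means a path.
    # os.altsep is None on POSIX, so it is filtered out here.
    separators = [sep for sep in (os.sep, os.altsep) if sep]
    if head.startswith(".") or any(sep in head for sep in separators):
        return InputKind.PATH

    # Rule 4: everything else is a name-based requirement.
    return InputKind.NAME_BASED
```

Because the checks run in a fixed order, a VCS URL such as git+https://user@example.com/repo.git falls through to the URL rule: its “@” comes after the “://”, not before it.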

Other details:

The module to be added is pip._internal.req.parsing with a function parse_requirement_text that takes a string as would be input by a user or in dependency metadata and returns a RequirementInfo. RequirementInfo would contain:

markers: Set[Marker]
link: Optional[Link] - if None then it’s a name-based requirement
requirement: Optional[Requirement] - if None then it’s an “unnamed” requirement
extras: Set[str]
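
A minimal sketch of that container, assuming a plain dataclass. Marker, Link and Requirement are replaced by str stand-ins so the block runs without pip or packaging installed, and the two properties are hypothetical conveniences that only restate the “if None” notes above.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Stand-ins (assumptions): the proposal uses Marker, Link and Requirement
# objects; plain strings keep this sketch self-contained.
Marker = str
Link = str
Requirement = str


@dataclass
class RequirementInfo:
    markers: Set[Marker] = field(default_factory=set)
    link: Optional[Link] = None
    requirement: Optional[Requirement] = None
    extras: Set[str] = field(default_factory=set)

    @property
    def is_name_based(self) -> bool:
        # No link means the input was a name-based requirement.
        return self.link is None

    @property
    def is_unnamed(self) -> bool:
        # No requirement means no package name could be derived.
        return self.requirement is None
```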

parse_requirement_text would do the steps as described above.

parse_requirement_text would not do any filesystem operations or logging and it should map any expected exceptions to RequirementParsingError with an indication of how we were trying to process the text (direct reference, url, path, or name-based).
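
One possible shape for that error mapping is sketched below. The constructor arguments, the processed_as attribute and the parse_with_context helper are all hypothetical, and ValueError is used here only as a stand-in for whatever counts as an “expected” exception.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class RequirementParsingError(Exception):
    """Raised when input could not be parsed; records how it was processed."""

    def __init__(self, text: str, processed_as: str, cause: Exception) -> None:
        super().__init__(
            f"Could not parse {text!r} as a {processed_as} requirement: {cause}"
        )
        self.text = text
        # One of: "direct reference", "url", "path", "name-based".
        self.processed_as = processed_as


def parse_with_context(
    text: str, processed_as: str, parser: Callable[[str], T]
) -> T:
    """Run parser(text), translating expected failures into one error type."""
    try:
        return parser(text)
    except ValueError as exc:  # stand-in for the "expected" exceptions
        raise RequirementParsingError(text, processed_as, exc) from exc
```

Chaining with `from exc` keeps the underlying failure available on `__cause__`, while callers only need to catch a single exception type.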

Once implemented, we should refactor req.constructors.install_req_from_* to delegate parsing to parse_requirement_text and just do operations on the returned RequirementInfo.

Alternative Solutions

Refactor the existing code while preserving all possible existing behaviors. Having just tried that, it’s a big pain and the result doesn’t look very good.

Additional context

#5204

Issue Analytics

State:
Created 4 years ago
Comments:10 (10 by maintainers)

Top GitHub Comments

cjerdonek commented, Sep 15, 2019

Also, in your write-up of the proposed rules, can you distinguish between choices that are forced by / follow logically from PEPs, and rules that are more heuristics of your choosing? It seems like PR #6203 uses different heuristics (though I’m not certain). I think it would be helpful for people to know if / where there might be any ambiguity in interpreting and applying any of the PEPs, and if we are making any choices here.

cjerdonek commented, Sep 16, 2019

A couple of other things would help in the description of the proposed rules (the “leads to the following rules” part of the original issue comment). One is distinguishing the parts that encode pip’s current behavior from the new logic being introduced; in other words, how much of this is new versus describing what pip already does. Another is knowing whether what’s being proposed is backwards compatible or what, if anything, might break for people.