Simplify input requirement parsing

What’s the problem this feature will solve?

Currently pip accepts several types of input as “requirements”:

  1. name-based requirements (PEP 508)
  2. direct references (PEP 440 - unsupported currently per #6202, but should be accepted)
  3. file paths (no PEP, just current behavior)
  4. URLs (no PEP, just current behavior)

The parsing for these is ad-hoc and pretty complicated, with lots of code paths (see here). This makes it hard to understand:

  1. the error that a user may see given some invalid input
  2. the possible initial states of InstallRequirement given a user input

It is also impossible to reuse the current code to initialize any type other than an InstallRequirement (so this is a prerequisite for some of the build refactoring).

Describe the solution you’d like

At a high level we need to map any arbitrary input to one of the 4 categories mentioned above. This is difficult to do unambiguously because we accept file paths, so I think we should make some assumptions and then users that want to use weird file paths can feel free to use an explicit file:// URL.

The primary standards-based constraints are:

  1. a PEP 440 direct reference contains a @ followed by <scheme>:// followed optionally by ; and markers which can have any content
  2. a name-based requirement will consist of non-@ characters followed optionally by extras and specifiers and then by ; and markers which can have any content

Simplifying assumptions:

  1. A file path provided by the user must have at least one of ., /, or \ (on Windows), followed optionally by something that looks like extras and something that looks like markers
  2. The URI_reference part of a direct reference will contain ://
  3. URLs passed by the user will contain ://

That leads to the following rules for deciding how to process input:

  1. If the input contains “@” followed eventually by “://” (with no preceding “;”) then we treat it like a direct reference - pass it to Requirement and derive all fields of RequirementInfo from that
  2. If the input contains “://” (with no preceding “;”) then we treat it like a URL - we manually extract markers and an optional package name and extras from any #egg= fragment, which are used to instantiate a Requirement if present. Any missing fields get derived from the Requirement if set.
  3. If the input contains os.path.sep or os.altsep or starts with ‘.’ then we treat it like a path, convert it to an absolute file URL and process it the same as 2.
  4. Otherwise, we treat it like a name-based requirement - pass it to Requirement and derive all fields of RequirementInfo from that
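To make the ordering of these rules concrete, here is a minimal sketch of the classification step only. The names `Kind` and `classify_requirement_text` are hypothetical and not part of pip. It assumes that "no preceding ;" means only the text before the first ";" is inspected, and that the separator check in rule 3 refers to the directory separators `os.sep` / `os.altsep`.

```python
import os
from enum import Enum


class Kind(Enum):
    """Hypothetical labels for the four input categories."""
    DIRECT_REFERENCE = "direct reference"
    URL = "url"
    PATH = "path"
    NAME_BASED = "name-based"


def classify_requirement_text(text: str) -> Kind:
    """Apply the four rules in order, without touching the filesystem."""
    # Markers may contain anything, so only look before the first ";".
    head = text.split(";", 1)[0]

    # Rule 1: "@" followed eventually by "://".
    at = head.find("@")
    if at != -1 and "://" in head[at + 1:]:
        return Kind.DIRECT_REFERENCE

    # Rule 2: "://" anywhere else means a bare URL.
    if "://" in head:
        return Kind.URL

    # Rule 3: a directory separator or a leading "." means a path.
    # Assumption: os.sep/os.altsep are the separators intended here.
    separators = [os.sep] + ([os.altsep] if os.altsep else [])
    if head.startswith(".") or any(sep in head for sep in separators):
        return Kind.PATH

    # Rule 4: everything else is a name-based requirement.
    return Kind.NAME_BASED
```

Because rule 1 is checked before rule 2, `pip @ https://example.com/pip.whl` is classified as a direct reference, while a URL that merely contains credentials such as `https://user@example.com/pip.whl` falls through to rule 2.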

Other details:

The module to be added is pip._internal.req.parsing with a function parse_requirement_text that takes a string as would be input by a user or in dependency metadata and returns a RequirementInfo. RequirementInfo would contain:

  1. markers: Set[Marker]
  2. link: Optional[Link] - if None then it’s a name-based requirement
  3. requirement: Optional[Requirement] - if None then it’s an “unnamed” requirement
  4. extras: Set[str]
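One possible shape for this container, as a sketch only: `Marker`, `Link` and `Requirement` are replaced by plain-string stand-ins so the example is self-contained, and `FrozenSet` is used instead of `Set` so that instances can be immutable. Both choices, and the two helper properties, are assumptions rather than part of the proposal.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Stand-ins for the real Marker, Link and Requirement types,
# used here only to keep the sketch self-contained.
Marker = str
Link = str
Requirement = str


@dataclass(frozen=True)
class RequirementInfo:
    requirement: Optional[Requirement] = None
    link: Optional[Link] = None
    markers: FrozenSet[Marker] = frozenset()
    extras: FrozenSet[str] = frozenset()

    @property
    def is_name_based(self) -> bool:
        # No link means the requirement is resolved by name.
        return self.link is None

    @property
    def is_unnamed(self) -> bool:
        # No Requirement means only a URL or path is known.
        return self.requirement is None
```

With this layout a direct reference would have both `requirement` and `link` set, so neither property is true for it.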

parse_requirement_text would do the steps as described above.

parse_requirement_text would not do any filesystem operations or logging and it should map any expected exceptions to RequirementParsingError with an indication of how we were trying to process the text (direct reference, url, path, or name-based).

Once implemented, we should refactor req.constructors.install_req_from_* to delegate parsing to parse_requirement_text and just do operations on the returned RequirementInfo.

Alternative Solutions

  1. Refactor the existing code while preserving all possible existing behaviors. Having just tried that, it’s a big pain and the result doesn’t look very good.

Additional context

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

1 reaction
cjerdonek commented, Sep 15, 2019

Also, in your write-up of the proposed rules, can you distinguish between choices that are forced by / follow logically from PEPs, and rules that are more heuristics of your choosing? It seems like PR #6203 uses different heuristics (though I’m not certain). I think it would be helpful for people to know if / where there might be any ambiguity in interpreting and applying any of the PEPs, and if we are making any choices here.

0 reactions
cjerdonek commented, Sep 16, 2019

A couple of other things would help in the description of the proposed rules (the “leads to the following rules” part of the original issue comment). One is distinguishing between the parts that encode pip’s current behavior and the new logic being introduced. In other words, how much of this is new versus describing what pip already does? Another is knowing whether what’s being proposed is backwards compatible or what, if anything, might break for people.
