Refactor parser structure to match CPython's grammar more closely
The parse function structure of the parser implementation in Cython/Compiler/Parsing.py has diverged from the old Grammar in CPython and certainly does not match the new PEG parser. Additionally, several flags were added over time that make it less clear what kind of expression is allowed and supposed to be parsed where. This ticket asks to
- adapt the parse function split and their names to what CPython uses in its own parser, as closely as possible (but keeping the `p_` prefix for readability)
- remove flag options where possible and reasonable
- split parse functions that currently take options into separate functions that parse different things, and use them in the appropriate places.
Basically, the name of a called parse function should make clear which state the parser enters and what it is allowed to see next. This state should not depend on additional options (“parse X next, unless I’m telling you not to do what I’m asking you to do”).
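To illustrate the idea, here is a minimal toy sketch of replacing a flag with separately named parse functions. Everything in it is hypothetical: the `Tokens` class, `p_test_flagged()`, `p_expression()` and `p_named_expression()` are made up for this example and are not the functions or signatures in Cython's Parsing.py, and the "grammar" is reduced to bare names and `:=`.

```python
# Toy illustration only: none of these names are taken from Cython's Parsing.py.

class Tokens:
    """Minimal token stream over a list of strings (hypothetical helper)."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok


# Before: one function whose accepted syntax depends on a flag.
def p_test_flagged(s, allow_assignment_expression=True):
    name = s.next()
    if s.peek() == ":=":
        if not allow_assignment_expression:
            raise SyntaxError("named expression not allowed here")
        s.next()
        return ("named_expr", name, p_test_flagged(s, False))
    return ("name", name)


# After: the function name alone says what may be parsed next.
def p_expression(s):
    """Parse a plain expression (in this toy grammar: a single name)."""
    return ("name", s.next())


def p_named_expression(s):
    """Parse NAME ':=' expression, or fall back to a plain expression."""
    start = s.pos
    name = s.next()
    if s.peek() == ":=":
        s.next()
        return ("named_expr", name, p_expression(s))
    s.pos = start  # not a named expression: rewind and parse normally
    return p_expression(s)
```

In the split version, a call site that must not accept a walrus simply calls `p_expression()`, which leaves a following `:=` unconsumed for the caller to reject, while a call site that allows it calls `p_named_expression()`. No call site needs to pass an option to say what it does not want.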
This can (and ideally should) be done in multiple iterations, both to keep the changes easy to review and to allow us to see where we are going along the way.
Known areas that require a cleanup are
- `p_test()` as entry point for expressions
- the integration of lambda expressions
- star expressions
- conditional expressions
- named expressions (walrus operator)
Along the way, the following missing syntax features can be added:
- parenthesised context managers (https://docs.python.org/3/whatsnew/3.10.html#new-features)
- general expressions as decorators (#4570, PEP 614)
The Python test suite has tests for them.
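For reference, the two syntax features look like this in plain Python. This is only a small sketch of the syntax as CPython accepts it (parenthesised context managers need Python 3.10 or later, PEP 614 decorators need 3.9 or later); the names `register`, `decorators` and `handler` are made up for the example and say nothing about how Cython would parse or compile it.

```python
from contextlib import nullcontext

# Parenthesised context managers: the items may span several lines
# and may end with a trailing comma.
with (
    nullcontext("a") as first,
    nullcontext("b") as second,
):
    pair = (first, second)

# PEP 614: any expression may be used as a decorator, e.g. a subscript.
registry = []


def register(func):
    registry.append(func.__name__)
    return func


decorators = [register]


@decorators[0]
def handler():
    return "handled"
```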
CPython’s old parser grammar: https://github.com/python/cpython/blob/3.9/Grammar/Grammar
CPython’s new PEG parser grammar: https://github.com/python/cpython/blob/main/Grammar/python.gram
Top GitHub Comments
The parser rarely changes compared to all the rest and is mostly tested through the bulk of feature file tests, which, luckily for us, test most of the features that programmers use in their code. But since we’re dealing with a programming language here, specifically one that borrows from three different languages, it’s difficult to even get close to testing all syntactic combinations that are relevant for the parser. It’s not just syntax constructs, there’s often also context involved.
That said, many of the compile tests target mostly the parser.
It isn’t hugely important. It’s just a bit of code that’s been working well without many changes for a long time.