Refactor parser structure to match CPython's grammar more closely

The parse function structure of the parser implementation in Cython/Compiler/Parsing.py has diverged from the old Grammar in CPython and certainly does not match the new PEG parser. Additionally, several flags were added over time that make it less clear what kind of expression is allowed and supposed to be parsed where. This ticket asks to

adapt the parse function split and their names to what CPython uses in its own parser, as closely as possible (but keeping the p_ prefix for readability)
remove flag options where possible and reasonable
split parse functions that currently take options into separate functions that parse different things, and use them in the appropriate places.

Basically, it should be clear from the name of a called parse function in which state the parser now changes and what it is allowed to see next. This state should not depend on additional options (“parse X next, unless I’m telling you not to do what I’m asking you to do”).
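The difference between the two styles can be sketched with a toy token stream. This is a minimal illustration under stated assumptions, not code from Cython/Compiler/Parsing.py: the Tokens class, the allow_assignment_expr flag and all function bodies are hypothetical. Only the names p_expression and p_named_expression are chosen to mirror the expression and named_expression rules in CPython's PEG grammar.

```python
# Toy illustration only: these helpers are hypothetical and are not
# taken from Cython/Compiler/Parsing.py.

class Tokens:
    """Minimal token stream over a pre-split list of strings."""

    def __init__(self, items):
        self.items = list(items)
        self.pos = 0

    def peek(self):
        return self.items[self.pos] if self.pos < len(self.items) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok


# Flag-based style: the caller has to know what the flag switches off.
def p_expr_with_flag(s, allow_assignment_expr=True):
    left = s.next()
    if s.peek() == ":=":
        if not allow_assignment_expr:
            raise SyntaxError("named expression not allowed here")
        s.next()
        return ("named", left, p_expr_with_flag(s, allow_assignment_expr=False))
    return ("name", left)


# Split style: the function name states what may be parsed next.
def p_expression(s):
    """Parse a plain expression; ':=' is never consumed here."""
    return ("name", s.next())


def p_named_expression(s):
    """Parse 'NAME := expression', or fall back to a plain expression."""
    left = p_expression(s)
    if s.peek() == ":=":
        s.next()
        return ("named", left[1], p_expression(s))
    return left
```

With the split version, a call site that must not accept a named expression simply calls p_expression(), and the restriction is visible in the name rather than hidden in an argument.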

This can (and ideally should) be done in multiple iterations, both to keep the changes easy to review and to let us see where we are heading along the way.

Known areas that require cleanup are:

p_test() as entry point for expressions
the integration of lambda expressions
star expressions
conditional expressions
named expressions (walrus operator)
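Named expressions are a good example of why the calling context matters. The sketch below uses only the standard library ast module to ask CPython's own parser (3.8 or later) which forms it accepts; it says nothing about what Cython's parser currently does. The helper name parses is made up for this example.

```python
import ast


def parses(source):
    """Return True if CPython's parser accepts the source text."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


samples = [
    "x := 1",            # bare named expression as a statement
    "(x := 1)",          # parenthesised
    "f(x := 1)",         # as a call argument
    "lambda: x := 1",    # unparenthesised lambda body
    "lambda: (x := 1)",  # parenthesised lambda body
]

for src in samples:
    print(f"{src!r:22} {'ok' if parses(src) else 'SyntaxError'}")
```

Following PEP 572, CPython rejects the unparenthesised form at the top level of an expression statement and in a lambda body, while the parenthesised and call-argument forms parse. That is exactly the kind of distinction that separate parse functions can encode by name.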

Along the way, the following missing syntax features can be added:

parenthesised context managers (https://docs.python.org/3/whatsnew/3.10.html#new-features)
general expressions as decorators (#4570, PEP 614)

The Python test suite has tests for them.
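For reference, the two syntax features look like the snippets below. This small probe, assuming a CPython interpreter is available, checks whether the running Python accepts them via ast.parse; it tests CPython only, not Cython. The FEATURES table and the accepted_by_cpython helper are names invented for this sketch, and the decorator example is the one used in PEP 614.

```python
import ast
import sys

# Source snippets for the two features named above.
FEATURES = {
    "parenthesised context managers":
        "with (open('a') as f, open('b') as g):\n    pass\n",
    "general expressions as decorators (PEP 614)":
        "@buttons[0].clicked.connect\ndef handler():\n    pass\n",
}


def accepted_by_cpython(source):
    """Return True if the running interpreter's parser accepts the source."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print("Python", sys.version.split()[0])
for name, source in FEATURES.items():
    print(f"{name}: {accepted_by_cpython(source)}")
```

Both snippets are expected to parse on Python 3.10 and later; older interpreters may reject one or both.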

CPython’s old parser grammar: https://github.com/python/cpython/blob/3.9/Grammar/Grammar
CPython’s new PEG parser grammar: https://github.com/python/cpython/blob/main/Grammar/python.gram

Issue Analytics

Created 2 years ago
Reactions: 1
Comments: 29 (18 by maintainers)

Top GitHub Comments

1 reaction

scoder commented, Jun 27, 2022

Are there no tests specifically for the parser/syntax?

The parser rarely changes compared to all the rest and is mostly tested through the bulk of feature file tests, which, luckily for us, test most of the features that programmers use in their code. But since we’re dealing with a programming language here, specifically one that borrows from three different languages, it’s difficult to even get close to testing all syntactic combinations that are relevant for the parser. It’s not just syntax constructs, there’s often also context involved.

That said, many of the compile tests target mostly the parser.

1 reaction

da-woods commented, Jun 26, 2022

Why is it important to link to the current scanner?

It isn’t hugely important. It’s just a bit of code that’s been working well without many changes for a long time.
