Refactor parser structure to match CPython's grammar more closely


The parse function structure of the parser implementation in Cython/Compiler/Parsing.py has diverged from the old Grammar in CPython and certainly does not match the new PEG parser. Additionally, several flags were added over time that make it less clear what kind of expression is allowed and supposed to be parsed where. This ticket asks to

  • adapt the parse function split and their names to what CPython uses in its own parser, as closely as possible (but keeping the p_ prefix for readability)
  • remove flag options where possible and reasonable
  • split parse functions that currently take options into separate functions that parse different things, and use them in the appropriate places.

Basically, it should be clear from the name of a called parse function in which state the parser now changes and what it is allowed to see next. This state should not depend on additional options (“parse X next, unless I’m telling you not to do what I’m asking you to do”).
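To make the difference concrete, here is a minimal sketch of the two styles on a toy token stream. It is not Cython's actual code: the `Tokens` class, the `allow_assignment` flag, and the function bodies are hypothetical, and the function names only loosely follow the `named_expression` and `expression` rules of CPython's PEG grammar.

```python
class Tokens:
    """Hypothetical token stream over whitespace-separated tokens."""

    def __init__(self, text):
        self.items = text.split()
        self.pos = 0

    def peek(self, offset=0):
        index = self.pos + offset
        return self.items[index] if index < len(self.items) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok


# Flag style: the caller must know which option to pass, and the
# function name does not say what may be parsed.
def p_expr_flagged(s, allow_assignment=True):
    name = s.next()
    if allow_assignment and s.peek() == ":=":
        s.next()  # consume ':='
        return ("named", name, p_expr_flagged(s, allow_assignment=False))
    return ("name", name)


# Split style: one function per grammar rule, no options.
def p_expression(s):
    # Plain expression; never consumes ':='.
    return ("name", s.next())


def p_named_expression(s):
    # NAME ':=' expression | expression
    if s.peek(1) == ":=":
        name = s.next()
        s.next()  # consume ':='
        return ("named", name, p_expression(s))
    return p_expression(s)


print(p_named_expression(Tokens("x := y")))
print(p_expression(Tokens("x := y")))
```

In the split version, a call site that must not accept a walrus simply calls `p_expression()`, so the permitted syntax is visible from the call itself.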

This can (and ideally should) be done in multiple iterations, both to keep the changes easy to review and to allow us to see where we are going along the way.

Known areas that require cleanup are

  • p_test() as entry point for expressions
  • the integration of lambda expressions
  • star expressions
  • conditional expressions
  • named expressions (walrus operator)
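As a point of comparison, CPython's grammar already decides from context where a named expression may appear. The probe below is a small sketch using the standard `ast` module on CPython 3.8 or later; it exercises CPython's parser, not Cython's, and the helper name `parses` is made up for this illustration.

```python
import ast


def parses(source):
    """Return True if CPython's parser accepts `source` as a module."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


samples = [
    "(x := 1)",         # parenthesized walrus as a statement
    "f(x := 1)",        # walrus as a call argument
    "x := 1",           # unparenthesized walrus at statement level
    "x = y := 1",       # unparenthesized walrus on the right of '='
    "lambda: (x := 1)", # parenthesized walrus in a lambda body
    "lambda: x := 1",   # unparenthesized walrus in a lambda body
]

for source in samples:
    print(f"{source!r:22} accepted: {parses(source)}")
```

The same token sequence is accepted or rejected depending on the enclosing rule, which is the kind of state the parse function names are meant to make explicit.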

Along the way, the following missing syntax features can be added:

The Python test suite has tests for them.

CPython’s old parser grammar: https://github.com/python/cpython/blob/3.9/Grammar/Grammar

CPython’s new PEG parser grammar: https://github.com/python/cpython/blob/main/Grammar/python.gram

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 29 (18 by maintainers)

Top GitHub Comments

1 reaction
scoder commented, Jun 27, 2022

Are there no tests specifically for the parser/syntax?

The parser rarely changes compared to all the rest and is mostly tested through the bulk of feature file tests, which, luckily for us, test most of the features that programmers use in their code. But since we’re dealing with a programming language here, specifically one that borrows from three different languages, it’s difficult to even get close to testing all syntactic combinations that are relevant for the parser. It’s not just syntax constructs, there’s often also context involved.

That said, many of the compile tests target mostly the parser.

1 reaction
da-woods commented, Jun 26, 2022

Why is it important to link to the current scanner?

It isn’t hugely important. It’s just a bit of code that’s been working well without many changes for a long time.
