
LALR parser issue with multiple rules that produce nothing


I believe the following example should parse:

from lark import Lark
p = Lark("""
    a: b c d
    b: "B"
    c: | "C"
    d: | "D"
""", start='a', parser='lalr')
print(p.parse('B'))

I would expect it to produce Tree(a, [Tree(b, [Token(B, 'B')]), Tree(c, []), Tree(d, [])]), but instead it fails with:

UnexpectedToken: Unexpected token Token(B, 'B') at line 1, column 1.
Expected: C, D
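The "Expected" list names only C and D. $END, the end of input, is missing from it. For comparison, here is a minimal sketch of how a correct table handles this grammar. It is a hand-built SLR(1) table for the same rules driving a plain shift-reduce loop. It is not the table Lark generates, and the state numbers are our own, so they do not match the "state 6" in the report. Tokens are (type, value) tuples and trees are (rule, children) tuples, standing in for Lark's Token and Tree. The names PRODUCTIONS, ACTION, GOTO, BUGGY_ACTION, tokenize and parse are illustrative only. In this table, state 3 (just after shifting B) reduces b on C, D and $END. The hypothetical BUGGY_ACTION table drops that $END entry and reproduces the reported "expected C, D" failure.

```python
# Hand-built SLR(1) table for the grammar in the issue (illustrative sketch;
# state numbers are our own and do not correspond to Lark's).
#   a: b c d      b: "B"      c: | "C"      d: | "D"

# production number -> (left-hand side, number of symbols on the right)
PRODUCTIONS = {
    1: ("a", 3),  # a -> b c d
    2: ("b", 1),  # b -> "B"
    3: ("c", 0),  # c -> (empty)
    4: ("c", 1),  # c -> "C"
    5: ("d", 0),  # d -> (empty)
    6: ("d", 1),  # d -> "D"
}

# state -> {lookahead terminal: action}
ACTION = {
    0: {"B": ("shift", 3)},
    1: {"$END": ("accept",)},
    2: {"C": ("shift", 4), "D": ("reduce", 3), "$END": ("reduce", 3)},
    3: {"C": ("reduce", 2), "D": ("reduce", 2), "$END": ("reduce", 2)},  # b -> B.
    4: {"D": ("reduce", 4), "$END": ("reduce", 4)},
    5: {"D": ("shift", 6), "$END": ("reduce", 5)},
    6: {"$END": ("reduce", 6)},
    7: {"$END": ("reduce", 1)},
}

# (state, nonterminal) -> next state after a reduction
GOTO = {(0, "a"): 1, (0, "b"): 2, (2, "c"): 5, (5, "d"): 7}

# Hypothetical: the same table with $END dropped from state 3,
# mimicking the reported symptom.
BUGGY_ACTION = {**ACTION, 3: {"C": ("reduce", 2), "D": ("reduce", 2)}}


def tokenize(text):
    """Each of the characters B, C, D is one token; append an end marker."""
    tokens = []
    for ch in text:
        if ch not in "BCD":
            raise ValueError(f"unexpected character {ch!r}")
        tokens.append((ch, ch))
    tokens.append(("$END", ""))
    return tokens


def parse(text, action_table=ACTION):
    tokens = tokenize(text)
    states, values, pos = [0], [], 0
    while True:
        kind = tokens[pos][0]
        action = action_table[states[-1]].get(kind)
        if action is None:
            expected = sorted(action_table[states[-1]])
            raise SyntaxError(
                f"unexpected {kind} in state {states[-1]}; expected one of {expected}"
            )
        if action[0] == "shift":
            states.append(action[1])
            values.append(tokens[pos])
            pos += 1
        elif action[0] == "reduce":
            lhs, n = PRODUCTIONS[action[1]]
            # Slice with len(...) - n so that empty rules (n == 0) pop nothing.
            children = values[len(values) - n:]
            del states[len(states) - n:]
            del values[len(values) - n:]
            values.append((lhs, children))
            states.append(GOTO[(states[-1], lhs)])
        else:  # accept
            return values[-1]


print(parse("B"))
```

Running `parse("B")` yields the tuple counterpart of the expected tree, `('a', [('b', [('B', 'B')]), ('c', []), ('d', [])])`. The empty c and d are reduced without consuming any input. That can only happen once state 3 has reduced b on the $END lookahead. `parse("B", BUGGY_ACTION)` instead raises a SyntaxError listing only `['C', 'D']`.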

I think the generated parse table is missing the reduction of b on lookahead $END in state 6: that state reduces b on lookahead C or D, but not on $END. Curiously, it works if I write a: b c? d? instead of using epsilon-producing rules for c and d. That is much less convenient for my use case, though. The transformer method a would then receive a varying set of optional arguments, and I would have to inspect their types to find out what they mean, instead of just having each one in its correspondingly named argument.
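The reasoning about the missing $END entry can be checked with the textbook nullable/FIRST/FOLLOW computation. The sketch below is plain Python, not Lark's internals. It encodes the grammar as a dict, with an empty list standing for an empty alternative, and iterates each set to a fixed point. The names GRAMMAR, nullable_and_first and follow_sets are illustrative only. Because c and d are both nullable, whatever follows a (here $END) can also come directly after b, so FOLLOW(b) = {C, D, $END}. FOLLOW sets are what an SLR(1) table uses for its reduce lookaheads. LALR(1) lookaheads can be smaller in general, but for the reduction b: "B" in this grammar they work out to the same three terminals, FIRST(c d $END).

```python
# Grammar from the issue: dict keys are nonterminals, anything else is a
# terminal, and an empty list is an empty (epsilon) alternative.
GRAMMAR = {
    "a": [["b", "c", "d"]],
    "b": [["B"]],
    "c": [[], ["C"]],
    "d": [[], ["D"]],
}
END = "$END"


def nullable_and_first(grammar):
    """Fixed-point computation of the nullable set and FIRST sets."""
    nullable = set()
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                for sym in alt:
                    # FIRST of a terminal is the terminal itself.
                    sym_first = first[sym] if sym in grammar else {sym}
                    if not sym_first <= first[nt]:
                        first[nt] |= sym_first
                        changed = True
                    if sym not in nullable:
                        break  # later symbols cannot start this alternative
                else:
                    # Every symbol (or none at all) can derive the empty string.
                    if nt not in nullable:
                        nullable.add(nt)
                        changed = True
    return nullable, first


def follow_sets(grammar, start, nullable, first):
    """Fixed-point computation of FOLLOW sets; start is followed by $END."""
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                # Walk right to left, tracking what can come after each symbol.
                trailer = set(follow[nt])
                for sym in reversed(alt):
                    if sym in grammar:
                        if not trailer <= follow[sym]:
                            follow[sym] |= trailer
                            changed = True
                        if sym in nullable:
                            trailer = trailer | first[sym]
                        else:
                            trailer = set(first[sym])
                    else:
                        trailer = {sym}
    return follow


nullable, first = nullable_and_first(GRAMMAR)
follow = follow_sets(GRAMMAR, "a", nullable, first)
print(sorted(nullable))     # ['c', 'd']
print(sorted(follow["b"]))  # ['$END', 'C', 'D']
```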

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

1 reaction
erezsh commented, Oct 15, 2018

lol, that’s probably wise. If you feel like taking a crack at some of the open issues, that would be great, although admittedly most of them are boring usability issues rather than cool algorithmic bugs. (Except for the ones about Earley, but those are already on their way to being fixed.)

1 reaction
petee-d commented, Oct 15, 2018

Wow, @erezsh, such a fast response is simply amazing! Getting a fix for such a deeply hidden bug from issue to master within an hour is quite something. 😃 I’m very happy to say that my real grammar, where I discovered this (100+ lines), now works perfectly. I really love the project: it’s the best parsing library I have seen so far, and I had previously been switching between them like socks in my project.

I’ve heard there are exactly 2 hard problems in computer science: cache invalidation, naming things, and off-by-one errors.

BTW, would you like me to create another issue for the error-reporting problem I mentioned in my last comment? It might take some creativity to fix it in a backwards-compatible way, so it’s probably not something to bundle with another fix.


Top Results From Across the Web

LALR Parser (with Examples) - GeeksforGeeks
LALR Parser : LALR Parser is lookahead LR parser. It is the most powerful parser which can handle large classes of grammar.

14: LALR Parsing - CS106X Handout #01
Because a canonical LR(1) parser splits states based on differing lookahead sets, it can have many more states than the corresponding SLR(1) or...

How does the yacc/bison LALR(1) algorithm treat "empty" rules?
My parser can't handle them, since in generating the table it looks at each symbol in each rule, recursively, and "empty" is just...

How to Implement an LR(1) Parser - Serokell
If two of your conflicting rules are able to parse the same input, ... The problem with LR is that it appears complex...

LR parser - Wikipedia
LR parsers are deterministic; they produce a single correct parse without guesswork or backtracking, in linear time. This is ideal for computer languages,...
