Lexer callbacks not called under Earley parser

The following code:

#! /usr/bin/env python3

from lark import Lark


GRAMMAR = """
    start:
    EKS: /x/
    %ignore EKS
"""

if __name__ == '__main__':
    Lark(GRAMMAR, parser='earley', lexer_callbacks={'EKS': print}).parse('x')

prints nothing, unlike its LALR equivalent:

#! /usr/bin/env python3

from lark import Lark


GRAMMAR = """
    start:
    EKS: /x/
    %ignore EKS
"""

if __name__ == '__main__':
    Lark(GRAMMAR, parser='lalr', lexer_callbacks={'EKS': print}).parse('x')

which prints the expected output, x.

It appears that under the Earley parser the lexer callbacks are not invoked, which in my particular case means that I lose the ability to collect comments when I switch to Earley.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
erezsh commented, Jul 19, 2021

To the best of my knowledge, the Earley lexer currently doesn’t allow access to ignored tokens. The difficulty is that because they are not part of the tree, and there might be many possible trees (due to ambiguity), there is no way to know, at the time of lexing, which tokens are correct and which aren’t.

However, it might be possible to allow the comments to be saved on the tree itself, so they can be collected when the parse is over.

If I had to guess, I would say this would be the key point to patch: https://github.com/lark-parser/lark/blob/master/lark/parsers/xearley.py#L75

The main challenge would be to decide which tree node to attach the comments to, and to ensure that they are attached correctly.

0 reactions
pileon commented, Jul 19, 2021

@erezsh I have a similar problem to the OP, in that I want to be able to catch comments.

The problem is that my grammar doesn’t work with the LALR parser or the standard lexer.

Are there other ways to catch (and ignore in the parser!) terminals using the earley parser and its default lexer?
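One possible workaround, offered as a sketch and not as a Lark feature, is to collect comments in a separate pass over the raw text before handing it to the Earley parser. This assumes the comment syntax can be recognized by a standalone regular expression and that the comment marker never appears inside string literals or other tokens; the `#` comment pattern and the `collect_comments` helper below are hypothetical.

```python
import re

# Assumed comment syntax: '#' up to the end of the line.
COMMENT_RE = re.compile(r"#[^\n]*")


def collect_comments(text):
    """Return a list of (offset, comment_text) pairs found in `text`."""
    return [(m.start(), m.group()) for m in COMMENT_RE.finditer(text)]


source = "a = 1  # first\nb = 2\n# second\n"
comments = collect_comments(source)
for offset, comment in comments:
    print(offset, comment)
```

The grammar can keep its `%ignore` rule for comments, so the parse itself is unchanged; the recorded offsets allow each comment to be matched to nearby tree nodes afterwards if position information is kept on the tree.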
