Lexer callbacks not called under Earley parser
The following code:
#! /usr/bin/env python3
from lark import Lark
GRAMMAR = """
start:
EKS: /x/
%ignore EKS
"""
if __name__ == '__main__':
    Lark(GRAMMAR, parser='earley', lexer_callbacks={'EKS': print}).parse('x')
prints nothing, unlike its LALR equivalent:
#! /usr/bin/env python3
from lark import Lark
GRAMMAR = """
start:
EKS: /x/
%ignore EKS
"""
if __name__ == '__main__':
    Lark(GRAMMAR, parser='lalr', lexer_callbacks={'EKS': print}).parse('x')
which prints the expected output, x.
It appears that under the Earley parser the lexer callbacks are not invoked, which in my particular case means that I lose the ability to collect comments when I switch to Earley.
Issue Analytics
- State:
- Created 4 years ago
- Comments:6 (4 by maintainers)
Top GitHub Comments
To the best of my knowledge, the Earley lexer currently doesn’t allow access to ignored tokens. The difficulty is that because they are not part of the tree, and there might be many possible trees (due to ambiguity), there is no way to know, at the time of lexing, which tokens are correct and which aren’t.
However, it might be possible to allow the comments to be saved on the tree itself, so they can be collected when the parse is over.
If I had to guess, I would say this would be the key point to patch: https://github.com/lark-parser/lark/blob/master/lark/parsers/xearley.py#L75
The main challenge would be to decide which tree node to attach the comments to, and to ensure that they are attached correctly.
@erezsh I have a similar problem to the OP, in that I want to be able to catch comments.
The problem is that the grammar doesn't work with the lalr parser or the standard lexer. Are there other ways to catch (and ignore in the parser!) terminals using the earley parser and its default lexer?
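One possible workaround, sketched here as an assumption rather than anything Lark provides, is to collect the comments in a separate pass over the raw text before handing the same text to the Earley parser, which keeps ignoring them as usual. The names COMMENT_RE and collect_comments and the '#' comment syntax are hypothetical; the pattern should mirror whatever terminal the grammar ignores.

```python
import re

# Hypothetical comment pattern: adjust it to match the ignored terminal
# in your own grammar (here, '#' up to the end of the line).
COMMENT_RE = re.compile(r"#[^\n]*")


def collect_comments(text, pattern=COMMENT_RE):
    """Return a list of (line, column, comment_text), all 1-based."""
    found = []
    for match in pattern.finditer(text):
        start = match.start()
        # Line number: newlines before the match, plus one.
        line = text.count("\n", 0, start) + 1
        # Column: offset from the previous newline (rfind gives -1 if none).
        column = start - text.rfind("\n", 0, start)
        found.append((line, column, match.group()))
    return found


source = "a = 1  # first\n# second\nb = 2\n"
comments = collect_comments(source)
print(comments)
```

The limitation is that a plain regex scan has no grammar context: it cannot tell a real comment from comment-like text inside, say, a string literal, so it only fits grammars where the comment pattern is unambiguous on its own.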