Lexer callbacks not called under Earley parser

The following code:

#! /usr/bin/env python3

from lark import Lark


GRAMMAR = """
    start:
    EKS: /x/
    %ignore EKS
"""

if __name__ == '__main__':
    Lark(GRAMMAR, parser='earley', lexer_callbacks={'EKS': print}).parse('x')

prints nothing, unlike its LALR equivalent:

#! /usr/bin/env python3

from lark import Lark


GRAMMAR = """
    start:
    EKS: /x/
    %ignore EKS
"""

if __name__ == '__main__':
    Lark(GRAMMAR, parser='lalr', lexer_callbacks={'EKS': print}).parse('x')

which prints the expected output, x.

It appears that the lexer callbacks are not invoked under the Earley parser. In my case, this means I lose the ability to collect comments when I switch to Earley.
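A simplified model (not Lark's source) of why the LALR version prints x: a standalone lexer tokenizes the input in its own pass, so it can call a callback for an ignored terminal before discarding the token. With its default lexer='dynamic', Lark's Earley parser matches terminals as part of parsing rather than in a separate pass, which fits the behaviour reported here. The names `lex` and `Tok` are made up for this sketch, and unlike Lark it ignores the callback's return value.

```python
import re
from typing import Callable, Dict, List, NamedTuple, Set


class Tok(NamedTuple):
    type: str
    value: str
    pos: int  # character offset of the match in the input


def lex(text: str,
        terminals: Dict[str, str],
        ignore: Set[str],
        callbacks: Dict[str, Callable[[Tok], object]]) -> List[Tok]:
    """Toy eager lexer: tokenizes the whole input before any parsing happens.

    Terminals are tried in dict order; patterns must not contain capturing groups.
    """
    master = re.compile("|".join(f"(?P<{name}>{pattern})"
                                 for name, pattern in terminals.items()))
    tokens: List[Tok] = []
    pos = 0
    while pos < len(text):
        m = master.match(text, pos)
        if m is None or m.end() == pos:
            raise SyntaxError(f"no terminal matches at offset {pos}")
        tok = Tok(m.lastgroup, m.group(), pos)
        # The callback runs for every match, ignored or not: the lexer has
        # already committed to this token, so there is nothing left to decide.
        if tok.type in callbacks:
            callbacks[tok.type](tok)
        # Only non-ignored tokens are handed on to the (not shown) parser.
        if tok.type not in ignore:
            tokens.append(tok)
        pos = m.end()
    return tokens


seen: List[Tok] = []
print(lex("x", {"EKS": "x"}, {"EKS"}, {"EKS": seen.append}))  # []
print(seen)  # [Tok(type='EKS', value='x', pos=0)]
```

The ignored EKS token never reaches the parser, yet the callback still sees it, which is the behaviour the LALR example relies on for collecting comments.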

Issue Analytics

Created 4 years ago
Comments: 6 (4 by maintainers)

Top GitHub Comments

erezsh commented, Jul 19, 2021

To the best of my knowledge, the Earley lexer currently doesn't allow access to ignored tokens. The difficulty is that ignored tokens are not part of the tree, and because there might be many possible trees (due to ambiguity), there is no way to know at lexing time which tokens are correct and which aren't.

However, it might be possible to allow the comments to be saved on the tree itself, so they can be collected when the parse is over.

If I had to guess, I would say this would be the key point to patch: https://github.com/lark-parser/lark/blob/master/lark/parsers/xearley.py#L75

The main challenge would be to decide which tree node to attach the comments to, and to ensure that they are attached correctly.
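The open question above is which node a comment should belong to. A minimal sketch of one possible policy, assuming the comments have already been gathered as (offset, text) pairs and that every node knows its character span: attach each comment to the innermost node enclosing it. `Node` and `attach_comments` are hypothetical, not Lark classes, and the sketch works on a single finished tree, so it does not address the ambiguity problem described above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    # Hypothetical parse-tree node; start/end are character offsets (end exclusive).
    name: str
    start: int
    end: int
    children: List["Node"] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)


def attach_comments(root: Node, comments: List[Tuple[int, str]]) -> None:
    """Attach each (offset, text) comment to the innermost node whose span
    contains the offset; comments outside every child stay on the parent."""
    for offset, text in comments:
        node = root
        while True:
            inner = next((c for c in node.children
                          if c.start <= offset < c.end), None)
            if inner is None:
                break
            node = inner
        node.comments.append(text)


# Two statements at [0, 5) and [12, 17) under one root.
first = Node("stmt", 0, 5)
second = Node("stmt", 12, 17)
root = Node("start", 0, 17, [first, second])
attach_comments(root, [(2, "# inside first"), (6, "# between")])
print(first.comments)   # ['# inside first']
print(root.comments)    # ['# between']
```

In Lark, node spans would presumably come from `tree.meta.start_pos` / `tree.meta.end_pos` with `propagate_positions=True`. Other policies, such as attaching a comment to the following sibling as a leading comment, are equally plausible; the right choice depends on how the comments are consumed afterwards.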

pileon commented, Jul 19, 2021

@erezsh I have a similar problem to the OP, in that I want to be able to catch comments.

The problem is that my grammar doesn't work with the LALR parser or with the standard lexer.

Are there other ways to catch (and ignore in the parser!) terminals using the earley parser and its default lexer?
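One workaround that sidesteps the lexer entirely, sketched here as an assumption rather than anything Lark provides: scan the source for comments with a separate regular expression and let the Earley parser keep ignoring them. `collect_comments` and `COMMENT_RE` are hypothetical names, and the pattern assumes `#`-to-end-of-line comments.

```python
import re
from typing import List, Tuple

# Assumption: comments run from '#' to end of line, and '#' never appears
# inside string literals or other tokens. Adjust the pattern for your grammar.
COMMENT_RE = re.compile(r"#[^\n]*")


def collect_comments(source: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, comment text) for every comment in source."""
    found = []
    for m in COMMENT_RE.finditer(source):
        # Count newlines before the match to recover its line number.
        line = source.count("\n", 0, m.start()) + 1
        found.append((line, m.group()))
    return found


src = "x = 1  # one\n# two\ny = 2\n"
print(collect_comments(src))  # [(1, '# one'), (2, '# two')]
```

The Earley parse itself is unchanged: the grammar still `%ignore`s the comment terminal, and the scan runs over the same text before or after calling `parse`. The trade-off is that the scan knows nothing about the grammar, so a `#` inside a string literal would be misreported as a comment.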