How to use all special characters and store them as is in the grammar rules of the parser?

Describe the bug

Not 100% sure if it's a bug, but I was trying to match a rule against the parser's rules and noticed that one of the rules in the parser has a strange anon symbol instead of my symbol:

rule
Out[9]: Rule(NonTerminal('type'), [NonTerminal('type'), Terminal('→'), NonTerminal('type')], None, RuleOptions(False, False, None, None))
rules
Out[10]: 
[Rule(NonTerminal('type'), [NonTerminal('type'), Terminal('__ANON_0'), NonTerminal('type')], None, RuleOptions(True, False, None, None)),
 Rule(NonTerminal('type'), [Terminal('T')], None, RuleOptions(True, False, None, None))]

which makes me unable to find the rule. Is this expected? How can I make the parser detect that kind of character and store it as is?

To Reproduce

Use this grammar to reproduce:

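from lark import Lark
from lark.grammar import NonTerminal, Rule, RuleOptions, Terminal

grammar = r"""
type : type "→" type  // "→"
    | /[A-Za-z0-9_]+/

// ignore tokens
IGNORE_TOKENS : "(" | ")"
%import common.WS

%ignore WS
%ignore IGNORE_TOKENS
"""
# The start rule has to be passed by keyword.
parser = Lark(grammar, start='type', keep_all_tokens=True)
ast = parser.parse('T → T')
# Build the rule I expect to find, using the literal "→" as the terminal name.
rule = Rule(NonTerminal('type'), [NonTerminal('type'), Terminal('→'), NonTerminal('type')], alias=None, options=RuleOptions(False, False, None, None))
# Raises ValueError: the parser stored the terminal as '__ANON_0', not '→'.
parser.rules.index(rule)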

Top GitHub Comments

erezsh commented, Jul 22, 2021

I looked into it. We depend on terminal names for use as regex groups. So anything that isn’t a valid ID would fail. Of course, we can translate between the “safe name” and the “value name” before creating the token, but my feeling is that it’s not worth the trouble.
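To illustrate the constraint described above, here is a minimal sketch using only Python's standard `re` module. It is not Lark's lexer code; the helper `can_be_group_name` is a hypothetical name introduced for this example. It shows that a named regex group accepts an identifier such as `__ANON_0` but rejects the literal `→`.

```python
import re


def can_be_group_name(name: str) -> bool:
    """Return True if `name` is accepted as a named regex group."""
    try:
        # Python only allows valid identifiers inside (?P<...>).
        re.compile("(?P<%s>x)" % name)
        return True
    except re.error:
        return False


print(can_be_group_name("__ANON_0"))  # a generated "safe name"
print(can_be_group_name("→"))         # the literal symbol from the grammar
```

This is why a terminal defined by a symbol that is not a valid identifier needs some generated name before it can take part in a combined pattern with named groups.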

Btw, the relevant information is available in Lark.terminals, so you can do the translation yourself if you want, before you begin working with the tree.

MegaIng commented, Jul 10, 2021

Why would it matter on 2.7?

It introduces Unicode to places in the codebase that didn’t have it until now, which might cause problems. But since we are dropping 2.7 support right now anyway, we don’t have to care.