How to use all special characters and store them as is in the grammar rules of the parser?
Describe the bug
I'm not 100% sure this is a bug, but while trying to match a rule against the parser's rules I noticed that one of the rules in the parser has a strange anonymous symbol name instead of my symbol:
rule
Out[9]: Rule(NonTerminal('type'), [NonTerminal('type'), Terminal('→'), NonTerminal('type')], None, RuleOptions(False, False, None, None))
rules
Out[10]:
[Rule(NonTerminal('type'), [NonTerminal('type'), Terminal('__ANON_0'), NonTerminal('type')], None, RuleOptions(True, False, None, None)),
Rule(NonTerminal('type'), [Terminal('T')], None, RuleOptions(True, False, None, None))]
As a result, I can't find my rule in the parser's rule list. Is this expected? How can I get the parser to recognize that kind of character and store it as is?
To Reproduce
Use this grammar to reproduce:
from lark import Lark
from lark.grammar import Rule, NonTerminal, Terminal, RuleOptions
grammar = r"""
type : type "→" type // "→"
| /[A-Za-z0-9_]+/
// ignore tokens
IGNORE_TOKENS : "(" | ")"
%import common.WS
%ignore WS
%ignore IGNORE_TOKENS
"""
parser = Lark(grammar, start='type', keep_all_tokens=True)
ast = parser.parse('T → T')
rule = Rule(NonTerminal('type'), [NonTerminal('type'), Terminal('→'), NonTerminal('type')], None, RuleOptions(False, False, None, None))
# Fails to find the rule: the parser stores the terminal as '__ANON_0', not '→'
parser.rules.index(rule)
Issue Analytics
- State:
- Created 2 years ago
- Comments:11 (7 by maintainers)
Top GitHub Comments
I looked into it. We depend on terminal names for use as regex groups. So anything that isn’t a valid ID would fail. Of course, we can translate between the “safe name” and the “value name” before creating the token, but my feeling is that it’s not worth the trouble.
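To illustrate the constraint described above, here is a minimal sketch using only Python's standard `re` module. It assumes, as the comment says, that terminal names end up as named regex groups; the helper `is_valid_group_name` and the combined pattern are made up for illustration and are not Lark's actual lexer code.

```python
import re


def is_valid_group_name(name):
    """Return True if `name` can be used as a named group in a Python regex."""
    try:
        re.compile("(?P<%s>x)" % name)
        return True
    except re.error:
        return False


# A generated "safe name" is a valid identifier, so it works as a group name.
safe_ok = is_valid_group_name("__ANON_0")

# The arrow itself is not a valid identifier, so it is rejected as a group name.
arrow_ok = is_valid_group_name("→")

# With safe names, the matched group's name tells us which terminal matched.
# (Hypothetical combined pattern, for illustration only.)
combined = re.compile("(?P<__ANON_0>→)|(?P<NAME>[A-Za-z0-9_]+)")
match = combined.match("→")
matched_terminal = match.lastgroup
```

Here `matched_terminal` is the safe name rather than the literal character, which is why the rule stored in the parser refers to `__ANON_0`.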
Btw, the relevant information is available in Lark.terminals, so you can do the translation yourself if you want, before you begin working with the tree.

It introduces unicode to places in the codebase that didn’t have it until now, which might cause problems. But since we are dropping 2.7 support right now anyway, we don’t have to care.