How to use all special characters and store them as is in the grammar rules of the parser?

See original GitHub issue

Describe the bug

I'm not 100% sure this is a bug, but while trying to match a rule against the parser's rules, I noticed that one of the rules in the parser contains a strange anonymous symbol instead of my symbol:

rule
Out[9]: Rule(NonTerminal('type'), [NonTerminal('type'), Terminal('→'), NonTerminal('type')], None, RuleOptions(False, False, None, None))
rules
Out[10]: 
[Rule(NonTerminal('type'), [NonTerminal('type'), Terminal('__ANON_0'), NonTerminal('type')], None, RuleOptions(True, False, None, None)),
 Rule(NonTerminal('type'), [Terminal('T')], None, RuleOptions(True, False, None, None))]

which makes me unable to find the rule. Is this expected? How can I make the parser recognize this kind of character and store it as is?

To Reproduce

Use this grammar to reproduce:

from lark import Lark
from lark.grammar import Rule, NonTerminal, Terminal, RuleOptions

grammar = r"""
type : type "→" type  // "→"
    | /[A-Za-z0-9_]+/

// ignore tokens
IGNORE_TOKENS : "(" | ")"
%import common.WS

%ignore WS
%ignore IGNORE_TOKENS
"""
parser = Lark(grammar, start='type', keep_all_tokens=True)
ast = parser.parse('T → T')
rule = Rule(NonTerminal('type'), [NonTerminal('type'), Terminal('→'), NonTerminal('type')], None, RuleOptions(False, False, None, None))
parser.rules.index(rule)  # lookup fails: the stored rule uses Terminal('__ANON_0'), not Terminal('→')

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 11 (7 by maintainers)

Top GitHub Comments

1 reaction
erezsh commented, Jul 22, 2021

I looked into it. We depend on terminal names for use as regex groups. So anything that isn’t a valid ID would fail. Of course, we can translate between the “safe name” and the “value name” before creating the token, but my feeling is that it’s not worth the trouble.
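
To illustrate the constraint described here, the following is a minimal sketch using only Python's standard `re` module. It assumes the lexer joins all terminals into one alternation of named groups, as the comment suggests; Lark's actual internals may differ. The helper names `can_be_group_name` and `token_type`, and the terminal name `NAME`, are hypothetical and chosen only for this example.

```python
import re


def can_be_group_name(name):
    """Return True if `re` accepts `name` as the name of a named group."""
    try:
        re.compile("(?P<%s>x)" % name)
    except re.error:
        return False
    return True


# One combined pattern: each terminal becomes a named group, and the group
# name doubles as the token type. The arrow is matched by a group with a
# "safe" generated name, because "→" itself is not a valid group name.
COMBINED = re.compile(r"(?P<__ANON_0>→)|(?P<NAME>[A-Za-z0-9_]+)")


def token_type(text):
    """Return the name of the group that matched at the start of `text`."""
    match = COMBINED.match(text)
    return match.lastgroup if match else None
```

Here `can_be_group_name("→")` is false while `can_be_group_name("__ANON_0")` is true, which is why a generated identifier stands in for the literal.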

Btw, the relevant information is available in Lark.terminals, so you can do the translation yourself if you want, before you begin working with the tree.
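
A rough sketch of the translation suggested above might look like the following. It assumes each entry in `Lark.terminals` has a `.name` and a `.pattern`, and that string patterns report `.type == "str"` with the literal text in `.value`. The helper names `display_names` and `demo` are hypothetical, and the grammar in `demo` is a trimmed version of the one from the issue.

```python
def display_names(terminals):
    """Map each terminal's internal name to its literal text, where it has one.

    `terminals` is expected to look like `Lark.terminals`: objects with a
    `.name` and a `.pattern`. Regex-based terminals are skipped because they
    have no single literal to show.
    """
    return {
        t.name: t.pattern.value
        for t in terminals
        if t.pattern.type == "str"
    }


def demo():
    """Build a parser and return its rule expansions with readable names."""
    # Imported lazily so the helper above can be reused without Lark installed.
    from lark import Lark

    grammar = r"""
    type : type "→" type
         | /[A-Za-z0-9_]+/
    %import common.WS
    %ignore WS
    """
    parser = Lark(grammar, start="type", keep_all_tokens=True)
    names = display_names(parser.terminals)
    # Replace generated names such as __ANON_0 with the literal they stand for;
    # symbols without a literal (nonterminals, regex terminals) keep their name.
    return [
        [names.get(sym.name, sym.name) for sym in rule.expansion]
        for rule in parser.rules
    ]
```

With Lark installed, calling `demo()` should list each rule's expansion with the arrow shown in place of its generated name. The same mapping could be inverted to build a `Terminal` with the generated name before searching `parser.rules`.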

0 reactions
MegaIng commented, Jul 10, 2021

Why would it matter on 2.7?

It introduces Unicode to places in the codebase that didn’t have it until now, which might cause problems. But since we are dropping 2.7 support right now anyway, we don’t have to care.
