Parse-time specification of start non-terminal


I’m working on a grammar for a query language that has several entities that can recursively use each other. As an example, imagine SQL and entities like an entire subquery, a projection item, a constraint in the WHERE clause - any of those entities can contain the other entities inside. The parser needs to be able to parse a string given what kind of entity it is, i.e. I ultimately need a method that parses any valid select query, another method that parses any valid projection item, another one that parses a constraint and so on - I think I’ll need at least 7 such entities.

I currently see two options on how to do that.

  • Create multiple parsers, each time specifying the same grammar but a different start non-terminal. This has the disadvantage of compiling the grammar and creating the parser each time, which is a problem because a single parser takes a quarter of a second to build, and with 7 of them the startup time increases too much.
  • Create a grammar with a special starting non-terminal that determines the type of entity that the parsed string should represent based on some short prefix, like here (just an illustration, I didn’t even check this is a valid grammar):
start: "!select" select
     | "!projection" projection
     | "!where" where
// below is what the actual grammar might look like
select: "select" projection+ "where" where
projection: "column" | select
where: "constraint" | "exists(" select ")" 

The above is most likely what I’ll end up doing, but it’s ugly and I will have to account for this artificially added prefix in determining the position of parse errors in user supplied inputs.

A better solution I could imagine is the ability to supply the start non-terminal when calling the parse method. I don’t know much of the theory behind the other parser frontends, but for LALR this should be possible just by initializing the parser state stack with a state determined by mapping the name of the start non-terminal supplied by the caller.

This is probably too big of a change for me to attempt to do myself any time soon. Is this something that you would like lark to have one day? Do you see any better workaround, or is the input prefix hack the best one can do?

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

2 reactions
erezsh commented, Jul 2, 2019

Thanks, @petee-d, I’m happy you had such a good experience with Lark! And I’m flattered to receive such compliments. This feature request just had the right balance of being small and simple enough, but also a little tricky, which is my favorite combination. Also, I think it’s an improvement on Lark’s API.

If you do end up talking about Lark, let me know if you have any questions for me. And if possible, please send me the link afterwards, so I can brag about it 😃

2 reactions
bilderbuchi commented, Jul 1, 2019

This feature also sounds useful for, e.g., testing that a transformer works correctly on certain sub-parts of the grammar. Currently, I create a separate Lark instance for each test, which, although relatively easy to implement with pytest fixtures, is quite wasteful and slow.

Maybe a better API would be

parser = Lark(..., start='rule2', start_choices=['rule1', 'rule2', 'rule3'])
parser.parse('text')  # will use rule2
parser.parse('text', start='rule1')  # will dynamically switch to start with rule1

to ensure this feature remains optional and to not complicate parse calls for the “default” usage?
