Parse-time specification of start non-terminal

I’m working on a grammar for a query language with several entities that can recursively contain each other. As an example, imagine SQL with entities such as an entire subquery, a projection item, or a constraint in the WHERE clause: any of these can contain the others. The parser needs to parse a string as a specific kind of entity, i.e. I ultimately need one method that parses any valid select query, another that parses any valid projection item, another that parses a constraint, and so on. I think I’ll need at least 7 such entities.

I currently see two ways to do that:

1. Create multiple parsers, each with the same grammar but a different start non-terminal. The disadvantage is that the grammar is compiled and a parser constructed every time, which is a problem: a single parser takes a quarter of a second to build, so with 7 of them the startup time grows too much.
2. Create a grammar with a special start non-terminal that determines, from a short prefix, which type of entity the parsed string should represent, as in the following sketch (just an illustration; I didn’t check that this is a valid grammar):

start: "!select" select
     | "!projection" projection
     | "!where" where
// below is what the actual grammar might look like
select: "select" projection+ "where" where
projection: "column" | select
where: "constraint" | "exists(" select ")"
// skip whitespace between tokens
%import common.WS
%ignore WS

The above is most likely what I’ll end up doing, but it’s ugly, and I will have to account for this artificially added prefix when reporting the position of parse errors in user-supplied inputs.
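For the prefix workaround, the error-position bookkeeping can at least be confined to one wrapper. A minimal sketch, assuming the underlying parser raises an error carrying a 1-based line and column; `ParseError`, `PREFIXES`, `parse_as` and the stand-in `toy_parse` are hypothetical names, not part of Lark, and a real parser's exception would need its own attribute mapping.

```python
class ParseError(Exception):
    """Hypothetical error type standing in for whatever the real parser raises.
    It carries a 1-based line and column."""

    def __init__(self, message, line, column):
        super().__init__(message)
        self.line = line
        self.column = column


# Hypothetical markers matching the "!select" / "!projection" / "!where"
# alternatives of the illustrative start rule, each followed by one space.
PREFIXES = {"select": "!select ", "projection": "!projection ", "where": "!where "}


def parse_as(kind, text, parse):
    """Parse `text` as entity `kind` by prepending its marker, then shift any
    reported error position back into the caller's original text."""
    prefix = PREFIXES[kind]
    try:
        return parse(prefix + text)
    except ParseError as err:
        # The prefix contains no newline, so only errors on line 1 move;
        # max(1, ...) guards against an error reported inside the prefix.
        column = max(1, err.column - len(prefix)) if err.line == 1 else err.column
        raise ParseError(str(err), err.line, column) from err


def toy_parse(source):
    """Stand-in parser: rejects the first '?' it finds, reporting line/column."""
    for lineno, line in enumerate(source.split("\n"), start=1):
        col = line.find("?")
        if col >= 0:
            raise ParseError("unexpected '?'", lineno, col + 1)
    return source
```

With this wrapper, `parse_as("where", "constraint ?", toy_parse)` reports the `?` at column 12 of the user's text rather than column 19 of the prefixed string.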

A better solution I can imagine is the ability to supply the start non-terminal when calling the parse method. I don’t know much about the theory behind the other parser frontends, but for LALR this should be possible simply by initializing the parser state stack with the state that corresponds to the start non-terminal named by the caller.
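For this to work, the tables have to be built up front for every allowed start symbol, e.g. with one augmented rule per start (something like `$root_select: select $END`), so that each start symbol has its own entry state in a single shared automaton. The stdlib-only sketch below illustrates that idea under stated assumptions; it is not Lark's implementation. It builds the LR(0) item sets (the states LALR(1) tables are derived from) for the toy grammar above, with `projection+` rewritten as a left-recursive helper rule `projs` and tokens written in uppercase. The `$root_<start>` naming is purely illustrative, and lookahead computation and the shift/reduce driver are omitted.

```python
from collections import deque

# Toy grammar from the issue, without the "!prefix" start rule.
# Terminals are uppercase; "projection+" becomes the helper rule "projs".
GRAMMAR = [
    ("select", ("SELECT", "projs", "WHERE", "where")),
    ("projs", ("projection",)),
    ("projs", ("projs", "projection")),
    ("projection", ("COLUMN",)),
    ("projection", ("select",)),
    ("where", ("CONSTRAINT",)),
    ("where", ("EXISTS", "select", "RPAR")),
]


def build_lr0(grammar, starts):
    """Build one LR(0) automaton with an entry state per allowed start symbol."""
    rules = list(grammar)
    root_rule = {}
    for s in starts:
        # One augmented rule per start symbol: $root_s -> s $END
        root_rule[s] = len(rules)
        rules.append(("$root_" + s, (s, "$END")))
    nonterminals = {lhs for lhs, _ in rules}

    def closure(items):
        """Items are (rule index, dot position); add predicted rules."""
        result = set(items)
        todo = list(items)
        while todo:
            r, dot = todo.pop()
            rhs = rules[r][1]
            if dot < len(rhs) and rhs[dot] in nonterminals:
                for i, (lhs, _) in enumerate(rules):
                    if lhs == rhs[dot] and (i, 0) not in result:
                        result.add((i, 0))
                        todo.append((i, 0))
        return frozenset(result)

    states, index, gotos = [], {}, {}
    queue = deque()

    def add_state(items):
        """Return the id of an item set, registering it if it is new."""
        if items not in index:
            index[items] = len(states)
            states.append(items)
            queue.append(items)
        return index[items]

    # Each start symbol gets its own initial state in the same automaton.
    entry = {s: add_state(closure({(root_rule[s], 0)})) for s in starts}
    while queue:
        items = queue.popleft()
        src = index[items]
        symbols = {rules[r][1][d] for r, d in items if d < len(rules[r][1])}
        for sym in symbols:
            kernel = {(r, d + 1) for r, d in items
                      if d < len(rules[r][1]) and rules[r][1][d] == sym}
            gotos[(src, sym)] = add_state(closure(kernel))
    return states, gotos, entry


states, gotos, entry = build_lr0(GRAMMAR, ["select", "projection", "where"])
```

Parsing "as a projection" would then begin with `[entry["projection"]]` on the state stack instead of the single usual initial state. States reachable from more than one entry, such as the one reached after shifting `SELECT`, are shared, so adding start symbols typically costs fewer states than building a separate automaton for each.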

This is probably too big a change for me to attempt myself any time soon. Is this something you would like Lark to have one day? Do you see a better workaround, or is the input-prefix hack the best one can do?

Issue Analytics

Created 4 years ago
Comments: 10 (5 by maintainers)

Top GitHub Comments

erezsh commented, Jul 2, 2019 (2 reactions)

Thanks, @petee-d, I’m happy you had such a good experience with Lark! And I’m flattered to receive such compliments. This feature request just had the right balance of being small and simple enough, but also a little tricky, which is my favorite combination. Also, I think it’s an improvement on Lark’s API.

If you do end up talking about Lark, let me know if you have any questions for me. And if possible, please send me the link afterwards, so I can brag about it 😃

bilderbuchi commented, Jul 1, 2019 (2 reactions)

This feature sounds useful also for, e.g., testing that a transformer works correctly on certain sub-parts of the grammar. Currently, I create a separate Lark instance for each test, which, although relatively easy to implement with pytest fixtures, is quite wasteful and slow.
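Until one instance can serve several start rules, a common way to limit the waste is to build each per-start parser lazily, only once, and share it across tests. A minimal sketch using `functools.lru_cache`; `build_parser` is a hypothetical stand-in for the expensive constructor (for example, a Lark instance created with a particular `start`), and `BUILD_LOG` exists only to make the caching observable.

```python
import functools

# Records every (expensive) construction so the caching effect is visible.
BUILD_LOG = []


def build_parser(start):
    """Hypothetical stand-in for an expensive parser constructor.
    Here it just returns a trivial callable tagged with its start rule."""
    BUILD_LOG.append(start)
    return lambda text: (start, text)


@functools.lru_cache(maxsize=None)
def parser_for(start):
    """Build the parser for `start` on first use and reuse it afterwards."""
    return build_parser(start)
```

This still pays the construction cost once for every start rule that is actually used, so it mitigates the problem rather than solving it; a session-scoped pytest fixture gives the same once-per-run behavior.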

Maybe a better API would be

parser = Lark(..., start='rule2', start_choices=['rule1', 'rule2', 'rule3'])
parser.parse('text')  # will use rule2
parser.parse('text', start='rule1')  # will dynamically switch to start with rule1

to ensure this feature remains optional and to not complicate parse calls for the “default” usage?
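To make the proposed semantics concrete, here is a sketch that is deliberately not Lark: a hand-written recursive-descent parser for the toy grammar from the issue, exposing the suggested `start` default, `start_choices` whitelist, and per-call `start` override. `ToyParser`, `tokenize`, `ParseError` and the `rule_*` methods are hypothetical. In recursive descent every non-terminal is a function, so choosing the start symbol at parse time is just choosing which function to call; an LALR parser instead needs an entry state for each allowed start.

```python
import re

# Keywords and punctuation of the toy grammar from the issue.
TOKEN_RE = re.compile(r"select|where|column|constraint|exists\(|\)")


class ParseError(Exception):
    """Raised when the input does not match the toy grammar."""


def tokenize(text):
    """Split `text` into toy-grammar tokens, skipping whitespace."""
    tokens, pos = [], 0
    while True:
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return tokens
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ParseError(f"unexpected character at offset {pos}")
        tokens.append(match.group())
        pos = match.end()


class ToyParser:
    """Hypothetical recursive-descent parser with a selectable start rule."""

    def __init__(self, start="select", start_choices=("select", "projection", "where")):
        if start not in start_choices:
            raise ValueError(f"default start {start!r} not in start_choices")
        self.start = start
        self.start_choices = frozenset(start_choices)

    def parse(self, text, start=None):
        if start is None:
            start = self.start  # "default" usage stays a plain parse(text)
        if start not in self.start_choices:
            raise ValueError(f"{start!r} is not an allowed start symbol")
        self.tokens, self.pos = tokenize(text), 0
        tree = getattr(self, "rule_" + start)()
        if self.pos != len(self.tokens):
            raise ParseError(f"unexpected trailing token {self.tokens[self.pos]!r}")
        return tree

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise ParseError(f"expected {token!r}, got {self.peek()!r}")
        self.pos += 1

    def rule_select(self):
        # select: "select" projection+ "where" where
        self.expect("select")
        items = [self.rule_projection()]
        while self.peek() != "where":
            items.append(self.rule_projection())
        self.expect("where")
        return ("select", items, self.rule_where())

    def rule_projection(self):
        # projection: "column" | select
        if self.peek() == "column":
            self.pos += 1
            return ("projection", "column")
        return ("projection", self.rule_select())

    def rule_where(self):
        # where: "constraint" | "exists(" select ")"
        if self.peek() == "constraint":
            self.pos += 1
            return ("where", "constraint")
        self.expect("exists(")
        subquery = self.rule_select()
        self.expect(")")
        return ("where", subquery)
```

For example, `ToyParser(start="select").parse("column", start="projection")` parses the input as a projection item for that one call, while a plain `parse(text)` keeps using the default start.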
