Lark parse performance and parent / child indentation


What is your question?

I have a text parser (ciscoconfparse) which I use to parse various configurations. ciscoconfparse assigns parent / child relationships based on indentation: a candidate child line must be indented more than its parent. See Example01 below.
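
A minimal sketch of this indentation rule follows. It is not ciscoconfparse's actual implementation. The function name `assign_parents`, the `comment_prefix` parameter, and the choice that comment lines never become parents are assumptions, picked so that the sketch reproduces the parent numbers shown in the Example01 output.

```python
from typing import List, Optional, Tuple


def assign_parents(lines: List[str], comment_prefix: str = "!") -> List[Optional[int]]:
    """Return the parent index (or None) for every non-blank line.

    A line's parent is the nearest preceding non-comment line that is
    indented less than the line itself.
    """
    parents: List[Optional[int]] = []
    stack: List[Tuple[int, int]] = []  # (indent, index) of open ancestors
    for raw in lines:
        if not raw.strip():
            continue  # blank lines get no index
        indent = len(raw) - len(raw.lstrip())
        is_comment = raw.lstrip().startswith(comment_prefix)
        if not is_comment:
            # Close every ancestor that is not indented less than this line.
            while stack and stack[-1][0] >= indent:
                stack.pop()
        # Nearest open ancestor with a strictly smaller indent.
        parent = next((i for ind, i in reversed(stack) if ind < indent), None)
        index = len(parents)
        parents.append(parent)
        if not is_comment:
            stack.append((indent, index))  # comments never become parents
    return parents


sample = [
    "vehicle for Charles",
    " 4 doors",
    " !",
    "  chrome handles outside",
    "!",
]
print(assign_parents(sample))  # [None, 0, 0, 1, None]
```

The stack keeps only the currently open ancestors, so each line is handled in amortized constant time.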

All code in this GitHub issue was run under Debian 10 and Python 3.7:

  • Lark version 0.11.1
  • ciscoconfparse version 1.5.22

If you’re having trouble with your code or grammar

I am interested in converting my script to use Lark, but I can’t stomach the Lark performance hit that I see in Example01. In the run shown below, ciscoconfparse is roughly 25x faster than Lark on the same text (about 0.0015 s versus 0.037 s).

  • Is there a way for me to accelerate Lark’s parse time in Example01?
  • So far, I can’t find a way to extract indent_level for the Lark-parsed config. Can someone give me assistance with how I can quantify the indent-level of each Lark-parsed line in Example01?

As recommended in Lark’s docs, the NL token matches the leading whitespace of the next line. I realize I could carry the indent_level over from the previous line’s NL token, but I am hoping for a simpler option than that.
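
The carry-over approach mentioned above can be sketched as follows. Plain strings stand in for the NL token text shown in the Example01 output. The helper names `indent_after_newline` and `attach_indents` are hypothetical, and `tab_len=1` mirrors the setting in `MyTreeIndenter`.

```python
from typing import Iterable, List, Tuple


def indent_after_newline(nl_value: str, tab_len: int = 1) -> int:
    """Indent width of the line that follows an NL token's text."""
    # Everything after the last "\n" is the next line's leading whitespace.
    trailing = nl_value.rsplit("\n", 1)[-1]
    return trailing.count(" ") + trailing.count("\t") * tab_len


def attach_indents(
    pairs: Iterable[Tuple[str, str]], first_indent: int = 0, tab_len: int = 1
) -> List[Tuple[int, str]]:
    """Pair each line's text with its own indent.

    `pairs` holds (line_text, following_nl_text); the indent measured on
    one NL belongs to the *next* line, so it is carried over by one step.
    """
    result = []
    current = first_indent
    for text, nl_value in pairs:
        result.append((current, text))
        current = indent_after_newline(nl_value, tab_len)
    return result


pairs = [
    ("vehicle for Jim", "\n "),
    ("600 horsepower", "\n "),
    ("2 doors", "\n  "),
    ("chrome handles outside", "\n"),
]
for indent, text in attach_indents(pairs):
    print(indent, text)
```

In a real parse tree the pairs would come from each `instruction` or `comment` node's children. That traversal is left out here because it depends on the final grammar.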

Explain what you’re trying to do, and what is obstructing your progress.

# Example01
# File: lark_test_indent.py
import time

from lark import Lark
from lark.indenter import Indenter

from ciscoconfparse import CiscoConfParse

from snoop import Config
config = Config(color=True)
snoop = config.snoop

class MyTreeIndenter(Indenter):
    tab_len = 1
    NL_type = 'NL'
    OPEN_PAREN_types = []
    CLOSE_PAREN_types = []
    INDENT_type = '_INDENT'
    DEDENT_type = '_DEDENT'

grammar = r"""
start        : (empty_line
             | instruction
             | comment)+

comment      : (_INDENT | _DEDENT)* COMMENT NL
instruction  : (_INDENT | _DEDENT)* INSTRUCTION NL
empty_line   : NL+

// Tokens are defined with uppercase names...
COMMENT       : /\![^\n]*/
INSTRUCTION   : /\w[^\n]*/
// NL *must* match leading whitespace on new lines...
NL            : /(\r?\n)+[\s\t]*/


%import common.WS_INLINE
%ignore WS_INLINE
%declare _INDENT _DEDENT
"""

text_01 = """
vehicle for Jim
 600 horsepower
 18 wheels
 red color
 2 doors
  chrome handles outside
  no pushbutton lock outside
"""

text_02 = """
vehicle for Jim
 600 horsepower
 18 wheels
 red color
 2 doors
  chrome handles outside
  pushbutton lock outside
vehicle for Charles
 320 horsepower
 4 wheels
 green color
 4 doors
 !
  chrome handles outside
  !no pushbutton lock outside
!
"""

#@snoop(depth=2)
def lark_parse(instructions=""):
    parser = Lark(grammar, parser='lalr', postlex=MyTreeIndenter())
    parse_tree = parser.parse(instructions)
    for inst_obj in parse_tree.children:
        print("INSTRUCTION", str(inst_obj))

def ccp(instructions=""):
    parse = CiscoConfParse(instructions.splitlines())
    for obj in parse.ConfigObjs:
        print(obj, "Comment:", obj.is_comment)

if __name__=='__main__':
    begin = time.time()
    lark_parse(text_02)
    end = round(time.time() - begin, 6)
    print("Lark parse time: %s" % end)

    print("---------------------------")

    begin = time.time()
    ccp(text_02)
    end = round(time.time() - begin, 6)
    print("CiscoConfParse parse time: %s" % end)

Example01 parse output:

$ python lark_test_indent.py
INSTRUCTION Tree('empty_line', [Token('NL', '\n')])
INSTRUCTION Tree('instruction', [Token('INSTRUCTION', 'vehicle for Jim'), Token('NL', '\n ')])
INSTRUCTION Tree('instruction', [Token('INSTRUCTION', '600 horsepower'), Token('NL', '\n ')])
INSTRUCTION Tree('instruction', [Token('INSTRUCTION', '18 wheels'), Token('NL', '\n ')])
INSTRUCTION Tree('instruction', [Token('INSTRUCTION', 'red color'), Token('NL', '\n ')])
INSTRUCTION Tree('instruction', [Token('INSTRUCTION', '2 doors'), Token('NL', '\n  ')])
INSTRUCTION Tree('instruction', [Token('INSTRUCTION', 'chrome handles outside'), Token('NL', '\n  ')])
INSTRUCTION Tree('instruction', [Token('INSTRUCTION', 'pushbutton lock outside'), Token('NL', '\n')])
INSTRUCTION Tree('instruction', [Token('INSTRUCTION', 'vehicle for Charles'), Token('NL', '\n ')])
INSTRUCTION Tree('instruction', [Token('INSTRUCTION', '320 horsepower'), Token('NL', '\n ')])
INSTRUCTION Tree('instruction', [Token('INSTRUCTION', '4 wheels'), Token('NL', '\n ')])
INSTRUCTION Tree('instruction', [Token('INSTRUCTION', 'green color'), Token('NL', '\n ')])
INSTRUCTION Tree('instruction', [Token('INSTRUCTION', '4 doors'), Token('NL', '\n ')])
INSTRUCTION Tree('comment', [Token('COMMENT', '!'), Token('NL', '\n  ')])
INSTRUCTION Tree('instruction', [Token('INSTRUCTION', 'chrome handles outside'), Token('NL', '\n  ')])
INSTRUCTION Tree('comment', [Token('COMMENT', '!no pushbutton lock outside'), Token('NL', '\n')])
INSTRUCTION Tree('comment', [Token('COMMENT', '!'), Token('NL', '\n')])
Lark parse time: 0.037327
---------------------------
<IOSCfgLine # 0 'vehicle for Jim'> Comment: False
<IOSCfgLine # 1 ' 600 horsepower' (parent is # 0)> Comment: False
<IOSCfgLine # 2 ' 18 wheels' (parent is # 0)> Comment: False
<IOSCfgLine # 3 ' red color' (parent is # 0)> Comment: False
<IOSCfgLine # 4 ' 2 doors' (parent is # 0)> Comment: False
<IOSCfgLine # 5 '  chrome handles outside' (parent is # 4)> Comment: False
<IOSCfgLine # 6 '  pushbutton lock outside' (parent is # 4)> Comment: False
<IOSCfgLine # 7 'vehicle for Charles'> Comment: False
<IOSCfgLine # 8 ' 320 horsepower' (parent is # 7)> Comment: False
<IOSCfgLine # 9 ' 4 wheels' (parent is # 7)> Comment: False
<IOSCfgLine # 10 ' green color' (parent is # 7)> Comment: False
<IOSCfgLine # 11 ' 4 doors' (parent is # 7)> Comment: False
<IOSCfgLine # 12 ' !' (parent is # 7)> Comment: True
<IOSCfgLine # 13 '  chrome handles outside' (parent is # 11)> Comment: False
<IOSCfgLine # 14 '  !no pushbutton lock outside' (parent is # 11)> Comment: True
<IOSCfgLine # 15 '!'> Comment: True
CiscoConfParse parse time: 0.001478
$
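
One thing the timings above do not separate: `lark_parse` builds the `Lark(...)` object inside the timed region, and both functions print inside it as well, so the 0.037 s figure includes grammar construction and output. The sketch below shows one way to time construction and parsing separately. `time_build_and_parse` and `StandInParser` are hypothetical names, and the stand-in class only simulates a parser that is slow to build.

```python
import time
from typing import Any, Callable, Tuple


def time_build_and_parse(
    build: Callable[[], Any], text: str, repeats: int = 100
) -> Tuple[float, float]:
    """Return (build_seconds, mean_parse_seconds) for a parser factory."""
    start = time.perf_counter()
    parser = build()  # one-time cost, e.g. compiling a grammar
    build_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(repeats):
        parser.parse(text)  # per-input cost, no printing inside the timer
    parse_seconds = (time.perf_counter() - start) / repeats
    return build_seconds, parse_seconds


class StandInParser:
    """Hypothetical parser whose construction is deliberately slow."""

    def __init__(self) -> None:
        time.sleep(0.01)  # simulates expensive setup

    def parse(self, text: str) -> list:
        return text.splitlines()


build_s, parse_s = time_build_and_parse(StandInParser, "a\n b\n", repeats=50)
print("build: %.6f s, parse: %.6f s per call" % (build_s, parse_s))
```

With Lark, `build` could be `lambda: Lark(grammar, parser='lalr', postlex=MyTreeIndenter())` from Example01. Whether that narrows the gap with ciscoconfparse would need to be measured.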

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

1 reaction
MegaIng commented, Dec 27, 2020

Do you have any suggestions for how I can properly tokenize _INDENT and _DEDENT for whitespace-parsing in Example01?

That depends on what your goal is. Your current grammar is probably not finished. You should first make the grammar work correctly and then make it faster.

0 reactions
mpenning commented, Dec 28, 2020

I don’t get it. Is this an issue with Lark?

I answered your question, which said nothing about parsing with Lark.

At this point, I do not have a complete Lark parser. Given the performance gap between Lark and ciscoconfparse’s original parser, I’m still weighing the costs and benefits of parsing with Lark.
