Generate parser as Python C-Extension

Suggestion

Lark really makes creating parsers easy, but unfortunately the generated parser is very slow. I recently had to debug why things were so slow and found that most of the time was spent on parsing. Hardcoding simple and frequently used cases like _id="UUID" improved performance 10 times. I’m not talking about 10 times faster parsing, but about overall app performance.

So maybe adding the possibility to generate the parser as a C extension with the same interface would really improve performance? Having a Python-based parser is nice for prototyping, but eventually it will have to be ported to a lower-level language in order to increase performance.

Describe alternatives you’ve considered

Probably updating grammar from Earley to LALR would increase performance, but a C parser would still be a lot faster.

I was looking at pegen, but I’m not sure if it is intended for anything other than the Python language itself.

Another option is probably to port the grammar from Lark to bison and flex, but that sounds too complicated.

Additional context

I’m working on a data management project, where I use a small expression language for data transformation. The parser I use can be found here:

https://gitlab.com/atviriduomenys/spinta/-/blob/master/spinta/spyna.py

The main performance issues were in an upsert action, where I am upserting a lot of data and each upsert action has a "_where": "_id='UUID'" expression, which is parsed with Lark, and this took most of the time.
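The hardcoded fast path mentioned in the suggestion could look roughly like the sketch below. It is only an illustration, not the code used in the project: the function name `parse_where`, the `slow_parse` callback (standing in for the Lark-based parser) and the tuple returned for the fast case are all hypothetical.

```python
import re
from typing import Any, Callable

# Matches only the literal shape  _id='<uuid>'  (single or double quotes).
# Anything else is left to the full parser.
_ID_EQ_UUID = re.compile(
    r"""\s*_id\s*=\s*(['"])([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})\1\s*"""
)


def parse_where(expr: str, slow_parse: Callable[[str], Any]) -> Any:
    """Parse a `_where` expression, skipping the full parser for the common case.

    `slow_parse` is whatever general-purpose parser is in use, for example
    the `parse` method of a Lark instance (assumed, not shown here).
    """
    match = _ID_EQ_UUID.fullmatch(expr)
    if match is not None:
        # Hypothetical result shape; a real fast path has to return exactly
        # what the full parser would produce for the same input.
        return ("eq", "_id", match.group(2))
    return slow_parse(expr)
```

The important constraint is that the fast path and the full parser agree on the result for every input the fast path accepts; otherwise the shortcut changes behavior instead of only saving time.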

Issue Analytics

Created 3 years ago
Reactions: 1
Comments: 6 (5 by maintainers)

Top GitHub Comments

ThatXliner commented, Nov 12, 2020

I have successfully transpiled Lark (via Nuitka) into C, but common.lark and other metadata files had some trouble being picked up by the transpilation.

erezsh commented, Oct 29, 2020

Probably updating grammar from Earley to LALR would increase performance

Yes, definitely. Actually, it’s possible that LALR in Python would be faster than Earley in C.

Looking at your grammar, it looks simple enough that it should be possible to use LALR.
