Generate parser as Python C-Extension
Suggestion
Lark really makes creating parsers easy, but unfortunately the generated parser is very slow. I recently had to debug why things were so slow and found that most of the time was spent on parsing. Hardcoding simple and frequently used cases like _id="UUID" improved performance 10 times. I’m not talking about 10 times faster parsing, but about overall app performance.
So maybe adding the possibility to generate the parser as a C extension with the same interface would really improve performance? Having a Python-based parser is nice for prototyping, but eventually it will have to be ported to a lower-level language in order to increase performance.
Describe alternatives you’ve considered
Switching the grammar from Earley to LALR would probably increase performance, but a C parser would still be a lot faster.
I was looking at pegen, but I’m not sure if it is intended for anything other than the Python language itself.
Another option is probably to port the grammar from Lark to bison and flex, but that sounds too complicated.
Additional context
I’m working on a data management project, where I use a small expression language for data transformation. The parser I use can be found here:
https://gitlab.com/atviriduomenys/spinta/-/blob/master/spinta/spyna.py
The main performance issues were in an upsert action, where I’m upserting a lot of data and each upsert action has a "_where": "_id='UUID'" expression, which is parsed with Lark, and this took most of the time.
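The hardcoding workaround described above could look roughly like the sketch below. It is only an illustration, not the code from spinta: the function name parse_where, the returned tuple shape, and the slow_parse callback (a stand-in for the full Lark-based parser) are all assumptions. The idea is to recognise the one hot expression shape with a precompiled regular expression and fall back to the real parser for everything else.

```python
import re

# Matches only _id='<uuid>' or _id="<uuid>", with the same quote on both sides.
_ID_EQ = re.compile(
    r"""\s*_id\s*=\s*(['"])"""
    r"""([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})"""
    r"""\1\s*"""
)


def parse_where(expr, slow_parse):
    """Handle the frequent `_id = UUID` case without invoking the full parser.

    `slow_parse` is a hypothetical callable wrapping the Lark parser; it is
    only called when the expression does not have the hardcoded shape.
    """
    match = _ID_EQ.fullmatch(expr)
    if match:
        # Hypothetical result shape: (operator, field name, value).
        return ("eq", "_id", match.group(2))
    return slow_parse(expr)
```

A fast path like this only pays off when one expression shape dominates the workload, as it does in the upsert case here; every other expression still goes through the slow parser.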
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:6 (5 by maintainers)
Top GitHub Comments
I have successfully transpiled Lark (via Nuitka) into C, but common.lark and other metadata files had some trouble being picked up during the transpilation.

Yes, definitely. Actually, it’s possible that LALR in Python would be faster than Earley in C.
Looking at your grammar, it looks simple enough that it should be possible to use LALR.