Grammar files are opened an unnecessary number of times, causing enormous loading times when creating a parser
It seems that, when using a FromPackageLoader object, a grammar file is opened and read each time another grammar uses a rule imported from it. This means opening the same file over and over again, once for every occurrence of a rule contained in that file.
While this may not be noticeable for parsers that only use grammar files in the same directory (where no custom FromPackageLoader is necessary), it becomes highly problematic when using many FromPackageLoaders, as the time required to construct a parser grows dramatically.
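One common way to avoid re-reading the same file is to memoize reads on the (package, resource) pair, so that every file is read at most once per process. The sketch below shows that idea with functools.lru_cache around pkgutil.get_data. It is not Lark's API and not necessarily how this issue was fixed; read_resource_cached is a hypothetical helper name.

```python
import functools
import pkgutil


@functools.lru_cache(maxsize=None)
def read_resource_cached(package: str, resource: str) -> str:
    """Hypothetical helper: read a package resource once.

    Later calls with the same (package, resource) return the cached text
    instead of going back to pkgutil.get_data.
    """
    data = pkgutil.get_data(package, resource)
    if data is None:
        # get_data returns None when the loader cannot provide data.
        raise FileNotFoundError(f"{resource!r} not found in package {package!r}")
    return data.decode("utf-8")
```

Memoizing this way assumes the grammar files do not change while the process is running; read_resource_cached.cache_clear() forces a fresh read.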
By placing a print(resource_name) in the get_data() function of Python's standard library module pkgutil.py, I was able to count how many times each of my grammar files was loaded. For example, the common.lark grammar provided by Lark gets opened 61 (!) times, one of my own grammars 25 times, another 16, etc.
Issue Analytics
- Created 2 years ago
- Comments: 72 (72 by maintainers)
Top GitHub Comments
@ornariece I will create a PR, probably tomorrow. Now I gotta sleep. 😃
@MegaIng We can deduplicate rules in other, maybe better ways. For example, nested grammars can stand on their own, so that importing a.lark from two different grammars will result in the same rules and terminals (and the namespace will be just a, not nested).
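The difference between nested and stand-alone namespaces can be illustrated with a toy model. This is not Lark's import machinery: the GRAMMARS table, the `__` separator and both function names are hypothetical. With nested namespaces, a grammar reached through two import paths is expanded twice; with stand-alone namespaces, each grammar is expanded once under its own name.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy model: grammar name -> (rules it defines, grammars it imports).
GRAMMARS: Dict[str, Tuple[List[str], List[str]]] = {
    "a": (["item"], []),
    "b": (["list_b"], ["a"]),
    "c": (["list_c"], ["a"]),
    "main": (["start"], ["b", "c"]),
}


def nested_rules(name: str, prefix: str = "") -> Set[str]:
    """Namespace rules by the full import path; 'a' reached via 'b' and 'c'
    yields two copies of its rules (b__a__item and c__a__item)."""
    defined, imports = GRAMMARS[name]
    rules = {prefix + r for r in defined}
    for dep in imports:
        rules |= nested_rules(dep, f"{prefix}{dep}__")
    return rules


def flat_rules(root: str) -> Set[str]:
    """Namespace rules by their own grammar only and expand each grammar once,
    so importing 'a' from both 'b' and 'c' yields a single a__item."""
    rules: Set[str] = set()
    visited: Set[str] = set()

    def visit(name: str) -> None:
        if name in visited:
            return  # already expanded: reuse, do not load again
        visited.add(name)
        defined, imports = GRAMMARS[name]
        prefix = "" if name == root else f"{name}__"
        rules.update(prefix + r for r in defined)
        for dep in imports:
            visit(dep)

    visit(root)
    return rules
```

Because the stand-alone variant keys on the grammar's name alone, the visited set also acts as a load cache, which addresses the repeated-read problem from the other direction.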