question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Grammar that parses in milliseconds in Java runtime, takes many seconds in python

See original GitHub issue

I have a grammar of Python3 (from the antlr4 grammar repository), extended with query language constructs. The grammar file is here: Grammar file

The whole project is here: PythonQL

This tiny program parses in milliseconds with the Java runtime, but takes about 1.5 seconds in python (after the recent fix, before it was over 2 seconds).

# This example illustrates the window query in PythonQL

from collections import namedtuple
trade = namedtuple('Trade', ['day','ammount', 'stock_id'])

trades = [ trade(1, 15.34, 'APPL'),
           trade(2, 13.45, 'APPL'),
           trade(3, 8.34,  'APPL'),
           trade(4, 9.87,  'APPL'),
           trade(5, 10.99, 'APPL'),
           trade(6, 76.16, 'APPL') ]

# Maximum 3-day sum

res = (select win
        for sliding window win in ( select t.ammount for t in trades )
        start at s when True
        only end at e when (e-s == 2))

print (res)

Here is a profiler trace just in case (I left only the relevant entries):

  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    21    0.000    0.000    0.094    0.004 PythonQLParser.py:7483(argument)
    8    0.000    0.000    0.195    0.024 PythonQLParser.py:7379(arglist)  
    9    0.000    0.000    0.196    0.022 PythonQLParser.py:6836(trailer)  
    5/3    0.000    0.000    0.132   0.044 PythonQLParser.py:6765(testlist_comp)
    1    0.000    0.000    0.012    0.012 PythonQLParser.py:6154(window_end_cond)
    1    0.000    0.000    0.057    0.057 PythonQLParser.py:6058(sliding_window)
    1    0.000    0.000    0.057    0.057 PythonQLParser.py:5941(window_clause)
    1   0.000    0.000    0.004   0.004 PythonQLParser.py:5807(for_clause_entry)
    1    0.000    0.000    0.020 0.020 PythonQLParser.py:5752(for_clause) 
    2/1   0.000   0.000   0.068   0.068 PythonQLParser.py:5553(query_expression)    
   48/10    0.000    0.000    0.133    0.013 PythonQLParser.py:5370(atom) 
    48/7    0.000    0.000    0.315    0.045 PythonQLParser.py:5283(power) 
    48/7    0.000    0.000    0.315    0.045 PythonQLParser.py:5212(factor) 
    48/7    0.000    0.000    0.331    0.047 PythonQLParser.py:5132(term) 
    47/7    0.000    0.000    0.346    0.049 PythonQLParser.py:5071(arith_expr) 
    47/7    0.000    0.000    0.361    0.052 PythonQLParser.py:5010(shift_expr) 
    47/7    0.000    0.000    0.376    0.054 PythonQLParser.py:4962(and_expr) 
    47/7    0.000    0.000    0.390    0.056 PythonQLParser.py:4914(xor_expr) 
    47/7    0.000    0.000    0.405    0.058 PythonQLParser.py:4866(expr) 
    44/7    0.000    0.000    0.405    0.058 PythonQLParser.py:4823(star_expr) 
    43/7    0.000    0.000    0.422    0.060 PythonQLParser.py:4615(not_test) 
    43/7    0.000    0.000    0.438    0.063 PythonQLParser.py:4563(and_test) 
    43/7    0.000    0.000    0.453    0.065 PythonQLParser.py:4509(or_test) 
    43/7    0.000    0.000    0.467    0.067 PythonQLParser.py:4293(old_test) 
    43/7    0.000    0.000    0.467    0.067 PythonQLParser.py:4179(try_catch_expr)
    43/7    0.000    0.000    0.482    0.069 PythonQLParser.py:3978(test) 
    1    0.000    0.000    0.048    0.048 PythonQLParser.py:2793(import_from) 
    1    0.000    0.000    0.048    0.048 PythonQLParser.py:2702(import_stmt) 
    7    0.000    0.000    1.728    0.247 PythonQLParser.py:2251(testlist_star_expr) 
    4    0.000    0.000    1.770    0.443 PythonQLParser.py:2161(expr_stmt) 
    5    0.000    0.000    1.822    0.364 PythonQLParser.py:2063(small_stmt) 
    5    0.000    0.000    1.855    0.371 PythonQLParser.py:1980(simple_stmt) 
    5    0.000    0.000    1.859    0.372 PythonQLParser.py:1930(stmt) 
    1    0.000    0.000    1.898    1.898 PythonQLParser.py:1085(file_input)
    176    0.002    0.000    0.993    0.006 Lexer.py:127(nextToken)
    420    0.000   0.000   0.535   0.001 ParserATNSimulator.py:1120(closure)
   705    0.003    0.000    1.642    0.002 ParserATNSimulator.py:315(adaptivePredict)

I have attached a file that parses for 7 seconds on my Macbook Pro as well.

I’d be happy to reduce this case to a minimal case for debugging, but don’t really know where to start.

The grammar doesn’t seem to have any problems like ambiguity, etc.

Issue Analytics

  • State:open
  • Created 7 years ago
  • Comments:77 (36 by maintainers)

github_iconTop GitHub Comments

7reactions
amykyta3commented, Jan 9, 2020

Hey everyone. Posting here since it may be useful to people that find this issue thread.

@parrt and @thisiscam’s comments about using cython inspired me to put together an accelerator module for Antlr. It isn’t implemented using cython, but it uses the Antlr C++ target as a Python extension. Lexing & parsing is done exclusively in C++, and then an auto-generated visitor is used to re-build the resulting parse tree in Python. Initial tests show a 5x-25x speedup depending on the grammar and input, and I have a few ideas on how to improve it further.

Here is the code-generator tool: https://github.com/amykyta3/speedy-antlr-tool And here is a fully-functional example: https://github.com/amykyta3/speedy-antlr-example

It is still a work in progress, and I hope to fill in more documentation details this weekend.

2reactions
SimonStPetercommented, Jun 26, 2016

@jimidle: to say that is to say that the python target in ANTLR is functionally unusable, in which case it should be deprecated ASAP to stop other people from going down that blind alley. However, I repeat, I don’t believe it. Python is slow but not that slow. Nowhere near. Something else is wrong. I’d love to dig into it but I’m starting a new job tomorrow and I’m going to be snowed under for the foreseeable future. @pavelvelikhov: please see my suggestion about taking it up with pthon experts.

Read more comments on GitHub >

github_iconTop Results From Across the Web

Slow ANTLR4 generated Parser in Python, but fast in Java
I think it's an error related to antlr4-python2-runtime because it only takes 1 second to parse the same file on java. Python can...
Read more >
Need help with speeding up ANTLR4 parser - Google Groups
That grammar took as much time to parse as well, and it takes milliseconds to parse with the Java target. My conclusion is...
Read more >
Antlrworks How To Work With Python 2.5 Grammar - ADocLib
This tiny program parses in milliseconds with the Java runtime but takes about 1.5 seconds in python after the recent fix before it...
Read more >
How to get current time in milliseconds in Python?
Here we use the time.time() method to get the current CPU time in seconds. The time is calculated since the epoch. It returns...
Read more >
Parsing Expressions - Crafting Interpreters
Many of us have cobbled together a mishmash of regular expressions and ... It means to take a text and map each word...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found