[Performance] [Python] Use bitwise check instead of in operator within list
See original GitHub issueGrammar
e0: e2 | T0;
e2: T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9 | T10 | T11 | T12 | T13 | T14 | T15;
T0: 't0';
T1: 't1';
T2: 't2';
T3: 't3';
T4: 't4';
T5: 't5';
T6: 't6';
T7: 't7';
T8: 't8';
T9: 't9';
T10: 't10';
T11: 't11';
T12: 't12';
T13: 't13';
T14: 't14';
T15: 't15';
Generated code
ANTLR generates the following code for e0
:
def e0(self):
localctx = TestParser.E0Context(self, self._ctx, self.state)
self.enterRule(localctx, 2, self.RULE_e0)
try:
self.state = 10
self._errHandler.sync(self)
token = self._input.LA(1)
if token in [TestParser.T1, TestParser.T2, TestParser.T3, TestParser.T4, TestParser.T5, TestParser.T6, TestParser.T7, TestParser.T8, TestParser.T9, TestParser.T10, TestParser.T11, TestParser.T12, TestParser.T13, TestParser.T14, TestParser.T15]:
self.enterOuterAlt(localctx, 1)
self.state = 8
self.e2()
pass
elif token in [TestParser.T0]:
self.enterOuterAlt(localctx, 2)
self.state = 9
self.match(TestParser.T0)
pass
else:
raise NoViableAltException(self)
except RecognitionException as re:
localctx.exception = re
self._errHandler.reportError(self, re)
self._errHandler.recover(self, re)
finally:
self.exitRule()
return localctx
The problem
The line if token in [TestParser.T1, TestParser.T2, TestParser.T3, TestParser.T4, TestParser.T5, TestParser.T6, TestParser.T7, TestParser.T8, TestParser.T9, TestParser.T10, TestParser.T11, TestParser.T12, TestParser.T13, TestParser.T14, TestParser.T15]
is far from efficient because it allocates list on every e0
call and it has O(N) complexity that depends on tokens count. The more tokens to check the slower code we have.
It’s quite common case and it’s especially important for Python because Python doesn’t support switch case
construction and it’s unable to optimize the code during intepretation. Maybe bitwise check also efficient for other runtimes as well but ANTLR generates switch case
for them and it should be optimized by their compilers.
Solution
It can be replaced by bitwise checking, something like (1 << token) & 0xFFFF != 0
. The similar check we have for TestSetInline
.
Benchmark
https://gist.github.com/KvanTTT/e3b355f7e321fe7f52e11ea1aa0ecbce#file-check-range-vs-mask-py
check_by_if_test: 438 ns
check_by_range_test: 619 ns
check_by_mask_test: 202 ns
Issue Analytics
- State:
- Created a year ago
- Comments:8 (5 by maintainers)
Top GitHub Comments
Well, yes. Compared to 15×2 = 30 dereferences, I was expecting one or two orders of magnitude, not just 1/3 of the time.
Yeah… a terrible terrible terrible idea with Python where there is no constant, thus no optimization possible.
Created a patch script that implements this optimization: https://gist.github.com/amykyta3/8285559e95074c4431c2836d78b36530