Working grammar on Lark 0.9.0 throws IndexError on post-0.9.0
See original GitHub issueDescribe the bug
My parser runs on Lark 0.9.0, but throws IndexError on both Lark 0.10.1 and master branch. The parser is Earley and uses the same custom indenter as for Python examples.
I reported this bug earlier in Gitter, sharing a stack dump and some local var values, but did not yet have a reduced example. Now I do:
To Reproduce
Run with python <filename>
:
"""
IndexError_ChildFilterLALR_NoPlaceholders_2020-11.py
Works in Lark 0.9, but not 0.10.1 or github master on 2020-11-12
Python 3.8.5 64-bit on Win 10 Pro
python IndexError_ChildFilterLALR_NoPlaceholders_2020-11.py
Expected output is:
code: ...
parser: ...
tree: ...
Done.
Works on Lark 0.9.0
On Lark 0.10.1 or latest master branch, raises IndexError:
"""
import lark
import lark.indenter
code = '''\
if a is b
print(a)
'''
# improper subset of one of the Python grammar examples:
grammar = r"""
file_input: (_NEWLINE | stmt)*
?stmt: simple_stmt | compound_stmt
?simple_stmt: small_stmt (";" small_stmt)* [";"] _NEWLINE
?small_stmt: (NAME test)
?compound_stmt: if_stmt
if_stmt: "if" test ":"? suite
suite: (","? simple_stmt) | (_NEWLINE _INDENT stmt+ _DEDENT)
?test: binary_bool_test
?binary_bool_test: not_test (("and" | "or") not_test)*
?not_test: "not" not_test -> not | comparison
?comparison: expr (_comp_op expr)*
star_expr: "*" expr
?expr: shift_expr (("bor" | "band" | "bxor") shift_expr)*
?shift_expr: arith_expr (_shift_op arith_expr)*
?arith_expr: term (_add_op term)*
?term: factor (_mul_op factor)*
?factor: _factor_op factor | atom_expr
_factor_op: "+" | "-"
_add_op: "+" | "-"
_shift_op: "<<" | ">>"
_mul_op: "*" | "/" | "%" | "//"
_comp_op: "<" | ">" | "==" | ">=" | "<=" | "<>" | "in" | "is"
?atom_expr: atom_expr "(" [arguments] ")" -> func_call
| atom
?atom: "(" test ")"
| name
| number
name: NAME
?number: INT_LITERAL_DEC
?testlist_comp: (test|star_expr) [("," (test|star_expr))+ [","] | ","]
exprlist: (expr|star_expr) ("," (expr|star_expr))* [","]
testlist: test ("," test)* [","]
arguments: argvalue ("," argvalue)* ("," [starargs | kwargs])?
| starargs
| kwargs
starargs: "*" test ("," "*" test)* ("," argvalue)* ["," kwargs]
kwargs: "**" test
?argvalue: test ("=" test)?
NAME: /[a-zA-Z_]\w*/
_NEWLINE: ( /\r?\n[\t ]*/ )+
INT_LITERAL_DEC: /0|[1-9]\d*/i
%ignore /[\t \f]+/ // WS
%ignore /\\[\t \f]*\r?\n/ // LINE_CONT
%declare _INDENT _DEDENT
"""
# for Pythonic indentation:
class CustomIndenter(lark.indenter.Indenter):
NL_type = '_NEWLINE'
OPEN_PAREN_types = ['LPAR', 'LSQB', 'LBRACE']
CLOSE_PAREN_types = ['RPAR', 'RSQB', 'RBRACE']
INDENT_type = '_INDENT'
DEDENT_type = '_DEDENT'
tab_len = 8
def parser():
return lark.Lark(
grammar,
debug=True,
parser='earley', # lalr, earley
lexer='standard', # in ('standard', 'contextual', 'dynamic', 'dynamic_complete') or issubclass(lexer, Lexer)
postlex=CustomIndenter(),
start='file_input',
keep_all_tokens=True,
maybe_placeholders=True,
propagate_positions=True,
ambiguity='resolve') # in ('resolve', 'explicit', 'auto')
print('code:')
print(code)
p = parser()
print('parser:', p)
tree = p.parse(code)
print('tree:', tree)
print('Done.')
The exception:
Traceback (most recent call last):
File "IndexError_ChildFilterLALR_NoPlaceholders_2020-11.py", line 104, in <module>
tree = p.parse(code)
File "...\lark-github\lark\lark.py", line 494, in parse
return self.parser.parse(text, start=start)
File "...\lark-github\lark\parser_frontends.py", line 138, in parse
return self._parse(start, self.make_lexer(text))
File "...\lark-github\lark\parser_frontends.py", line 73, in _parse
return self.parser.parse(input, start, *args)
File "...\lark-github\lark\parsers\earley.py", line 320, in parse
return transformer.transform(solutions[0])
File "...\lark-github\lark\parsers\earley_forest.py", line 353, in transform
self.visit(root)
File "...\lark-github\lark\parsers\earley_forest.py", line 295, in visit
vpno(current)
File "...\lark-github\lark\parsers\earley_forest.py", line 587, in visit_packed_node_out
super(ForestToParseTree, self).visit_packed_node_out(node)
File "...\lark-github\lark\parsers\earley_forest.py", line 417, in visit_packed_node_out
transformed = self.transform_packed_node(node, self.data[id(node)])
File "...\lark-github\lark\parsers\earley_forest.py", line 570, in transform_packed_node
return self._call_rule_func(node, children)
File "...\lark-github\lark\parsers\earley_forest.py", line 524, in _call_rule_func
return self.callbacks[node.rule](data)
File "...\lark-github\lark\parse_tree_builder.py", line 29, in __call__
res = self.node_builder(children)
File "...\lark-github\lark\parse_tree_builder.py", line 129, in __call__
filtered.append(children[i])
IndexError: list index out of range
Issue Analytics
- State:
- Created 3 years ago
- Comments:15 (10 by maintainers)
Top Results From Across the Web
Does "IndexError: list index out of range" when trying to ...
The IndexError is raised when you attempt to retrieve an index from a sequence, like a list or a tuple , and the...
Read more >How to Fix IndexError in Python - Rollbar
The IndexError in Python occurs when an item from a list is attempted to be accessed that is outside the index range of...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I have bisected the issue to 9db869c. I am close to a fix and will make a PR soon.
Thank you, everybody. Lark gets better and better.