Grammar that works on Lark 0.9.0 throws IndexError on later versions

Describe the bug

My parser runs on Lark 0.9.0 but throws IndexError on both Lark 0.10.1 and the master branch. The parser is Earley and uses the same custom indenter as the Python examples.
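For context on the "custom indenter": it is a post-lexer (passed as `postlex=`) that watches `_NEWLINE` tokens and emits `_INDENT`/`_DEDENT` tokens so the grammar can express Python-style blocks. The sketch below is a simplified, hypothetical model of that behaviour, not Lark's `Indenter` class; `indent_tokens` is an illustrative name, and the model assumes plain strings instead of Lark `Token` objects. It also leaves out the bracket tracking that the real indenter performs via `OPEN_PAREN_types`/`CLOSE_PAREN_types`.

```python
# A simplified, hypothetical model of an Indenter-style post-lexer. It is NOT
# Lark's implementation: it ignores bracket tracking and works on the raw text
# of each _NEWLINE match instead of on real Token objects.

def indent_tokens(newline_texts, tab_len=8):
    r"""Return the token names produced for a sequence of _NEWLINE matches.

    Each text is a line break followed by the next line's leading
    whitespace, e.g. "\n    " (what /\r?\n[\t ]*/ matches).
    """
    stack = [0]  # indentation widths of the currently open blocks
    out = []
    for text in newline_texts:
        out.append("_NEWLINE")  # the newline token itself is kept
        indent_str = text.rsplit("\n", 1)[1]  # whitespace after the last break
        indent = indent_str.count(" ") + indent_str.count("\t") * tab_len
        if indent > stack[-1]:
            # Deeper than the enclosing block: open a new one.
            stack.append(indent)
            out.append("_INDENT")
        else:
            # Shallower: close blocks until the widths line up again.
            while indent < stack[-1]:
                stack.pop()
                out.append("_DEDENT")
            if indent != stack[-1]:
                raise ValueError(f"unexpected dedent to column {indent}")
    # At end of input, close every block that is still open.
    while len(stack) > 1:
        stack.pop()
        out.append("_DEDENT")
    return out
```

For the reproduction's `code` string, the two `_NEWLINE` matches are `"\n    "` and `"\n"`, so `indent_tokens(["\n    ", "\n"])` returns `['_NEWLINE', '_INDENT', '_NEWLINE', '_DEDENT']`, which is roughly the token shape the `suite` rule's `_NEWLINE _INDENT stmt+ _DEDENT` branch expects.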

I reported this bug earlier on Gitter, sharing a stack dump and some local variable values, but I did not have a reduced example at the time. Now I do:

To Reproduce

Run with python <filename>:

https://pastebin.com/HMA4kXyz

"""
IndexError_ChildFilterLALR_NoPlaceholders_2020-11.py

Works in Lark 0.9, but not 0.10.1 or github master on 2020-11-12
Python 3.8.5 64-bit on Win 10 Pro

python IndexError_ChildFilterLALR_NoPlaceholders_2020-11.py

Expected output is:
    code: ...
    parser: ...
    tree: ...
    Done.

Works on Lark 0.9.0
On Lark 0.10.1 or latest master branch, raises IndexError:
"""
import lark
import lark.indenter
code = '''\
if a is b
    print(a)
'''
# improper subset of one of the Python grammar examples:
grammar = r"""
file_input: (_NEWLINE | stmt)*

?stmt: simple_stmt | compound_stmt
?simple_stmt: small_stmt (";" small_stmt)* [";"] _NEWLINE
?small_stmt: (NAME test)

?compound_stmt: if_stmt
if_stmt: "if" test ":"? suite
suite: (","? simple_stmt) | (_NEWLINE _INDENT stmt+ _DEDENT)

?test: binary_bool_test
?binary_bool_test: not_test (("and" | "or") not_test)*
?not_test: "not" not_test -> not | comparison
?comparison: expr (_comp_op expr)*
star_expr: "*" expr
?expr: shift_expr (("bor" | "band" | "bxor") shift_expr)*
?shift_expr: arith_expr (_shift_op arith_expr)*
?arith_expr: term (_add_op term)*
?term: factor (_mul_op factor)*
?factor: _factor_op factor | atom_expr

_factor_op: "+" | "-"
_add_op:    "+" | "-"
_shift_op:  "<<" | ">>"
_mul_op:    "*" | "/" | "%" | "//"
_comp_op:   "<" | ">" | "==" | ">=" | "<=" | "<>" | "in" | "is"

?atom_expr: atom_expr "(" [arguments] ")" -> func_call
          | atom

?atom: "(" test ")"
     | name
     | number

name: NAME
?number: INT_LITERAL_DEC

?testlist_comp: (test|star_expr) [("," (test|star_expr))+ [","] | ","]
exprlist: (expr|star_expr) ("," (expr|star_expr))* [","]
testlist: test ("," test)* [","]

arguments: argvalue ("," argvalue)*  ("," [starargs | kwargs])?
         | starargs
         | kwargs

starargs: "*" test ("," "*" test)* ("," argvalue)* ["," kwargs]
kwargs: "**" test

?argvalue: test ("=" test)?

NAME: /[a-zA-Z_]\w*/
_NEWLINE: ( /\r?\n[\t ]*/  )+
INT_LITERAL_DEC:   /0|[1-9]\d*/i

%ignore /[\t \f]+/  // WS
%ignore /\\[\t \f]*\r?\n/   // LINE_CONT
%declare _INDENT _DEDENT
"""

# for Pythonic indentation:
class CustomIndenter(lark.indenter.Indenter):
    NL_type = '_NEWLINE'
    OPEN_PAREN_types = ['LPAR', 'LSQB', 'LBRACE']
    CLOSE_PAREN_types = ['RPAR', 'RSQB', 'RBRACE']
    INDENT_type = '_INDENT'
    DEDENT_type = '_DEDENT'
    tab_len = 8

def parser():
    return lark.Lark(
        grammar,
        debug=True,
        parser='earley',      # lalr, earley
        lexer='standard',     # in ('standard', 'contextual', 'dynamic', 'dynamic_complete') or issubclass(lexer, Lexer)
        postlex=CustomIndenter(),
        start='file_input',
        keep_all_tokens=True,
        maybe_placeholders=True,
        propagate_positions=True,
        ambiguity='resolve')  # in ('resolve', 'explicit', 'auto')

print('code:')
print(code)

p = parser()
print('parser:', p)
tree = p.parse(code)
print('tree:', tree)

print('Done.')

The exception:

Traceback (most recent call last):
  File "IndexError_ChildFilterLALR_NoPlaceholders_2020-11.py", line 104, in <module>
    tree = p.parse(code)
  File "...\lark-github\lark\lark.py", line 494, in parse
    return self.parser.parse(text, start=start)
  File "...\lark-github\lark\parser_frontends.py", line 138, in parse
    return self._parse(start, self.make_lexer(text))
  File "...\lark-github\lark\parser_frontends.py", line 73, in _parse
    return self.parser.parse(input, start, *args)
  File "...\lark-github\lark\parsers\earley.py", line 320, in parse
    return transformer.transform(solutions[0])
  File "...\lark-github\lark\parsers\earley_forest.py", line 353, in transform
    self.visit(root)
  File "...\lark-github\lark\parsers\earley_forest.py", line 295, in visit
    vpno(current)
  File "...\lark-github\lark\parsers\earley_forest.py", line 587, in visit_packed_node_out
    super(ForestToParseTree, self).visit_packed_node_out(node)
  File "...\lark-github\lark\parsers\earley_forest.py", line 417, in visit_packed_node_out
    transformed = self.transform_packed_node(node, self.data[id(node)])
  File "...\lark-github\lark\parsers\earley_forest.py", line 570, in transform_packed_node
    return self._call_rule_func(node, children)
  File "...\lark-github\lark\parsers\earley_forest.py", line 524, in _call_rule_func
    return self.callbacks[node.rule](data)
  File "...\lark-github\lark\parse_tree_builder.py", line 29, in __call__
    res = self.node_builder(children)
  File "...\lark-github\lark\parse_tree_builder.py", line 129, in __call__
    filtered.append(children[i])
IndexError: list index out of range
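The last frame is in Lark's `parse_tree_builder.py`, where the failing line picks a child out of the parser's output by a position that appears to have been fixed in advance from the rule's shape. The sketch below is a simplified, hypothetical model of that pattern (`filter_children` and its `(index, expand)` pairs are illustrative, not Lark's API or source). It only shows how such a lookup produces exactly this `IndexError` when the parser hands over fewer children than the positions assume; it does not identify the actual root cause of this issue.

```python
# A simplified, hypothetical model of an index-based child filter. It mirrors
# the shape of the failing line (filtered.append(children[i])) but is NOT
# Lark's source code.

def filter_children(to_include, children):
    """Pick children by position.

    to_include: (index, expand) pairs fixed ahead of time from a rule's shape.
    children:   what the parser actually produced for one match of that rule.
    A child at an expand=True position is itself a list whose items are
    spliced in (standing in for inlining an anonymous sub-rule).
    """
    filtered = []
    for i, expand in to_include:
        if expand:
            filtered.extend(children[i])
        else:
            # If the parser produced fewer children than the positions
            # assume, this lookup raises "list index out of range".
            filtered.append(children[i])
    return filtered


# Positions computed for a rule that is expected to yield four children.
to_include = [(0, False), (1, True), (3, False)]

print(filter_children(to_include, ["a", ["b", "c"], "d", "e"]))
# ['a', 'b', 'c', 'e']

try:
    filter_children(to_include, ["a", ["b", "c"], "d"])  # one child short
except IndexError as exc:
    print("IndexError:", exc)
# IndexError: list index out of range
```

The design trade-off this illustrates: precomputing positions makes tree building fast, but it relies on the parser always producing a child list whose length matches the compiled rule.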

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 15 (10 by maintainers)

Top GitHub Comments

chanicpanic commented, Nov 13, 2020 (2 reactions)

I have bisected the issue to 9db869c. I am close to a fix and will make a PR soon.
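The comment does not say how the bisection was done. One common way to automate it is `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. The script below is a hedged sketch of such a driver; the file name `bisect_check.py` and its command-line interface are hypothetical, and it assumes it is run from the root of a Lark checkout with the path to the reproduction script as its only argument.

```python
# bisect_check.py: a hypothetical helper for `git bisect run` (not part of Lark).
# Run it from the root of a Lark checkout, passing the repro script's path.
import os
import subprocess
import sys


def classify(returncode: int, stderr: str) -> int:
    """Map one run of the repro script to a `git bisect run` exit code."""
    if returncode == 0:
        return 0    # good: the grammar parsed
    if "IndexError" in stderr:
        return 1    # bad: the reported failure
    return 125      # anything else (e.g. import errors): ask bisect to skip


def main(repro_path: str) -> int:
    # Put the checked-out working tree on PYTHONPATH so `import lark` loads
    # the commit under test ahead of any installed release.
    env = dict(os.environ, PYTHONPATH=os.getcwd())
    proc = subprocess.run([sys.executable, repro_path],
                          capture_output=True, text=True, env=env)
    return classify(proc.returncode, proc.stderr)


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Typical use would be `git bisect start <bad-commit> <good-commit>` followed by `git bisect run python bisect_check.py <path-to-repro.py>`. Mapping unrelated failures to 125 keeps commits that cannot be tested (for example, ones that fail to import) from being blamed for the IndexError.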

charles-esterbrook commented, Nov 14, 2020 (1 reaction)

Thank you, everybody. Lark gets better and better.
