Newline regex considered as zero length

Describe the bug

I'm using lark's Earley parser. When a regular expression consists of a single newline \n (optionally followed by a lookahead), e.g. /\n(?=a)/x, lark complains that the regular expression is of zero length, which it is not. Using any of the other flags from mliux does not change this behaviour either.

To Reproduce

import re
from lark import Lark
GRAMMAR = """start: /\n(?=a)/x"""
parser = Lark(GRAMMAR, ambiguity="resolve", debug=True)
print(parser.parse("\na").pretty())

Error Message

Traceback (most recent call last):
  File "min_work_ex.py", line 8, in <module>
    parser = Lark(GRAMMAR, ambiguity="resolve", debug=True)
  File "/home/lol/.local/lib/python3.8/site-packages/lark/lark.py", line 377, in __init__
    self.parser = self._build_parser()
  File "/home/lol/.local/lib/python3.8/site-packages/lark/lark.py", line 419, in _build_parser
    return parser_class(self.lexer_conf, parser_conf, options=self.options)
  File "/home/lol/.local/lib/python3.8/site-packages/lark/parser_frontends.py", line 40, in __call__
    return ParsingFrontend(lexer_conf, parser_conf, options)
  File "/home/lol/.local/lib/python3.8/site-packages/lark/parser_frontends.py", line 69, in __init__
    self.parser = create_parser(lexer_conf, parser_conf, options)
  File "/home/lol/.local/lib/python3.8/site-packages/lark/parser_frontends.py", line 216, in create_earley_parser
    return f(lexer_conf, parser_conf, options, resolve_ambiguity=resolve_ambiguity, debug=debug, tree_class=tree_class, **extra)
  File "/home/lol/.local/lib/python3.8/site-packages/lark/parser_frontends.py", line 193, in create_earley_parser__dynamic
    earley_matcher = EarleyRegexpMatcher(lexer_conf)
  File "/home/lol/.local/lib/python3.8/site-packages/lark/parser_frontends.py", line 182, in __init__
    raise GrammarError("Dynamic Earley doesn't allow zero-width regexps", t)
lark.exceptions.GrammarError: ("Dynamic Earley doesn't allow zero-width regexps", TerminalDef('__ANON_0', '(?x:\n(?=a))'))

Lark version

lark-parser==0.11.3

Question

Is this in any way intended? Or is there a way around this problem? Thanks in advance for your response 😃

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

MegaIng commented, Jun 23, 2021 (2 reactions)

The regex is zero length. You have the x flag, which means that all whitespace characters in the pattern are ignored. The escape sequence \n is translated to a newline by Python before the pattern is passed to the regex engine, so it is ignored. There are two fixes: don't use the x flag, or add another backslash: \\n.
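
For anyone who wants to see the effect in isolation, here is a minimal sketch using only Python's standard `re` module (no lark involved). It assumes that the x flag behaves like `re.VERBOSE`; the helper `match_width` and the variable names `cooked` and `raw` are made up for this illustration and are not part of lark or `re`.

```python
import re

# Non-raw string: Python turns \n into a real newline before `re` sees it.
# Under re.VERBOSE that literal whitespace is dropped, leaving only (?=a).
cooked = re.compile("\n(?=a)", re.VERBOSE)

# Raw string: `re` receives a backslash followed by n and interprets the
# escape itself, so the newline survives verbose mode.
raw = re.compile(r"\n(?=a)", re.VERBOSE)


def match_width(pattern, text):
    """Return the length of the match at the start of text, or None."""
    m = pattern.match(text)
    return None if m is None else m.end() - m.start()


print(match_width(cooked, "a"))    # 0: only the lookahead is left
print(match_width(cooked, "\na"))  # None: the newline is not consumed
print(match_width(raw, "\na"))     # 1: the newline is matched
```

The same reasoning should carry over to the grammar string in the report: writing it as a raw string, or doubling the backslash, would keep the backslash in the pattern that reaches the regex engine.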

erezsh commented, Jun 23, 2021 (0 reactions)

I don’t think so, but you’re welcome to make the argument. Preferably in a separate issue.
