Newline regex considered as zero length
Describe the bug
I’m using the Earley parser with lark.
When a regular expression consists of a single newline \n
(alone or followed by a lookahead), e.g. /\n(?=a)/x,
lark complains that the regular expression is zero-width, which it is not.
Using other flags from mliux
does not change this behaviour either.
To Reproduce
import re
from lark import Lark
GRAMMAR = """start: /\n(?=a)/x"""
parser = Lark(GRAMMAR, ambiguity="resolve", debug=True)
print(parser.parse("\na").pretty())
Error Message
Traceback (most recent call last):
File "min_work_ex.py", line 8, in <module>
parser = Lark(GRAMMAR, ambiguity="resolve", debug=True)
File "/home/lol/.local/lib/python3.8/site-packages/lark/lark.py", line 377, in __init__
self.parser = self._build_parser()
File "/home/lol/.local/lib/python3.8/site-packages/lark/lark.py", line 419, in _build_parser
return parser_class(self.lexer_conf, parser_conf, options=self.options)
File "/home/lol/.local/lib/python3.8/site-packages/lark/parser_frontends.py", line 40, in __call__
return ParsingFrontend(lexer_conf, parser_conf, options)
File "/home/lol/.local/lib/python3.8/site-packages/lark/parser_frontends.py", line 69, in __init__
self.parser = create_parser(lexer_conf, parser_conf, options)
File "/home/lol/.local/lib/python3.8/site-packages/lark/parser_frontends.py", line 216, in create_earley_parser
return f(lexer_conf, parser_conf, options, resolve_ambiguity=resolve_ambiguity, debug=debug, tree_class=tree_class, **extra)
File "/home/lol/.local/lib/python3.8/site-packages/lark/parser_frontends.py", line 193, in create_earley_parser__dynamic
earley_matcher = EarleyRegexpMatcher(lexer_conf)
File "/home/lol/.local/lib/python3.8/site-packages/lark/parser_frontends.py", line 182, in __init__
raise GrammarError("Dynamic Earley doesn't allow zero-width regexps", t)
lark.exceptions.GrammarError: ("Dynamic Earley doesn't allow zero-width regexps", TerminalDef('__ANON_0', '(?x:\n(?=a))'))
Lark version lark-parser==0.11.3
Question Is this in any way intended? Or is there a way around this problem? Thanks in advance for your response 😃
Issue Analytics
- Created 2 years ago
- Comments:7 (6 by maintainers)
Top GitHub Comments
The regex is zero length. You have the x flag, which means that all whitespace characters are ignored. The escape sequence \n will get translated to a newline before being passed to the regex engine. That means that it is ignored. There are two fixes: don’t use the x flag, or add another backslash: \\n.
I don’t think so, but you’re welcome to make the argument. Preferably in a separate issue.
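The zero-width explanation given in the comments can be checked with Python's re module alone. The following is a minimal sketch, not lark's own check: the variable names are illustrative, and it only exercises the regex engine, not lark's grammar loader, so how many backslashes the grammar string itself needs is not shown here.

```python
import re

text = "\na"

# As in the issue: in a regular (non-raw) Python string, "\n" is already a
# real newline character by the time the pattern is compiled.
literal_newline = "\n(?=a)"

# With the backslash escaped, the regex engine receives the two characters
# backslash + n and treats them as a newline escape.
escaped_newline = "\\n(?=a)"

# x flag (re.VERBOSE): the unescaped newline in the pattern is ignored as
# whitespace, so only the lookahead is left and the match is zero-width.
verbose_literal = re.search(literal_newline, text, re.VERBOSE)
print(verbose_literal.span())  # (1, 1)

# x flag with the escaped form: the newline is matched, width 1.
verbose_escaped = re.search(escaped_newline, text, re.VERBOSE)
print(verbose_escaped.span())  # (0, 1)

# No x flag: the literal newline character is matched as-is, width 1.
plain_literal = re.search(literal_newline, text)
print(plain_literal.span())  # (0, 1)
```

Either fix from the comment (dropping the x flag or escaping the backslash) gives a pattern that consumes the newline, which is what the dynamic Earley lexer requires.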