Parser: (re)allow duplicate group names (e.g. move the check out of the parser)
I am currently revisiting my uxregexp extended regexp module after quite a long time. I use regexp-tree to parse the extended regexp. However, I noticed that you now reject duplicate group names, which breaks my algorithm.
Please see issue #142, where you forced group names to be unique.
As an old Perl user, I am not sure whether the proposal mentioned at https://tc39.es/proposal-regexp-named-groups/#sec-patterns-static-semantics-early-errors really wants this in the end… At least, if it does, I think this should change. Perl is the mother of such extended regexps and has always been far ahead of other implementations, meaning it had solutions for all kinds of regexp issues before others even thought of them.
Example: alternative representation of a date, e.g.
# 2020-09-21
(?<year> [0-9]{4} ) -
(?<month> [0-9]{2} ) -
(?<day> [0-9]{2} )
|
# 09/21/2020
(?<month> [0-9]{2} ) /
(?<day> [0-9]{2} ) /
(?<year> [0-9]{4} )
So the duplicate group name is legal and quite useful if only one of the alternatives matches.
Additionally, multiple matches can be collected in arrays.
If you can collect with wildcards like:
((?<name> some regexp) some separator regexp)*
it is only a step further to allow multiple occurrences of the same group name in one expression, e.g.:
((?<name> some regexp) sep1 (?<name> some regexp) sep2)*
I also think a check shouldn’t be in the parser, because a purely declarative syntax doesn’t include the condition. Instead, a consumer of the AST would check the names as necessary. I also think it is better to separate such implicit rules from the parsing process.
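To illustrate what such a consumer-side check could look like, here is a minimal sketch in JavaScript. It walks a hand-written AST literal and reports group names that occur more than once. The node shape (objects with type 'Group' and a name property) and the helper name findDuplicateGroupNames are assumptions made for illustration, not the parser's documented API.

```javascript
// Hypothetical consumer-side check: collect group names that appear more than once.
// Assumes named groups are plain objects with type === 'Group' and a string `name`.
function findDuplicateGroupNames(ast) {
  const seen = new Set();
  const duplicates = new Set();
  (function visit(node) {
    if (Array.isArray(node)) {
      node.forEach(visit);
      return;
    }
    if (node === null || typeof node !== 'object') return;
    if (node.type === 'Group' && typeof node.name === 'string') {
      if (seen.has(node.name)) duplicates.add(node.name);
      seen.add(node.name);
    }
    // Visit every property so the walk does not depend on exact child field names.
    Object.values(node).forEach(visit);
  })(ast);
  return [...duplicates];
}

// Hand-written AST for something like (?<year>\d+)|(?<year>\d+); the shape is assumed.
const digits = {
  type: 'Repetition',
  expression: { type: 'Char', value: '\\d', kind: 'meta' },
  quantifier: { type: 'Quantifier', kind: '+', greedy: true },
};
const sampleAst = {
  type: 'RegExp',
  body: {
    type: 'Disjunction',
    left: { type: 'Group', capturing: true, name: 'year', expression: digits },
    right: { type: 'Group', capturing: true, name: 'year', expression: digits },
  },
  flags: '',
};

console.log(findDuplicateGroupNames(sampleAst));
```

A consumer that wants strict ECMAScript behaviour could throw when the returned list is non-empty, while a consumer like an extended-regexp module could simply ignore it.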
From my POV, I see these possibilities:
- you could remove the check for a duplicate group in the parser (maybe moving it to another part)
- you could make it optional (because checking at the parser is more efficient)
- if you reject this, I need to rename each duplicated group name (add a count), which introduces one more layer [EDIT: not really possible, I would have to change the names before parsing]
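The renaming workaround from the last option can be sketched roughly as follows, in plain JavaScript without any parser. Each named group gets a unique internal name before the pattern is compiled, and the results are merged back under the original name as arrays. The helper name compileWithDuplicateNames and the `__n` suffix scheme are hypothetical, the scan is deliberately naive (it assumes "(?<name>" never appears escaped or inside a character class), and the pattern uses standard JavaScript syntax rather than the extended syntax with whitespace and comments.

```javascript
// Hypothetical helper: rename every named group to name__1, name__2, ... before
// compiling, and remember which original name each internal name belongs to.
function compileWithDuplicateNames(source, flags = '') {
  const owners = new Map(); // internal name -> original name
  const counts = new Map(); // original name -> occurrences seen so far
  // Naive scan: does not handle escaped "(?<" or "(?<" inside a character class.
  const renamed = source.replace(/\(\?<([A-Za-z_$][\w$]*)>/g, (_, name) => {
    const n = (counts.get(name) || 0) + 1;
    counts.set(name, n);
    const internal = `${name}__${n}`;
    owners.set(internal, name);
    return `(?<${internal}>`;
  });
  const regexp = new RegExp(renamed, flags);
  return {
    exec(input) {
      const match = regexp.exec(input);
      if (match === null) return null;
      const groups = Object.create(null);
      for (const [internal, value] of Object.entries(match.groups || {})) {
        if (value === undefined) continue; // this group did not participate
        const name = owners.get(internal);
        (groups[name] = groups[name] || []).push(value);
      }
      return groups;
    },
  };
}

// The date example from above, written in standard JavaScript regexp syntax.
const date = compileWithDuplicateNames(
  '^(?:(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})' +
  '|(?<month>[0-9]{2})/(?<day>[0-9]{2})/(?<year>[0-9]{4}))$'
);
console.log(date.exec('2020-09-21'));
console.log(date.exec('09/21/2020'));
```

Because the values are collected into arrays, the same helper also covers several occurrences of a name within one alternative. Note that a group inside a repetition still only yields the capture from its last iteration, since that is how JavaScript capture groups behave.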
thanks for listening
PS: btw, I am not a “native” JavaScript developer (C++, Perl, and many other languages)… So I wonder, what would be the best way to comment on that proposal? Would they listen to my comment at all?
Issue Analytics
- Created 3 years ago
- Reactions:2
- Comments:7 (7 by maintainers)
Top GitHub Comments
thanks for the work… And also for the pointer to the proposal, I also commented there.
@hg42, yeah, I think this would make sense for ECMAScript itself eventually, and for now we can probably add a --loose-mode parse option, which would allow some features. Alternatively, these could be specific options: