Parser: (re)allow duplicate group names (e.g. move the check out of the parser)

I am currently revisiting my uxregexp extended regexp module after quite a long time. I use regexp-tree to parse the extended regexp. However, I noticed that you now reject duplicate group names, which breaks my algorithm.

Please see issue #142, where you forced group names to be unique.

As an old Perl user, I am not sure whether the proposal at https://tc39.es/proposal-regexp-named-groups/#sec-patterns-static-semantics-early-errors really intends this in the end. If it does, I think it should change. Perl is the mother of such extended regexps and has always been far ahead of other implementations: it had solutions for all kinds of regexp issues before others even thought of them.

Example: alternative representations of a date:

   # 2020-09-21
   (?<year>  [0-9]{4} ) -
   (?<month> [0-9]{2} ) -
   (?<day>   [0-9]{2} )
   |
   # 09/21/2020
   (?<month> [0-9]{2} ) /
   (?<day>   [0-9]{2} ) /
   (?<year>  [0-9]{4} )

So the duplicate group name is legal and quite useful if only one of the alternatives matches.

Additionally, multiple matches can be collected in arrays. If you can collect with quantifiers, as in ((?<name> some regexp) some separator regexp)*, it is only a step further to allow multiple occurrences of the same group name in one expression, e.g.: ((?<name> some regexp) sep1 (?<name> some regexp) sep2)*

I also think the check shouldn't be in the parser, because a purely declarative syntax doesn't include this condition. A consumer of the AST can check the names as necessary. In general, I think it is better to separate such implicit rules from the parsing process.
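
A consumer-side check of this kind could look roughly like the sketch below. The AST shape is an assumption, loosely modeled on what a regexp parser might emit (Group nodes with `capturing` and `name`, children under `body`, `expression`, `expressions`, `left`, `right`); the function names are hypothetical and not part of any library.

```javascript
// Sketch only: the node shape here is an assumption, not a documented API.
// Walks the tree and counts how often each capturing group name occurs.
function collectGroupNames(node, counts = new Map()) {
  if (node === null || typeof node !== 'object') return counts;
  if (Array.isArray(node)) {
    for (const child of node) collectGroupNames(child, counts);
    return counts;
  }
  if (node.type === 'Group' && node.capturing && node.name) {
    counts.set(node.name, (counts.get(node.name) || 0) + 1);
  }
  // Child-bearing keys assumed for this illustration.
  for (const key of ['body', 'expression', 'expressions', 'left', 'right']) {
    if (key in node) collectGroupNames(node[key], counts);
  }
  return counts;
}

// Returns the names that occur more than once; the consumer decides
// whether that is an error or a feature.
function findDuplicateGroupNames(ast) {
  return [...collectGroupNames(ast)]
    .filter(([, count]) => count > 1)
    .map(([name]) => name);
}

// Hand-written tree standing in for a parsed (?<year>...)|(?<year>...).
const exampleAst = {
  type: 'RegExp',
  body: {
    type: 'Disjunction',
    left: { type: 'Group', capturing: true, name: 'year', expression: null },
    right: { type: 'Group', capturing: true, name: 'year', expression: null },
  },
};
const duplicates = findDuplicateGroupNames(exampleAst);
```

With the check living outside the parser, a strict consumer can reject `duplicates.length > 0`, while a consumer like an extended-regexp module can simply ignore it.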

From my POV, I see these possibilities:

you could remove the check for duplicate group names from the parser (maybe moving it to another part)
you could make the check optional (because checking in the parser is more efficient)
if you reject this, I need to rename each duplicated group name (add a count), which introduces one more layer [EDIT: not really possible, I would have to change the names before parsing]
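
For illustration, the rename-before-parsing workaround from the last point might be sketched as follows. This is a plain text rewrite done before any parser sees the pattern; the helper names and the `__N` suffix are hypothetical, and the sketch assumes `(?<name>` never appears escaped or inside a character class and that no original group name already ends in `__` plus digits.

```javascript
// Sketch only: rewrites the pattern source so every named group is unique,
// e.g. two (?<year>...) groups become (?<year__1>...) and (?<year__2>...).
// Lookbehinds such as (?<= and (?<! are not touched, because a group name
// must start with a letter, underscore, or dollar sign.
function uniquifyGroupNames(source) {
  const seen = new Map();
  return source.replace(/\(\?<([A-Za-z_$][\w$]*)>/g, (_, name) => {
    const n = (seen.get(name) || 0) + 1;
    seen.set(name, n);
    return `(?<${name}__${n}>`;
  });
}

// Folds the renamed groups back together: every participating group
// contributes its value to an array stored under the original name.
function mergeGroups(groups) {
  const merged = {};
  for (const [key, value] of Object.entries(groups || {})) {
    if (value === undefined) continue; // group did not participate
    const name = key.replace(/__\d+$/, '');
    (merged[name] = merged[name] || []).push(value);
  }
  return merged;
}

// The date example from above, written without extended-mode whitespace.
const dateSource = uniquifyGroupNames(
  '(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})' +
  '|(?<month>[0-9]{2})/(?<day>[0-9]{2})/(?<year>[0-9]{4})'
);
const dateMatch = new RegExp(`^(?:${dateSource})$`).exec('09/21/2020');
const dateParts = mergeGroups(dateMatch.groups);
```

Here `dateParts` holds one-element arrays for `year`, `month`, and `day`, whichever alternative matched. The cost is exactly the extra layer mentioned above: names in the AST no longer match the names the user wrote.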

thanks for listening

PS: By the way, I am not a “native” JavaScript developer (C++, Perl, and many other languages), so I wonder: what would be the best way to comment on that proposal? Would they listen to my comment at all?

Top GitHub Comments

hg42 commented, Jan 22, 2021

Thanks for the work, and also for the pointer to the proposal. I commented there as well.

DmitrySoshnikov commented, Jan 20, 2021

@hg42, yeah, I think this would make sense for ECMAScript itself eventually, and for now we can probably add a --loose-mode parse option, which would allow some features. Alternatively, these could be specific options:

parser.parse(re, {
  allowGroupNameDuplicates: boolean,
});
