
Supplementary plane error due to a character I'm not actually using explicitly in my regex.


I have the following regex in my grammar file:

[\t\n\r\u0020-\uD7FF\uE000\uFFFD\u10000-\u10FFFF]

And I get the following error due to the regex above:

Error: You have Unicode Supplementary Plane content in a regex set: JavaScript has severe problems with Supplementary Plane content, particularly in regexes, so you are kindly required to get rid of this stuff. Sorry! (Offending UCS-2 code which triggered this: 0xd800)

I know the regex is the source of the error because everything works once I remove it. More specifically, the culprit is the range \u0020-\uD7FF.

Looking at the code in regexp-lexer.js, I’ve deduced that the problem occurs when jison-gho computes an inverted character set while optimizing the regular expression. In the inverted set, one of the range boundaries is 0xD7FF + 1 = 0xD800, the first UTF-16 surrogate code unit, and that boundary triggers the error.
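To make the arithmetic concrete, here is a minimal sketch of complementing a character set stored as sorted, non-overlapping code-unit ranges over the BMP (0x0000 to 0xFFFF). The helpers `invertRanges` and `isSurrogate` are hypothetical and are not jison-gho's actual code in regexp-lexer.js; the sketch uses the BMP part of the set exactly as written in the question and leaves out the astral \u10000-\u10FFFF part.

```javascript
// Hypothetical sketch, NOT jison-gho's implementation: complement a set of
// sorted, non-overlapping [lo, hi] code-unit ranges within the BMP.
const BMP_MAX = 0xFFFF;

function invertRanges(ranges) {
  const result = [];
  let next = 0; // first code unit not yet covered by an input range
  for (const [lo, hi] of ranges) {
    if (lo > next) result.push([next, lo - 1]); // gap before this range
    next = hi + 1;
  }
  if (next <= BMP_MAX) result.push([next, BMP_MAX]); // tail gap
  return result;
}

// UTF-16 surrogate code units occupy 0xD800 to 0xDFFF.
function isSurrogate(cu) {
  return cu >= 0xD800 && cu <= 0xDFFF;
}

// BMP part of the set from the question: [\t\n\r\u0020-\uD7FF\uE000\uFFFD]
const userSet = [
  [0x09, 0x09], [0x0A, 0x0A], [0x0D, 0x0D],
  [0x0020, 0xD7FF], [0xE000, 0xE000], [0xFFFD, 0xFFFD],
];

const inverted = invertRanges(userSet);
const hex = (n) => "0x" + n.toString(16).toUpperCase().padStart(4, "0");
for (const [lo, hi] of inverted) {
  const note = isSurrogate(lo) ? "  <- starts at a surrogate" : "";
  console.log(`${hex(lo)}-${hex(hi)}${note}`);
}
```

The gap that follows \uD7FF starts at 0xD7FF + 1 = 0xD800, the first high surrogate, which is exactly the "offending UCS-2 code" reported in the error, even though the user-written set itself contains no surrogates.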

I can understand an error for a user-written regular expression that reaches into the supplementary plane, but here the offending expression is one computed behind the scenes. Should an error be raised at all for an inverted set that is computed internally?

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

GerHobbelt commented, Mar 8, 2017

By the way, @lddubeau: note that you do appear to use Supplementary Plane Unicode, via this bit of the regex: \u10000-\u10FFFF. That should be written as \u{10000}-\u{10FFFF} instead (see also: https://rainsoft.io/what-every-javascript-developer-should-know-about-unicode/ ).
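Why the original escape does not mean what it looks like: in JavaScript a \uXXXX escape always consumes exactly four hex digits, so \u10000 is \u1000 followed by a literal 0. The braced \u{...} form accepts more digits, but inside a regex it is only recognized when the ES2015 u flag is set. The small check below is plain JavaScript, independent of jison, and the variable names are illustrative only.

```javascript
// \uXXXX takes exactly four hex digits: "\u10000" is "\u1000" + "0".
const s = "\u10000";
console.log(s.length);                          // 2
console.log(s === "\u1000" + "0");              // true

// The braced form denotes one code point (a surrogate pair in UTF-16).
const astral = "\u{10000}";
console.log(astral.length);                     // 2 (two UTF-16 code units)
console.log(astral.codePointAt(0) === 0x10000); // true

// Without the u flag, [\u10000-\u10FFFF] parses as \u1000, the range
// "0"-"\u10FF", and two literal "F"s, so it even matches ASCII digits.
const legacy = /^[\u10000-\u10FFFF]$/;
console.log(legacy.test("5"));                  // true
console.log(legacy.test(astral));               // false

// With the u flag and braced escapes, the class covers U+10000 to U+10FFFF.
const astralClass = /^[\u{10000}-\u{10FFFF}]$/u;
console.log(astralClass.test(astral));          // true
console.log(astralClass.test("5"));             // false
```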

However, this is currently a moot point, as JISON doesn’t support Astral Plane Unicode code points (a.k.a. Supplementary Plane characters, i.e. anything above U+FFFF). I’m looking into supporting the ES2015 regex /u flag, but that’s for the future.
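As a plain-JavaScript illustration (not jison code) of why astral code points are awkward in regexes without the /u flag: such a character is stored as two UTF-16 code units, and a non-u regex operates on code units. The surrogate-pair pattern at the end is one common pre-ES2015 workaround; nothing in this thread says jison-gho uses it.

```javascript
// U+1F600 is outside the BMP, so UTF-16 stores it as a surrogate pair.
const smiley = "\u{1F600}";
console.log(smiley.length);                       // 2
console.log(smiley.charCodeAt(0).toString(16));   // d83d (high surrogate)
console.log(smiley.charCodeAt(1).toString(16));   // de00 (low surrogate)

// Without /u, "." matches a single UTF-16 code unit, so one astral
// character does not match /^.$/ ...
console.log(/^.$/.test(smiley));                  // false
// ... while with /u, "." matches a whole code point.
console.log(/^.$/u.test(smiley));                 // true

// Common pre-ES2015 workaround: spell astral code points out as an
// explicit high-surrogate + low-surrogate pair.
const astralPair = /^[\uD800-\uDBFF][\uDC00-\uDFFF]$/;
console.log(astralPair.test(smiley));             // true
```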

For now, I’ll first cut a new patch release with the fixes for all the other bugs your issue report helped uncover! 👍

GerHobbelt commented, Mar 14, 2017

FYI: this will take a while to fix; I’ve got little spare time ATM and this needs some internal rework to work correctly for the entire Unicode range.
