String literals are incorrectly parsed

subj

module.exports = '\u0009\u000A\u000B\u000C\u000D\u0020\u00A0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u202F\u205F\u3000\u2028\u2029\uFEFF';

here is the source https://raw.githubusercontent.com/zloirock/core-js/master/packages/core-js/internals/whitespaces.js

Issue Analytics

Created 4 years ago
Comments: 25 (20 by maintainers)

Top GitHub Comments

JLHwung commented, Mar 6, 2020

I see multiple technical issues involved in this thread.

`\u180e`

‘\u180e’ is not a valid whitespace.

Correct. Historically, when MONGOLIAN VOWEL SEPARATOR was introduced, it was categorized as Zs (space separator). In 2013 it was changed to Cf (format character) (ref: https://www.unicode.org/L2/L2013/13004-vowel-sep-change.pdf), and the change was published in Unicode version 6.3.0.

Unfortunately, such a change can take decades to reach every downstream project of Unicode. So please file a bug on Angular stating that \u180e should not be included in WS_CHARS.
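To see which category a given JavaScript engine assigns to \u180e, a quick check with Unicode property escapes can help. This is only a sketch: it assumes an engine that supports `\p{...}` in regular expressions (ES2018 or later) and ships Unicode data from version 6.3.0 or newer.

```javascript
// U+180E MONGOLIAN VOWEL SEPARATOR, checked against the engine's own Unicode data.
const mvs = '\u180e';

// Zs = Space_Separator, Cf = Format (General_Category short aliases).
const isSpaceSeparator = /\p{Zs}/u.test(mvs);
const isFormatChar = /\p{Cf}/u.test(mvs);

console.log({ isSpaceSeparator, isFormatChar });
```

If `isSpaceSeparator` comes back true, the engine is still using older Unicode data.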

weird-looking text in the REPL

https://user-images.githubusercontent.com/1629088/75985511-bc242680-5eec-11ea-925b-35da4da05a12.png

There are three red dots in the parsed "value" key of the string literal. They represent \u2028, \u2029 and \ufeff respectively. The Meriyah REPL uses CodeMirror to pretty-print the AST, and CodeMirror uses \u2022 (Bullet) to represent a “special char”, so a red dot is printed for each of them.

https://github.com/codemirror/CodeMirror/blob/01758b19565384414306816b43b5f35d81f039a3/src/line/line_data.js#L122

Note that when you copy from the AST, CodeMirror gives you the raw text, so you can compare it to the escaped version in your DevTools console (yes, Chrome DevTools also uses CodeMirror).
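For that comparison, a small helper that prints every character outside printable ASCII as a \uXXXX escape makes the invisible characters easy to spot. `toEscaped` is a hypothetical name used for illustration; it is not part of meriyah or CodeMirror.

```javascript
// Hypothetical helper: show every UTF-16 code unit outside printable ASCII
// as a \uXXXX escape so invisible characters become visible.
// Intended for inspection only; the result is not a valid string literal.
function toEscaped(str) {
  let out = '';
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code >= 0x20 && code <= 0x7e) {
      out += str[i];
    } else {
      out += '\\u' + code.toString(16).padStart(4, '0');
    }
  }
  return out;
}

// Paste the text copied from the AST view in place of this sample value.
const copied = 'a\u2028b\u2029c\ufeff';
console.log(toEscaped(copied)); // a\u2028b\u2029c\ufeff
```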

how it can break an app

I just want to build my angular app in ES5 as I have IE11-using customers. If I use meriyah, it breaks in this single and specific way.

I have no idea how a parser alone can break an app unless the app code is generated from the parsed AST. So I guess the process looks like this:

source-code => (meriyah) => (generator) => production-code

For example, astring is a generator that can print an ESTree AST (such as the one generated by meriyah) as JavaScript code. TypeScript has a built-in parser and generator. One may also have their own generator.

In this case it can break the app because there are \u2028 and \u2029 characters in the literal, for example when a generator does something like

`var ${decl.id.name} = "${decl.init.value}"`

The generated code will break on legacy platforms because \u2028 and \u2029 must be escaped in string literals prior to ES2019 (https://ecma-international.org/ecma-262/#sec-intro). Since decl.init.value holds the actual characters rather than their escaped forms, the generator may print the unescaped characters into the generated source.

To preserve the raw text of the string literal, you can pass raw: true in the meriyah options, which will add a "raw" property to the node:
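A generator that works from the decoded value can avoid this by escaping the two characters itself. The following is a minimal sketch, not how astring or TypeScript do it; `printStringLiteral` and `printVarDeclarator` are hypothetical names, and the node is hand-written to mimic an ESTree declarator.

```javascript
// Hypothetical mini-generator for `var name = "<string>"` declarations.
function printStringLiteral(value) {
  // JSON.stringify escapes quotes, backslashes and control characters,
  // but leaves U+2028 / U+2029 untouched, so escape those explicitly.
  return JSON.stringify(value)
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

function printVarDeclarator(decl) {
  return `var ${decl.id.name} = ${printStringLiteral(decl.init.value)};`;
}

// Hand-written node shaped like an ESTree VariableDeclarator.
const decl = {
  id: { type: 'Identifier', name: 'ws' },
  init: { type: 'Literal', value: ' \u2028\u2029\ufeff' },
};

const code = printVarDeclarator(decl);
console.log(code);
```

The output contains no raw line separator or paragraph separator, so it should also parse on pre-ES2019 engines, while still evaluating to the original string.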

"init": {
  "type": "Literal",
  "value": " \f\n\r\t\u000b ᠎ -     　",
  "raw": "' \\f\\n\\r\\t\\v\\u1680\\u180e\\u2000-\\u200a\\u2028\\u2029\\u202f\\u205f\\u3000\\ufeff'"
}

The generator may print the string literal using decl.init.raw. If you are using your own generator, please revise and use decl.init.raw.
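In a custom generator this could look roughly like the sketch below. `printLiteral` is a hypothetical helper, the nodes are hand-written stand-ins for what a parse with raw: true would return, and only string literals are handled in the fallback path.

```javascript
// Hypothetical helper: prefer the original source text when the parser kept it.
function printLiteral(node) {
  if (typeof node.raw === 'string') {
    return node.raw; // exact text from the source, escapes included
  }
  // Fallback for string values without `raw`: re-escape the line terminators.
  return JSON.stringify(node.value)
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

// Hand-written nodes mimicking parser output with and without `raw`.
const withRaw = {
  type: 'Literal',
  value: '\u2028\u2029',
  raw: "'\\u2028\\u2029'",
};
const withoutRaw = { type: 'Literal', value: '\u2028\u2029' };

console.log(printLiteral(withRaw));    // '\u2028\u2029'
console.log(printLiteral(withoutRaw)); // "\u2028\u2029"
```

Using `raw` keeps the author's original quoting and escapes; the fallback is only there for nodes created without it.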

jpike88 commented, Mar 5, 2020

I’ll just make it clear, as I found the original problem. All this stuff is borderline black magic, so I think we all need to take a step back and appreciate for a second how hard this shit is and how big-brained we all are. It’s basically computer science. Coming from a lowly Angular developer.

I just want to build my angular app in ES5 as I have IE11-using customers. If I use meriyah, it breaks in this single and specific way. If I use ts, it builds fine but much slower. Can we focus on just solving this and moving forward pls
