Allow to extend newline codepoints
As far as I can tell from https://github.com/SAP/chevrotain/blob/b3699ec61c0b41d736bc5c6783a36afb3acbdfe2/src/scan/lexer_public.ts#L588-L604, currently only CR, LF and CR LF are treated as newlines.
While this is correct for most languages, some languages, such as JavaScript and its derivatives, have extra code points that must be treated as newlines for location purposes:
Table 33: Line Terminator Code Points
Code Point | Unicode Name | Abbreviation |
---|---|---|
U+000A | LINE FEED (LF) | <LF> |
U+000D | CARRIAGE RETURN (CR) | <CR> |
U+2028 | LINE SEPARATOR | <LS> |
U+2029 | PARAGRAPH SEPARATOR | <PS> |
Would it be possible to make line terminators a regexp, just like whitespace is? The default would then be \r\n?|\n or similar, but the user would be able to override it with custom characters. Or would that have significant performance implications?
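To make the request concrete, here is a minimal sketch of line/column tracking driven by a configurable line terminator regexp. It is not chevrotain code: `positionAt`, `DEFAULT_LINE_TERMINATORS` and `JS_LINE_TERMINATORS` are hypothetical names used only for illustration.

```typescript
// Hypothetical helper, not part of chevrotain's API.
// Default: CR, CR LF and LF, as proposed above.
const DEFAULT_LINE_TERMINATORS = /\r\n?|\n/g;
// JavaScript-style override that also accepts U+2028 and U+2029.
const JS_LINE_TERMINATORS = /\r\n?|\n|\u2028|\u2029/g;

interface Position {
  line: number;   // 1-based
  column: number; // 1-based
}

function positionAt(
  text: string,
  offset: number,
  terminators: RegExp = DEFAULT_LINE_TERMINATORS
): Position {
  // Work on a private global copy so lastIndex state is never shared.
  const flags = terminators.flags.includes("g")
    ? terminators.flags
    : terminators.flags + "g";
  const re = new RegExp(terminators.source, flags);

  let line = 1;
  let lineStart = 0;
  let match: RegExpExecArray | null;
  while ((match = re.exec(text)) !== null) {
    if (match[0].length === 0) {
      // Skip empty matches from a badly formed custom pattern.
      re.lastIndex++;
      continue;
    }
    const end = match.index + match[0].length;
    if (end > offset) break; // terminator ends after the requested offset
    line++;
    lineStart = end;
  }
  return { line, column: offset - lineStart + 1 };
}
```

With the input `"a\r\nb\u2028c"` and offset 5 (the `c`), the default pattern reports line 2, column 3, while the JavaScript-style pattern reports line 3, column 1, because only the latter counts U+2028 as a line break.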
Top GitHub Comments
Solution
We only require a subset of the RegExp API to support this code block.
A faster, optimized version can be written without using a RegExp while still conforming to the required API.
This means:
So virtually no regressions and increased customization: a win-win.
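As a rough illustration of that idea, the sketch below implements a line terminator tester without a RegExp. The comment does not say which part of the RegExp API is required, so the sketch assumes it is `test` plus a writable `lastIndex`. The names `LineTerminatorTester`, `createCharCodeTester` and `countLineTerminators` are hypothetical and not taken from chevrotain.

```typescript
// Assumed subset of the RegExp API: test() and a writable lastIndex.
interface LineTerminatorTester {
  lastIndex: number;
  test(text: string): boolean;
}

const LF = 0x000a;
const CR = 0x000d;
const LS = 0x2028; // LINE SEPARATOR
const PS = 0x2029; // PARAGRAPH SEPARATOR

// Builds a tester that scans char codes directly instead of running a RegExp.
function createCharCodeTester(extraCodes: number[] = []): LineTerminatorTester {
  const codes = new Set<number>([LF, CR, ...extraCodes]);
  return {
    lastIndex: 0,
    test(text: string): boolean {
      for (let i = this.lastIndex; i < text.length; i++) {
        const c = text.charCodeAt(i);
        if (codes.has(c)) {
          // Treat CR LF as a single terminator.
          const len = c === CR && text.charCodeAt(i + 1) === LF ? 2 : 1;
          this.lastIndex = i + len;
          return true;
        }
      }
      // Mirror a global RegExp: reset lastIndex when nothing matches.
      this.lastIndex = 0;
      return false;
    },
  };
}

// Works with either a global RegExp or the hand-written tester.
function countLineTerminators(text: string, tester: LineTerminatorTester): number {
  tester.lastIndex = 0;
  let count = 0;
  while (tester.test(text)) count++;
  return count;
}
```

Because a global `RegExp` also satisfies this interface, `countLineTerminators(text, /\r\n?|\n/g)` and `countLineTerminators(text, createCharCodeTester())` are interchangeable, and passing `[LS, PS]` to `createCharCodeTester` adds the JavaScript-specific terminators.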
Fixed and merged https://github.com/SAP/chevrotain/issues/523
Will release a new version with this later today.