question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Allow to extend newline codepoints

See original GitHub issue

As far as I can tell from https://github.com/SAP/chevrotain/blob/b3699ec61c0b41d736bc5c6783a36afb3acbdfe2/src/scan/lexer_public.ts#L588-L604, currently only CR, LF and CR LF are treated as newlines.

While this is correct for most languages, some languages like JavaScript and derivatives have extra codepoints that must be treated as newlines for location purposes:

Table 33: Line Terminator Code Points

Code Point Unicode Name Abbreviation
U+000A LINE FEED (LF) <LF>
U+000D CARRIAGE RETURN (CR) <CR>
U+2028 LINE SEPARATOR <LS>
U+2029 PARAGRAPH SEPARATOR <PS>

Would it be possible to make line terminators a regexp, just like whitespaces are? Then default would be \r\n?|\n or similar, but user would be able to override it with custom chars. Or would it have significant performance implications?

Issue Analytics

  • State:closed
  • Created 6 years ago
  • Comments:32 (32 by maintainers)

github_iconTop GitHub Comments

1reaction
bd82commented, Jun 29, 2017

Solution

We only require a subset of the regExp API to support this code block.

nlRegExpLike.lastIndex = 0
do {
nl = nlRegExpLike.exec(matchedImage)
if (nl !== null) {
      lastLTIdx = nl.index
      numOfLTsInMatch++
   }
} while (nl)

An faster optimized version can be written without using regExp that still conforms to the required API.

  • Example Source
  • Tested this optimization and at most it causes a small (1-2%) degradation.

This means:

  • The default built-in implementation for new lines can be implemented using the optimized form.
  • Users can still supply their own regExp literals to override the default but at a performance cost.
  • Advanced users can also write custom implementations using the optimized form.

So virtually no regressions and increased customization, a win win.

0reactions
bd82commented, Jul 1, 2017

Fixed and merged https://github.com/SAP/chevrotain/issues/523

Will release a new version with this later today.

Read more comments on GitHub >

github_iconTop Results From Across the Web

How to add a new line in textarea element? - Stack Overflow
Run code snippet. Hide results. Expand snippet ... This way you are actually parsing the new line ("\n") rather than displaying it as...
Read more >
0.31.0 - Changes to Line Terminator handling · Issue #523 - GitHub
Breaking Changes A Lexer that tracks line/column positions (default behavior) will fail to initialize if none of it's Token definitions enables the ...
Read more >
Character substitution task settings - AWS Documentation
UTF‑8 ibm‑860_P100‑1995 ibm‑280_P100‑1995 UTF‑16 ibm‑861_P100‑1995 ibm‑284_P100‑1995 UTF‑16BE ibm‑862_P100‑1995 ibm‑285_P100‑1995 UTF‑16LE ibm‑863_P100‑1995 ibm‑290_P100‑1995
Read more >
Newline - Wikipedia
Newline is a control character or sequence of control characters in character encoding specifications such as ASCII, EBCDIC, Unicode, etc.
Read more >
Strings and Characters — The Swift Programming Language ...
Foundation also extends String to expose methods defined by NSString . ... let lineBreaks = """; This string starts with a line break....
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found