Parsimmon.index yields incorrect position for tokens right before a newline ('\n')

const P = require('parsimmon');

// Record the position before and after a run of digits.
const parser = P.seqMap(P.index, P.digits, P.index, function (start, value, end) {
  console.log(start, value, end);
  return value;
});

parser.parse('1234\n');
parser.parse('1234');

outputs:

{offset: 0, line: 1, column: 1} '1234' {offset: 4, line: 2, column: 0}
{offset: 0, line: 1, column: 1} '1234' {offset: 4, line: 1, column: 5}

I suppose these two outputs should be identical.
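
The expected behavior can be sketched without Parsimmon. The helper below is hypothetical: `positionAt` is not part of Parsimmon's API, and this is not a claim about how the library computes positions internally. It assumes 1-based lines and columns, and that only newlines strictly before the offset advance the line counter.

```javascript
// Hypothetical helper, not part of Parsimmon: maps a character offset to a
// 1-based line and column, counting only '\n' characters before the offset.
function positionAt(source, offset) {
  let line = 1;
  let lineStart = 0; // offset of the first character on the current line
  for (let i = 0; i < offset; i++) {
    if (source[i] === '\n') {
      line++;
      lineStart = i + 1;
    }
  }
  return { offset, line, column: offset - lineStart + 1 };
}

// The end of '1234' is offset 4 whether or not a newline follows.
console.log(positionAt('1234\n', 4)); // { offset: 4, line: 1, column: 5 }
console.log(positionAt('1234', 4));   // { offset: 4, line: 1, column: 5 }
// Only a position after the newline moves to line 2.
console.log(positionAt('1234\n', 5)); // { offset: 5, line: 2, column: 1 }
```

Under these assumptions, the end position of the token is the same for both inputs, which is the result the issue asks for.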

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 10

Top GitHub Comments

1 reaction
hillin commented, Dec 5, 2021

This still seems to me like the ideal solution is to use inclusive ranges

To me, an exclusive end index is still the legitimate way to go. We just need some kind of virtual index that does not actually map to a character in the source. This is the standard design of almost all range constructs: the Range in the JavaScript Selection API, System.Range in .NET, and range() in Python, to name a few.
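
The exclusive-end convention can be illustrated with plain JavaScript string slicing, which follows the same rule. This is only a sketch: the range objects and the `textOf` helper are made up for illustration and are not part of Parsimmon.

```javascript
const source = '1234\n5678';

// Half-open ranges [start, end): `end` points just past the last character.
const first = { start: 0, end: 4 };   // '1234'
const newline = { start: 4, end: 5 }; // '\n'
const second = { start: 5, end: 9 };  // '5678'

// Hypothetical helper: extracts the text covered by a range.
function textOf(range) {
  return source.slice(range.start, range.end);
}

console.log(textOf(first));  // 1234
console.log(textOf(second)); // 5678

// Adjacent ranges share a boundary, and a length is simply end - start.
console.log(first.end === newline.start); // true
console.log(second.end - second.start);   // 4

// The final end index equals source.length: a "virtual" index that does not
// map to any character in the source.
console.log(second.end === source.length); // true
console.log(source[second.end]);           // undefined
```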

One thought: we don’t have to change the behavior of Parsimmon.index, which semantically means the current text pointer in the scanner and should reflect how Parsimmon interprets \n (changing that, as you said, would be a breaking change and thus not desirable in v1). Instead, we can add another function that correctly marks the range of a parsed result. E.g.:

function mark<T>(parser: Parser<T>): Parser<{value: T; begin: Index; end: Index}>; 
// in which the end index should be the ideal result mentioned above

Actually, this could be very useful (in my use cases), because I find myself always using Parsimmon.index in pairs to mark ranges.

0 reactions
hillin commented, Dec 5, 2021

The mark method already does this

Oops, I’ve missed that one.

I would certainly review the PR if you want to work on it.

I’ll see what I can do with it.

BTW, for now we’ve switched to line-based parsing as a workaround for this issue, and to better support partial tokenization (as expected by the Monaco editor).


Top Results From Across the Web

Issues · jneen/parsimmon - GitHub
Parsimmon.index yields incorrect position for tokens right before a newline ('\n'). #331 opened on Nov 30, 2021 by hillin.