
Parsimmon.index yields incorrect position for tokens right before a newline ('\n')


const P = require('parsimmon');

// Record the index before and after parsing a run of digits.
const parser = P.seqMap(P.index, P.digits, P.index, function (start, value, end) {
  console.log(start, value, end);
  return value;
});

parser.parse('1234\n');
parser.parse('1234');

outputs:

{offset: 0, line: 1, column: 1} '1234' {offset: 4, line: 2, column: 0}
{offset: 0, line: 1, column: 1} '1234' {offset: 4, line: 1, column: 5}

I suppose these two outputs should be identical.
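To make the expected behavior concrete, here is a minimal sketch of an offset-to-position mapping in which only newlines strictly before the offset start a new line. The helper name positionAt is hypothetical, and this is not Parsimmon's implementation; it only illustrates the result the report is asking for.

```javascript
// Hypothetical helper (not part of Parsimmon): map a 0-based offset
// to a 1-based line and column.
function positionAt(input, offset) {
  let line = 1;
  let lineStart = 0;
  // Only newlines strictly before `offset` move the position to a new line.
  for (let i = 0; i < offset; i++) {
    if (input[i] === '\n') {
      line += 1;
      lineStart = i + 1;
    }
  }
  return { offset: offset, line: line, column: offset - lineStart + 1 };
}

// Under this rule the end of '1234' is reported the same way
// whether or not a newline follows it.
console.log(positionAt('1234\n', 4));
console.log(positionAt('1234', 4));
```

With this rule, the position right after the newline (offset 5 in '1234\n') is the first one reported on line 2.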

Issue Analytics

Created 2 years ago
Comments: 10

Top GitHub Comments

1 reaction

hillin commented, Dec 5, 2021

This still seems to me like the ideal solution is to use inclusive ranges

To me, an exclusive end index is still the legitimate way to go. We just need some kind of virtual index, which does not actually map to a character in the source. This is the standard design of almost all range constructs: the Range in the JavaScript Selection API, System.Range in .NET, and range() in Python, to name a few.

One thought is that we don't have to change the behavior of Parsimmon.index. Semantically it is the current text pointer in the scanner and should reflect how Parsimmon interprets \n; changing that, as you said, would be a breaking change and thus undesirable in v1. Instead, we could have another function that correctly marks the range of a parsed result. For example:
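As a small illustration of the exclusive-end convention, the sketch below uses plain string offsets rather than any Parsimmon API. The range object and the endsAtVirtualIndex helper are hypothetical names introduced only for this example.

```javascript
// Hypothetical token range with an exclusive end offset.
const input = '1234\n';
const range = { start: 0, end: 4 };

// String.prototype.slice is half-open: it stops just before `end`.
const text = input.slice(range.start, range.end);
const length = range.end - range.start;

// The end offset may equal the input length, in which case it does not
// map to any character in the source (a "virtual" index).
function endsAtVirtualIndex(source, r) {
  return r.end === source.length;
}

console.log(text, length);
console.log(endsAtVirtualIndex('1234\n', range));
console.log(endsAtVirtualIndex('1234', range));
```

With an exclusive end, the token length is simply end minus start, and the next token can begin exactly at the previous token's end offset.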

function mark<T>(parser: Parser<T>): Parser<{value: T; begin: Index; end: Index}>; 
// in which the end index should be the ideal result mentioned above

Actually, this could be very useful (in my use cases), because I find myself always using Parsimmon.index in pairs to mark ranges.

0 reactions

hillin commented, Dec 5, 2021

The mark method already does this

Oops, I’ve missed that one.

I would certainly review the PR if you want to work on it.

I’ll see what I can do with it.

BTW, for now we've switched to line-based parsing as a workaround for this issue, and also to better support partial tokenization (as expected by the Monaco editor).

Top Results From Across the Web

Issues · jneen/parsimmon - GitHub

Parsimmon.index yields incorrect position for tokens right before a newline ('\n'). #331 opened on Nov 30, 2021 by hillin.
