Parsimmon.index yields incorrect position for tokens right before a newline ('\n')
const P = require('parsimmon');

const parser = P.seqMap(P.index, P.digits, P.index, function (start, value, end) {
  // Log the position before and after the matched digits.
  console.log(start, value, end);
  return value;
});
parser.parse('1234\n');
parser.parse('1234');

outputs:

{offset: 0, line: 1, column: 1} '1234' {offset: 4, line: 2, column: 0}
{offset: 0, line: 1, column: 1} '1234' {offset: 4, line: 1, column: 5}
I suppose these two outputs should be identical.
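To make the expectation concrete, here is a minimal sketch of a position lookup that returns the same result for both inputs. `positionAt` is a hypothetical helper written for illustration; it is not part of Parsimmon and does not claim to mirror how Parsimmon computes positions internally. It assumes lines are separated by '\n' only, offsets are 0-based, and line and column are 1-based.

```javascript
// Hypothetical helper (not a Parsimmon API): maps a 0-based offset to a
// 1-based line and column by splitting the text that precedes the offset.
function positionAt(input, offset) {
  // Only the text before the offset matters, so a newline located exactly
  // at the offset does not move the position onto the next line.
  const lines = input.slice(0, offset).split('\n');
  return {
    offset: offset,
    line: lines.length,
    column: lines[lines.length - 1].length + 1,
  };
}

console.log(positionAt('1234\n', 4)); // { offset: 4, line: 1, column: 5 }
console.log(positionAt('1234', 4));   // { offset: 4, line: 1, column: 5 }
console.log(positionAt('1234\n', 5)); // { offset: 5, line: 2, column: 1 }
```

With this convention the end position of '1234' is line 1, column 5 whether or not a newline follows, and the position only moves to line 2 once the newline has been consumed.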
jneen/parsimmon issue #331, opened on Nov 30, 2021 by hillin.
Top GitHub Comments
To me, an exclusive end index is still the legitimate way to go. We just need some kind of virtual index that does not actually map to a character in the source. This is the standard design of almost all range constructs: the Range in the JavaScript Selection API, System.Range in .NET, and range() in Python, to name a few.
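As a rough illustration of the exclusive-end convention, the sketch below models a half-open range [start, end) over string offsets. `makeRange` is a hypothetical name introduced for this example, not a Parsimmon function. The point it shows is that the end offset may equal the input length, so it can refer to a "virtual" position one past the last character.

```javascript
// Hypothetical helper (not a Parsimmon API): a half-open range [start, end)
// over 0-based string offsets. `end` is exclusive and may equal input.length.
function makeRange(input, start, end) {
  if (start < 0 || end < start || end > input.length) {
    throw new RangeError('invalid range');
  }
  return {
    start: start,
    end: end,
    length: end - start,            // no "+ 1" needed with an exclusive end
    text: input.slice(start, end),  // String.prototype.slice is also half-open
  };
}

// The end offset 4 is one past the last character of '1234'.
const wholeInput = makeRange('1234', 0, 4);
// The same offsets cover the same token when a newline follows it.
const beforeNewline = makeRange('1234\n', 0, 4);
// An empty range is expressed naturally as start === end.
const emptyAtEnd = makeRange('1234', 4, 4);

console.log(wholeInput.text, beforeNewline.text, emptyAtEnd.length); // 1234 1234 0
```

Because the end is exclusive, adjacent tokens share a boundary offset and the length of a range is simply `end - start`.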

One thought is that we don't have to change the behavior of Parsimmon.index, which semantically means the current text pointer in the scanner and should reflect how Parsimmon interprets \n (changing it, as you said, would be a breaking change and is thus not desirable in v1). Instead, we could have another function that correctly marks the range of a parsed result. E.g.

Actually, this could be very useful in my use cases, because I find myself always using Parsimmon.index in pairs to mark ranges.

Oops, I've missed that one.
I’ll see what I can do with it.
BTW, for now we've switched to line-based parsing as a workaround for this issue, and also to better support partial tokenization (as expected by the Monaco editor).