
Allow lexer to retokenize tokens with diff of text

See original GitHub issue

Problem: When using Chevrotain in an IDE or editor, calling Lexer.tokenize discards the previous lexing result and re-lexes the entire text whenever the document is updated, even if only a small portion of it has changed.

Proposal: Add a new method to Lexer, called retokenize or retokenizeRange, perhaps with a signature such as retokenizeRange(prevResult: ILexingResult, range: { start: number, end: number, replacement: string }, mode: string), which takes the previous result and updates it according to the diff. The Ohm parser exposes a similar API via m.replaceInputRange(startIdx: number, endIdx: number, str: string): https://github.com/harc/ohm/blob/master/doc/api-reference.md

Related to https://github.com/SAP/chevrotain/issues/598.
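
Chevrotain does not expose such a method today, but a rough user-land approximation illustrates what the proposal is after: reuse the tokens that end before the edited range, re-lex the rest of the new text, and splice the two lists together. The sketch below is hypothetical (the toy token set, the Edit shape, and the retokenizeRange helper are not part of Chevrotain's API) and deliberately ignores lexer modes, line/column positions, and error-offset adjustment:

```typescript
import { createToken, Lexer, IToken, ILexingResult } from "chevrotain";

// Toy token set; a real grammar's token definitions would be used instead.
const WhiteSpace = createToken({ name: "WhiteSpace", pattern: /\s+/, group: Lexer.SKIPPED });
const Integer = createToken({ name: "Integer", pattern: /\d+/ });
const Identifier = createToken({ name: "Identifier", pattern: /[a-zA-Z]\w*/ });
const lexer = new Lexer([WhiteSpace, Integer, Identifier]);

// Shape of a single text edit, mirroring the proposed `range` argument.
interface Edit {
  start: number;       // offset of the first replaced character in the old text
  end: number;         // offset one past the last replaced character in the old text
  replacement: string; // new text for that region
}

function retokenizeRange(
  oldText: string,
  prevResult: ILexingResult,
  edit: Edit
): { text: string; result: ILexingResult } {
  const newText =
    oldText.slice(0, edit.start) + edit.replacement + oldText.slice(edit.end);

  // Tokens that end before the edit are candidates for reuse. The last one is
  // dropped as well, because the edit may extend or merge with it (e.g. typing
  // "c" right after the identifier "ab" should produce a single "abc" token).
  const reusable: IToken[] = prevResult.tokens.filter(
    (t) => t.endOffset !== undefined && t.endOffset < edit.start
  );
  reusable.pop();
  const relexFrom =
    reusable.length > 0 ? reusable[reusable.length - 1].endOffset! + 1 : 0;

  // Re-lex only the tail of the new text and shift the fresh tokens' offsets
  // back into document coordinates.
  const tail = lexer.tokenize(newText.slice(relexFrom));
  for (const t of tail.tokens) {
    t.startOffset += relexFrom;
    if (t.endOffset !== undefined) t.endOffset += relexFrom;
  }

  return {
    text: newText,
    result: { ...tail, tokens: [...reusable, ...tail.tokens] },
  };
}
```

Even this naive version only pays for re-lexing the suffix after an edit, which is the effect the proposed retokenizeRange is meant to achieve; a real implementation would also try to resynchronize with, and reuse, the tokens that follow the edit.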

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
christianvoigt commented, Jun 14, 2018

Hi, I just wanted to share my experience so far: I am using Chevrotain for linting Markdown-like documents in a VS Code extension, using a language server (https://marketplace.visualstudio.com/items?itemName=christianvoigt.argdown-vscode). Currently, the language server always validates the complete document.

The extension has already been used in philosophy seminars for analyzing argumentation. Philosophy students' texts can become quite long, but there haven't been any problems yet. The next version will also include a live preview of SVG graphics generated from the code. This also works by parsing the whole document on each change (though it is throttled). At least for my syntax, I currently don't see the need to add partial re-parsing or re-lexing for any of these use cases.
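
As a point of reference, here is a minimal sketch of that throttled full-document approach; createThrottledValidator and validateDocument are illustrative names, not part of the extension or of Chevrotain:

```typescript
// Debounce-style throttle: each document change schedules a full validation
// run, and rapid keystrokes collapse into a single parse once typing pauses.
function createThrottledValidator(
  validateDocument: (text: string) => void,
  delayMs = 200
): (text: string) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (text) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => validateDocument(text), delayMs);
  };
}

// On every change event, hand the full document text to the scheduler; the
// callback re-lexes and re-parses the whole document and reports diagnostics.
const scheduleValidation = createThrottledValidator((text) => {
  // A real handler would re-lex and re-parse here, e.g.:
  // const lexResult = lexer.tokenize(text);
  // ...feed lexResult.tokens to the parser and publish diagnostics...
  console.log(`validating ${text.length} characters`);
});
```

Brute-force as it is, this strategy only pays the cost of a full parse once typing pauses, which is consistent with the experience reported above.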

0 reactions
bd82 commented, Jun 21, 2018

I suggest trying to implement this on a specific grammar and use case, and only afterwards attempting to generalize it to any grammar (if possible).

Let's re-open this issue if and when:

  1. You encounter performance issues in an IDE scenario.
  2. You have implemented some kind of partial parsing similar to the description here.
  • I can help you with this if your code is open source, or at least accessible.

Top Results From Across the Web

ANTLR get and split lexer content - Stack Overflow
No, there is no easy way. Since NESTED_ML_COMMENT is a lexer rule (a "simple" token), you cannot let a parser rule create any...

Write your own lexer - Pygments
The bygroups helper yields each capturing group in the regex with a different token type. First the Name.Attribute token, then a Text token...

Linguistic Features · spaCy Usage Documentation
The Span object acts as a sequence of tokens, so you can iterate over the entity or index into it. You can also...

Optimizations in Syntax Highlighting, a Visual Studio Code Story
This is a technique used by many tokenization engines, including TextMate grammars, that allows an editor to retokenize only a small subset ...

Custom Tokenization (Search Developer's Guide)
Built-in language specific rules define how to break text into tokens. ... to re-tokenize a text run, MarkLogic invokes the lexer's reset method...
