Allow lexer to retokenize tokens with diff of text
Problem:
When using Chevrotain in an IDE/editor, calling Lexer.tokenize throws away the previous lexing result whenever the text is updated, even if only a small portion of the text has changed.
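For concreteness, here is a minimal sketch of what an editor integration has to do today with the existing API: re-run Lexer.tokenize on the full text after every edit. The token definitions and the onDocumentChanged hook below are purely illustrative.

```ts
import { createToken, Lexer } from "chevrotain";

// Illustrative token set; a real grammar would define many more token types.
const WhiteSpace = createToken({ name: "WhiteSpace", pattern: /\s+/, group: Lexer.SKIPPED });
const Identifier = createToken({ name: "Identifier", pattern: /[a-zA-Z]\w*/ });
const lexer = new Lexer([WhiteSpace, Identifier]);

// Every edit forces a full re-lex of the whole document;
// the previous ILexingResult cannot be reused.
function onDocumentChanged(fullText: string) {
  return lexer.tokenize(fullText);
}
```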
Proposal:
Add a new method to Lexer called retokenize or retokenizeRange, perhaps with a signature that looks like this: retokenizeRange(prevResult: ILexingResult, range: { start: number, end: number, replacement: string }, mode: string). It would take the previous result and update it according to the diff.
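To make the proposed shape concrete, here is a hypothetical sketch of the call site. retokenizeRange does not exist in Chevrotain; the IncrementalLexer interface and the "defaultMode" argument below only mirror the signature suggested above.

```ts
import { ILexingResult, Lexer } from "chevrotain";

// Hypothetical extension mirroring the proposed signature -- not a real API.
interface IncrementalLexer extends Lexer {
  retokenizeRange(
    prevResult: ILexingResult,
    range: { start: number; end: number; replacement: string },
    mode: string
  ): ILexingResult;
}

// What an editor integration might look like if the proposal were implemented:
// only the edited range is passed in, so tokens untouched by the edit could be
// reused from prevResult instead of re-lexing the entire document.
function onEdit(
  lexer: IncrementalLexer,
  prevResult: ILexingResult,
  start: number,
  end: number,
  replacement: string
): ILexingResult {
  return lexer.retokenizeRange(prevResult, { start, end, replacement }, "defaultMode");
}
```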
The Ohm parser implements this API with m.replaceInputRange(startIdx: number, endIdx: number, str: string):
https://github.com/harc/ohm/blob/master/doc/api-reference.md
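For comparison, here is a short sketch of Ohm's incremental matching API as described in the linked reference; the toy grammar is illustrative only, and the method names are taken from that document.

```ts
import * as ohm from "ohm-js";

// Toy grammar, purely for illustration.
const g = ohm.grammar(`
  Greeting {
    greeting = "hello " name
    name = letter+
  }
`);

// A Matcher keeps the input (and the memoized work) between matches.
const m = g.matcher();
m.setInput("hello world");
console.log(m.match().succeeded()); // true

// An edit is applied as (startIdx, endIdx, replacement); only the portion of
// the previous work affected by the edit needs to be redone.
m.replaceInputRange(6, 11, "ohm");
console.log(m.getInput());          // "hello ohm"
console.log(m.match().succeeded()); // true
```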
Related to https://github.com/SAP/chevrotain/issues/598.
Issue Analytics
- Created 5 years ago
- Comments: 6 (6 by maintainers)
Top Results From Across the Web
- ANTLR get and split lexer content - Stack Overflow
  No, there is no easy way. Since NESTED_ML_COMMENT is a lexer rule (a "simple" token), you cannot let a parser rule create any...
- Write your own lexer - Pygments
  The bygroups helper yields each capturing group in the regex with a different token type. First the Name.Attribute token, then a Text token...
- Linguistic Features · spaCy Usage Documentation
  The Span object acts as a sequence of tokens, so you can iterate over the entity or index into it. You can also...
- Optimizations in Syntax Highlighting, a Visual Studio Code Story
  This is a technique used by many tokenization engines, including TextMate grammars, that allows an editor to retokenize only a small subset...
- Custom Tokenization (Search Developer's Guide)
  Built-in language specific rules define how to break text into tokens. ... to re-tokenize a text run, MarkLogic invokes the lexer's reset method...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi, I just wanted to share my experience so far: I am using Chevrotain for linting Markdown-like documents in a VS Code extension using a language server (https://marketplace.visualstudio.com/items?itemName=christianvoigt.argdown-vscode). Currently the language server will always validate the complete document.
The extension has already been used in philosophy seminars for analyzing argumentation. Texts of philosophy students can become quite long, but there haven't been any problems yet. The next version will also add a live preview of SVG graphics generated from the code. This also works by parsing the whole document on each change (though it is throttled). At least for my syntax, I currently don't see the need to add partial re-parsing or re-lexing for any of these use cases.
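A minimal sketch of that pattern, assuming a simple debounce; the helper below is illustrative and not the actual argdown-vscode implementation.

```ts
import { Lexer, ILexingResult } from "chevrotain";

// Illustrative only: re-lex the whole document on every change, but debounce
// so that a burst of keystrokes triggers just one validation pass.
function createThrottledValidator(
  lexer: Lexer,
  onResult: (result: ILexingResult) => void,
  delayMs = 300
) {
  let timer: ReturnType<typeof setTimeout> | undefined;

  return (fullText: string) => {
    if (timer !== undefined) {
      clearTimeout(timer);
    }
    timer = setTimeout(() => {
      // Full re-tokenization of the complete document -- simple, and fast
      // enough in practice even for long documents.
      onResult(lexer.tokenize(fullText));
    }, delayMs);
  };
}
```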
I suggest trying to implement this on a specific grammar and use case and only afterwards attempting to generalize it for any grammar (if possible).
Let's re-open this issue if and when