Context sensitive tokens

Following the discussions on #993 and #1023, I’m wondering if it’d make sense to have an option for stateful tokens.

const stringStartAndEnd = createToken({
  effect: ({ insideString }) => ({ insideString: !insideString }),
  pattern: /'/
});

const literal = createToken({
  gate: ({ insideString }) => insideString,
  pattern: /[^']*/
});

For this to work, the lexer would be initialized with a state variable, an empty object at first. If a token has an effect function, it is invoked whenever that token is matched during lexing, and its result is merged into the state:

state = { ...state, ...token.effect(state) };

The gate function of a token, if present, is called with the state variable before a match is attempted, and the token is only matched if it returns a truthy value.
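To make the idea concrete, here is a minimal standalone sketch of the proposed mechanism. It does not use Chevrotain: `createToken` and `tokenize` below are hypothetical stand-ins written only for illustration, the `name` field and the `Identifier` and `Whitespace` tokens are assumptions added so the example can run end to end, and the literal pattern uses `+` instead of `*` so that every match consumes input.

```javascript
// Hypothetical stand-in for the proposed createToken: it just returns a plain object.
function createToken({ name, pattern, gate, effect }) {
  // The sticky ("y") flag makes the regex match only at pattern.lastIndex.
  return { name, pattern: new RegExp(pattern.source, "y"), gate, effect };
}

const stringStartAndEnd = createToken({
  name: "Quote",
  effect: ({ insideString }) => ({ insideString: !insideString }),
  pattern: /'/
});

const literal = createToken({
  name: "Literal",
  gate: ({ insideString }) => insideString,
  pattern: /[^']+/ // "+" rather than "*" so a match always consumes input
});

// Assumed extra tokens, only allowed outside of a string.
const identifier = createToken({
  name: "Identifier",
  gate: ({ insideString }) => !insideString,
  pattern: /[a-z]+/
});

const whitespace = createToken({
  name: "Whitespace",
  gate: ({ insideString }) => !insideString,
  pattern: /\s+/
});

// Hypothetical lexer loop: first token type that passes its gate and matches wins.
function tokenize(text, tokenTypes) {
  let state = {}; // the state variable starts out as an empty object
  let offset = 0;
  const tokens = [];
  while (offset < text.length) {
    let matched = false;
    for (const type of tokenTypes) {
      // Call the gate with the current state before attempting a match.
      if (type.gate && !type.gate(state)) continue;
      type.pattern.lastIndex = offset;
      const match = type.pattern.exec(text);
      if (match === null || match[0].length === 0) continue;
      tokens.push({ type: type.name, image: match[0] });
      offset += match[0].length;
      // Merge the result of the effect into the state after a successful match.
      if (type.effect) state = { ...state, ...type.effect(state) };
      matched = true;
      break;
    }
    if (!matched) throw new Error(`Unexpected character at offset ${offset}`);
  }
  return tokens;
}

const tokens = tokenize("foo 'bar baz' qux", [
  stringStartAndEnd,
  literal,
  identifier,
  whitespace
]);
console.log(tokens.map((t) => `${t.type}(${t.image})`).join(" "));
```

In this sketch the text between the quotes comes out as a single Literal token, because Identifier and Whitespace are gated off while insideString is true.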

@bd82 does this make any sense?

Top GitHub Comments

bd82 commented, Sep 1, 2019

“Perhaps we could add a section to the lexer tutorial page that mentions it?”

It seems like this feature is more popular than I thought…

Having a full guide, e.g.:

https://sap.github.io/chevrotain/docs/guide/generating_syntax_diagrams.html#examples

would be best, but that is a fair bit of work to create…

Perhaps (as suggested) a small note could be added in the lexer tutorial to aid discoverability.

HoldYourWaffle commented, Sep 1, 2019

Fellow noobs should help each other out right 😉

@bd82 Maybe it’s a good idea to make multi-mode lexing more prominent in the documentation? There isn’t really a tutorial page for it (although the linked example is plenty to understand how it works), and I didn’t find the page until you linked to it from another issue. Perhaps we could add a section to the lexer tutorial page that mentions it?
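For readers who have not met multi-mode lexing before, the general idea can be sketched without any library: each mode has its own list of token types, and matching a token may push a new mode onto a stack or pop the current one. The code below is only an illustration of that concept, not Chevrotain's implementation; `lexerModes`, `tokenizeWithModes`, and the `push` and `pop` fields are hypothetical names made up for this sketch. As far as I understand the docs, Chevrotain expresses the same thing through `push_mode` and `pop_mode` on token definitions.

```javascript
// Hypothetical token tables, one per mode. Sticky ("y") regexes match only at lastIndex.
const lexerModes = {
  code: [
    { name: "StringStart", pattern: /'/y, push: "string" },
    { name: "Identifier", pattern: /[a-z]+/y },
    { name: "Whitespace", pattern: /\s+/y }
  ],
  string: [
    { name: "StringEnd", pattern: /'/y, pop: true },
    { name: "Literal", pattern: /[^']+/y }
  ]
};

// Hypothetical lexer loop driven by a stack of mode names.
function tokenizeWithModes(text, modes, initialMode) {
  const modeStack = [initialMode];
  const tokens = [];
  let offset = 0;
  while (offset < text.length) {
    // Only the token types of the mode on top of the stack are tried.
    const currentTypes = modes[modeStack[modeStack.length - 1]];
    let matched = false;
    for (const type of currentTypes) {
      type.pattern.lastIndex = offset;
      const match = type.pattern.exec(text);
      if (match === null) continue;
      tokens.push({ type: type.name, image: match[0] });
      offset += match[0].length;
      if (type.push) modeStack.push(type.push);
      // Never pop the initial mode off the stack.
      if (type.pop && modeStack.length > 1) modeStack.pop();
      matched = true;
      break;
    }
    if (!matched) throw new Error(`Unexpected character at offset ${offset}`);
  }
  return tokens;
}

const modeTokens = tokenizeWithModes("foo 'bar baz' qux", lexerModes, "code");
console.log(modeTokens.map((t) => `${t.type}(${t.image})`).join(" "));
```

Here the same quote character produces StringStart in the code mode and StringEnd in the string mode, which covers the string example from the issue without a separate state object.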
