How to handle a literal string that begins with a keyword?
In the sample SQL grammar in the Getting Started Tutorial, some of the tokens are defined as follows:
const { createToken } = require("chevrotain");
const Select = createToken({name: "Select", pattern: /SELECT/});
const Identifier = createToken({name: "Identifier", pattern: /\w+/});
However, this kind of token definition cannot handle a literal string that begins with a keyword. For example, if a column in the database is named SELECT_column1, then in the SQL statement SELECT SELECT_column1 FROM table2 (modified from the tutorial's source code), the word SELECT_column1 will be lexed as two tokens: the keyword SELECT and the Identifier _column1. This happens because the lexer tries the token types in order and the first pattern that matches wins, so when Select is listed before Identifier (as in the tutorial), Select consumes the first six characters before Identifier is ever tried.
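To make the failure concrete, here is a minimal plain-JavaScript sketch of a first-match lexer using the two patterns above. It does not use Chevrotain; the token table, `firstMatchTokenize`, and the added whitespace rule are hypothetical, and the "first match wins" loop is assumed to mirror how Chevrotain chooses between token types. FROM comes out as an Identifier only because this reduced table has no From token.

```javascript
// Hypothetical token table mirroring the tutorial's two definitions,
// plus a skipped whitespace rule. Sticky (/y) regexes match only at lastIndex.
const naiveTokenTypes = [
  { name: "Select", pattern: /SELECT/y },
  { name: "Identifier", pattern: /\w+/y },
  { name: "WhiteSpace", pattern: /\s+/y, skip: true },
];

function firstMatchTokenize(text) {
  const tokens = [];
  let offset = 0;
  while (offset < text.length) {
    let matched = false;
    for (const type of naiveTokenTypes) {
      type.pattern.lastIndex = offset; // anchor the match at the current offset
      const m = type.pattern.exec(text);
      if (m !== null) {
        if (!type.skip) tokens.push({ type: type.name, image: m[0] });
        offset += m[0].length;
        matched = true;
        break; // first match wins: later, longer matches are never tried
      }
    }
    if (!matched) throw new Error(`Unexpected character at offset ${offset}`);
  }
  return tokens;
}

const naiveTokens = firstMatchTokenize("SELECT SELECT_column1 FROM table2");
console.log(naiveTokens.map((t) => `${t.type}(${t.image})`).join(" "));
// Select(SELECT) Select(SELECT) Identifier(_column1) Identifier(FROM) Identifier(table2)
```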
I think that in this case we could lex all keywords as string literals, and then classify some of those string literals as keywords in the parser. For example:
$.RULE("sql", () => {
  $.CONSUME(StringLiteral);
  // If the value of the previous StringLiteral is 'SELECT', then
  $.SUBRULE($.SELECT);
  // Else if the value of the previous StringLiteral is 'UPDATE', then
  $.SUBRULE($.UPDATE);
});
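The classification idea can also be sketched as a separate pass between lexing and parsing, so the parser sees distinct keyword tokens and never needs a conditional consume. This is a minimal plain-JavaScript sketch, not Chevrotain's API: `KEYWORDS`, `lexWords`, `classifyKeywords`, and the token shape are hypothetical, and `lexWords` only handles word characters (it ignores punctuation such as `,` or `*`).

```javascript
// Hypothetical keyword table: exact word -> keyword token type.
const KEYWORDS = new Map([
  ["SELECT", "Select"],
  ["UPDATE", "Update"],
  ["FROM", "From"],
]);

// Lex every run of word characters as an Identifier.
// \w+ is greedy, so SELECT_column1 stays a single word.
function lexWords(text) {
  return (text.match(/\w+/g) || []).map((image) => ({ type: "Identifier", image }));
}

// Reclassify only whole words that exactly equal a keyword.
function classifyKeywords(tokens) {
  return tokens.map((tok) =>
    KEYWORDS.has(tok.image) ? { type: KEYWORDS.get(tok.image), image: tok.image } : tok
  );
}

const classified = classifyKeywords(lexWords("SELECT SELECT_column1 FROM table2"));
console.log(classified.map((t) => `${t.type}(${t.image})`).join(" "));
// Select(SELECT) Identifier(SELECT_column1) From(FROM) Identifier(table2)
```

Because the lookup compares whole words, a keyword prefix inside a longer identifier is never promoted to a keyword.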
However, it seems that Chevrotain does not support anything like a "conditional consume" or "conditional subrule".
Is there an existing way to handle this problem?
Thanks!
- Created 6 years ago
- Comments: 8 (5 by maintainers)
Top GitHub Comments
Thanks @deltaidea, it is indeed a lexer-level issue. It is possible to resolve it using word boundaries, but Chevrotain has first-class support for this case that does not rely on word boundaries; see: https://github.com/SAP/chevrotain/blob/master/examples/lexer/keywords_vs_identifiers/keywords_vs_identifiers.js
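As far as I understand the linked example, each keyword is declared with a `longer_alt` pointing at the identifier token, roughly `createToken({ name: "Select", pattern: /SELECT/, longer_alt: Identifier })`: when the keyword matches, the lexer also tries the identifier and keeps it if it matches more text. The sketch below emulates that rule in plain JavaScript so it runs without Chevrotain; `matchAt`, `tokenizeWithLongerAlt`, and the `longerAlt` field are illustrative names, not Chevrotain's API.

```javascript
// Plain-JS emulation of a longer_alt rule (not Chevrotain itself).
const Ident = { name: "Identifier", pattern: /\w+/y };
const altTokenTypes = [
  { name: "Select", pattern: /SELECT/y, longerAlt: Ident }, // keyword tried first
  Ident,
  { name: "WhiteSpace", pattern: /\s+/y, skip: true },
];

// Return the text matched by `type` starting exactly at `offset`, or null.
function matchAt(type, text, offset) {
  type.pattern.lastIndex = offset;
  const m = type.pattern.exec(text);
  return m === null ? null : m[0];
}

function tokenizeWithLongerAlt(text) {
  const tokens = [];
  let offset = 0;
  while (offset < text.length) {
    let type = null;
    let image = null;
    for (const candidate of altTokenTypes) {
      image = matchAt(candidate, text, offset);
      if (image !== null) { type = candidate; break; } // first match wins...
    }
    if (type === null) throw new Error(`Unexpected character at offset ${offset}`);
    // ...unless its longer alternative matches strictly more text.
    if (type.longerAlt) {
      const altImage = matchAt(type.longerAlt, text, offset);
      if (altImage !== null && altImage.length > image.length) {
        type = type.longerAlt;
        image = altImage;
      }
    }
    if (!type.skip) tokens.push({ type: type.name, image });
    offset += image.length;
  }
  return tokens;
}

console.log(tokenizeWithLongerAlt("SELECT SELECT_column1 FROM table2").map((t) => t.type).join(" "));
// Select Identifier Identifier Identifier
```

A bare SELECT still lexes as the keyword because the identifier match has the same length, so the keyword keeps priority on ties.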
The problem with word boundaries is that \b uses the ECMAScript regexp definition of a word character ([A-Za-z0-9_]) rather than the definition of an identifier in whatever language we are implementing.
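A small illustration of that mismatch, assuming a hypothetical language whose identifiers may also contain `$`: `/SELECT\b/` still matches the prefix of `SELECT$col`, because `$` is not a regexp word character, so a boundary exists between `T` and `$`. A negative lookahead written in terms of the language's own identifier characters is shown for contrast; the patterns here are illustrative only.

```javascript
// Hypothetical language: identifiers may contain '$' as well as \w characters.
const identifierPattern = /[A-Za-z_$][\w$]*/y;
const selectWithBoundary = /SELECT\b/y;           // relies on the regexp notion of "word"
const selectWithLookahead = /SELECT(?![\w$])/y;   // relies on this language's identifier chars

const input = "SELECT$col";

selectWithBoundary.lastIndex = 0;
const boundaryMatch = selectWithBoundary.exec(input);
identifierPattern.lastIndex = 0;
const identifierMatch = identifierPattern.exec(input);
selectWithLookahead.lastIndex = 0;
const lookaheadMatch = selectWithLookahead.exec(input);

console.log(boundaryMatch && boundaryMatch[0]);     // SELECT  (wrongly treated as a keyword)
console.log(identifierMatch && identifierMatch[0]); // SELECT$col  (the whole word is one identifier)
console.log(lookaheadMatch);                        // null
```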
I will keep this issue open as a reminder to update the tutorial to use this pattern.