How to handle string literals that begin with a keyword?

In the sample SQL grammar in the Getting Started Tutorial, some of the tokens are defined as follows.

const Select = createToken({name: "Select", pattern: /SELECT/});
const Identifier = createToken({name: "Identifier", pattern: /\w+/});

However, this kind of token definition cannot handle a string literal that begins with a keyword. For example, if one of the columns in the database is named SELECT_column1, then in the SQL statement SELECT SELECT_column1 FROM table2 (modified from the tutorial’s source code), the string literal SELECT_column1 will be split into two tokens: a keyword SELECT and a string literal _column1.
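
The split described above can be reproduced without Chevrotain. The following is a minimal sketch of a first-match lexer, assuming token definitions are tried in order at each offset. It is a simplified stand-in, not Chevrotain's implementation; `firstMatchDefs`, `firstMatchTokenize` and the `WhiteSpace` entry are hypothetical names introduced only for illustration.

```javascript
// Simplified stand-in for a lexer (assumption: definitions are tried in
// order at each offset and the first pattern that matches wins).
const firstMatchDefs = [
  { name: "WhiteSpace", pattern: /\s+/y, skip: true },
  { name: "Select", pattern: /SELECT/y },
  { name: "Identifier", pattern: /\w+/y },
];

function firstMatchTokenize(text) {
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    let matched = false;
    for (const def of firstMatchDefs) {
      def.pattern.lastIndex = pos; // sticky flag: the match must start at pos
      const m = def.pattern.exec(text);
      if (m !== null) {
        if (!def.skip) tokens.push({ name: def.name, image: m[0] });
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error(`Unexpected character at offset ${pos}`);
  }
  return tokens;
}

const firstMatchResult = firstMatchTokenize("SELECT SELECT_column1 FROM table2");
console.log(firstMatchResult.map((t) => `${t.name}(${t.image})`).join(" "));
// prints: Select(SELECT) Select(SELECT) Identifier(_column1) Identifier(FROM) Identifier(table2)
```

Because Select is tried before Identifier and matches the first six characters of SELECT_column1, the remainder _column1 is left to be picked up as a separate token.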

I think that in this case we could lex every keyword as a string literal and then classify some of those string literals as keywords in the parser. For example,

$.RULE("sql", () => {
    $.CONSUME(StringLiteral);
    //If the value of previous StringLiteral is 'SELECT', then
    $.SUBRULE($.SELECT);
    // Else if the value of previous StringLiteral is 'UPDATE', then
    $.SUBRULE($.UPDATE);
})

However, it seems that Chevrotain does not support anything like a ‘conditional consume’ or a ‘conditional subrule’.

So is there any existing way that I can use to handle this problem?

Thanks!

Issue Analytics

Created 6 years ago
Comments: 8 (5 by maintainers)

Top GitHub Comments

bd82 commented, Sep 19, 2017

Could you give me a small example of how to read the value of the next or previous token inside a GATE?

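    $.RULE("customPredicateRule", function() {
        // CONSUME returns the consumed token; its text is in the image property.
        const strLit = $.CONSUME(StringLiteral);
        const strLitValue = strLit.image;

        $.OR([
            {
                // This alternative is only considered when the GATE returns true.
                GATE: () => strLitValue === "SELECT",
                ALT: () => {
                    $.SUBRULE($.select);
                }
            },
            {
                GATE: () => strLitValue === "UPDATE",
                ALT: () => {
                    $.SUBRULE($.update);
                }
            }
        ]);
    });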

bd82 commented, Sep 19, 2017

Thanks @deltaidea. It is indeed a lexer-level issue, and it can be resolved using word boundaries, but Chevrotain has first-class support for this case that does not rely on word boundaries. See: https://github.com/SAP/chevrotain/blob/master/examples/lexer/keywords_vs_identifiers/keywords_vs_identifiers.js

The problem with word boundaries is that \b looks for boundaries according to the ECMAScript regexp definition of a word, not according to the identifier token of whatever language we are implementing.
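
A short sketch of that limitation, assuming a hypothetical language whose identifiers may also contain `$` (this is an assumption for illustration and is not taken from the tutorial grammar). By default, `\b` in an ECMAScript regexp treats only `[A-Za-z0-9_]` as word characters:

```javascript
// Keyword pattern guarded by a word boundary.
const selectWithBoundary = /^SELECT\b/;

// "_" is a word character, so there is no boundary after SELECT and the
// keyword pattern correctly fails to match.
const underscoreCase = selectWithBoundary.test("SELECT_column1");

// "$" is not a word character for \b, so a boundary is found after SELECT,
// even though (by assumption) "$" may continue an identifier in our language.
const dollarCase = selectWithBoundary.test("SELECT$column1");

console.log(underscoreCase, dollarCase);
// prints: false true
```

So the boundary fixes SELECT_column1 but would still split SELECT$column1 in such a language.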

I will keep this issue open as a reminder to update the tutorial to use this pattern.
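
The idea behind that pattern can be sketched in plain JavaScript: when a keyword matches, also try a designated longer alternative (here the identifier pattern) at the same offset, and prefer it if it consumes more text. Chevrotain exposes this through the `longer_alt` token option; the code below is not Chevrotain's implementation, and `longerAlt`, `matchAt` and `lexWithLongerAlt` are hypothetical names used only for this illustration.

```javascript
// Token definitions; keywords point at the Identifier definition as their
// longer alternative (hypothetical "longerAlt" field).
const identifierDef = { name: "Identifier", pattern: /\w+/y };
const altDefs = [
  { name: "WhiteSpace", pattern: /\s+/y, skip: true },
  { name: "Select", pattern: /SELECT/y, longerAlt: identifierDef },
  { name: "From", pattern: /FROM/y, longerAlt: identifierDef },
  identifierDef,
];

// Returns the text matched by def exactly at pos, or null.
function matchAt(def, text, pos) {
  def.pattern.lastIndex = pos; // sticky flag anchors the match at pos
  const m = def.pattern.exec(text);
  return m === null ? null : m[0];
}

function lexWithLongerAlt(text) {
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    let chosen = null;
    for (const def of altDefs) {
      const image = matchAt(def, text, pos);
      if (image === null) continue;
      chosen = { def, image };
      if (def.longerAlt) {
        // Prefer the alternative only when it matches strictly more text.
        const altImage = matchAt(def.longerAlt, text, pos);
        if (altImage !== null && altImage.length > image.length) {
          chosen = { def: def.longerAlt, image: altImage };
        }
      }
      break;
    }
    if (chosen === null) throw new Error(`Unexpected character at offset ${pos}`);
    if (!chosen.def.skip) tokens.push({ name: chosen.def.name, image: chosen.image });
    pos += chosen.image.length;
  }
  return tokens;
}

const altResult = lexWithLongerAlt("SELECT SELECT_column1 FROM table2");
console.log(altResult.map((t) => `${t.name}(${t.image})`).join(" "));
// prints: Select(SELECT) Identifier(SELECT_column1) From(FROM) Identifier(table2)
```

Unlike a word boundary, the comparison uses the language's own identifier pattern, so it keeps working if that pattern is later changed to allow other characters.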