question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Single-token deletion error node not in parse tree

See original GitHub issue

Hi there,

When single-token deletion kicks in and skips a token in a rule with alternative labels (it entered the rule but didn’t find the alternative yet) the error node that is created is not attached to the parse tree. If the extra token is present after a valid rule alternative is found then the error node representing the token is correctly present in the parse tree.

This is due to the ParseRuleContext.CopyFrom method not copying children (at least error nodes). See the comments on the generated code:

private VariableExprContext variableExpr(int _p) throws RecognitionException {
   ParserRuleContext _parentctx = _ctx;
   int _parentState = getState();
   VariableExprContext _localctx = new VariableExprContext(_ctx, _parentState); // Base class of alt rule with labels (correct alt not found yet).
   VariableExprContext _prevctx = _localctx;
   int _startState = 6;
   enterRecursionRule(_localctx, 6, RULE_variableExpr, _p);
   int _la;
   try {
      int _alt;
      enterOuterAlt(_localctx, 1);
      {
      setState(36);
      _errHandler.sync(this); // Single-token deletion skips extra token and adds an error node to the children of _localctx (VariableExprContext).
      switch ( getInterpreter().adaptivePredict(_input,3,_ctx) ) {  // A valid alternative is found.
      case 1:
         {
         _localctx = new LiteralExprContext(_localctx); // Simple fields(properties are copied here but not children (which include the sinlge-token deletion error node).
         _ctx = _localctx;
         _prevctx = _localctx;

         setState(33);
         literal();
         }
         break;

Some more context and a concrete example: We are using Antlr 4.5.3 to build a proper grammar/parser for an existing template engine. Target languages are C# and java but we have been working mainly with C# so far. We have both a lexer and parser files to use Antlr modes. We are writing and testing the grammar using the Antlr plugin for IntelliJ. Note that for this issue the IntelliJ plugin is misleading as it shows the missing error node in the tree which is not the case for the TestRig (-gui) (parse tree vs listener issue??).

In our template engine, dynamic expressions in text must be inside ${ expr } . Expressions mainly support basic operators, a few functions, identifiers and number literals. Our lexer file has a catch-all rule to match any unrecognized character and we skip (hidden channel) whitespace characters.

So expression ${abc} is valid. Expression ${ ? abc ? } with 2 single invalid ‘?’ tokens before and after a valid labeled alternative produces a parse tree without the error node for the first token. See the tree for ${ ? abc ? } : java_2016-09-29_15-46-58

Please note that to reproduce the issue with this simpler version of our grammar I had to generate it with the -xForce-atn option, otherwise it was generating LL1 code which doesn’t call errHandler.sync(this) before finding alternative (see the string template for C# extracts below for LL1 and LL). Which leads me to another issue/question: _why does LL1 generated code not do single-token deletion?*

https://github.com/antlr/antlr4/blob/master/tool/resources/org/antlr/v4/tool/templates/codegen/CSharp/CSharp.stg Line 597

// LL(*) stuff
AltBlock(choice, preamble, alts, error) ::= <<
State = <choice.stateNumber>;
ErrorHandler.Sync(this); // CALLING THE ERROR RECOVERY STRATEGY?
<if(choice.label)><labelref(choice.label)> = TokenStream.Lt(1);<endif>
<preamble; separator="\n">
switch ( Interpreter.AdaptivePredict(TokenStream,<choice.decision>,Context) ) {
<alts:{alt |
case <i>:
<alt>
break;}; separator="\n">
}
>>

Line 538

LL1AltBlock(choice, preamble, alts, error) ::= <<
State = <choice.stateNumber>; // NO CALL TO ErrorHandler.Sync(this);
<if(choice.label)><labelref(choice.label)> = TokenStream.Lt(1);<endif>
<preamble; separator="\n">
switch (TokenStream.La(1)) {
<choice.altLook,alts:{look,alt| <cases(ttypes=look)>
<alt>
break;}; separator="\n">
default:
<error>
}
>>

One more issue/question: if no rule alternative is found (e.g. with ${???} ), the node that will be created on the parse tree will be of type “alternative” base class (here VariableExprContext). But there is no overload in the generated visitor to visit this class. I understand it could be a bit confusing at first but still I think it makes sense to have it (or find a way to make this base class abstract which could solve both the visitor and missing node issues).

Issue Analytics

  • State:closed
  • Created 7 years ago
  • Reactions:1
  • Comments:21 (21 by maintainers)

github_iconTop GitHub Comments

2reactions
lionelplessiscommented, Nov 8, 2016

hi @parrt, 4.6 doesn’t fix the tree. It makes it easier to create a simple grammar though (thanks to the sync() call in LL(1)). So here goes:

grammar TestGrammar;

r : '${' v '}' ;

v : A #altA
  | B #altB
  ;

A : 'a' ;
B : 'b' ;

WHITESPACE : [ \n\t\r]+ -> channel(HIDDEN) ;

ERROR : . ;

And test input: ${ ? a ?}

Gives correct error messages:

line 1:3 extraneous input '?' expecting {'a', 'b'}
line 1:7 extraneous input '?' expecting '}'

But the tree still doesn’t have the first ? as error node: (r ${ (v a) ? }).

See my comments inline the Antlr generated code below:

public final VContext v() throws RecognitionException {
        VContext _localctx = new VContext(_ctx, getState());
        enterRule(_localctx, 2, RULE_v);
        try {
            setState(10);
            _errHandler.sync(this); // Single-token deletion skips extra token and adds an error node to the children of _localctx (VContext).
            switch (_input.LA(1)) {
            case A:
                _localctx = new AltAContext(_localctx); // Valid alternative found. Children of initial _localctx are lost.
                enterOuterAlt(_localctx, 1);
                {
                setState(8);
                match(A);
                }
                break;
            case B:
                _localctx = new AltBContext(_localctx);
                enterOuterAlt(_localctx, 2);
                {
                setState(9);
                match(B);
                }
                break;
            default:
                throw new NoViableAltException(this);
            }
        }
0reactions
lionelplessiscommented, Feb 13, 2017

Coming back to this one really late but still wanted to they thanks @parrt and for the 4.6 release as well 😃. I tested it for C# and it didn’t work so I investigated a bit and saw that the fix was not ported to C#. Not sure how the Antlr community keeps the different target languages in sync. For this one I submitted a PR.

Read more comments on GitHub >

github_iconTop Results From Across the Web

Writing a Parser — Part III: Syntax Error Handling - Medium
In this article, I will discuss how to handle syntax errors in a parser, and recover from them to produce a resilient parse...
Read more >
Fault Tolerance | Chevrotain
Chevrotain uses several fault tolerance / error recovery ... The unexpected token X will be skipped (deleted) and the parsing will continue.
Read more >
Seq2Parse: Neurosymbolic Parse Error Repair - Philip Guo
Seq2Parse is based on the insight that Symbolic Error Correcting (EC) Parsers can, in principle, synthesize repairs, but, in practice, are overwhelmed by...
Read more >
DefaultErrorStrategy (ANTLR 4 Runtime 4.11.1 API)
This method implements the single-token deletion inline error recovery strategy. ... In error recovery mode, Parser.consume() adds symbols to the parse tree ......
Read more >
ANTLR4: getFirstChildWithType with a ParseTree
final ParseTree classNameElement = (ParseTree) ((GrammarAST) node).getFirstChildWithType(MyParser.IDENTIFIER);. where node is the argument of ...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found