Static code analysis support
Problem description
People can write valid JSLT code that contains incorrect regular expressions. The code parses correctly but fails at runtime. The failure can be deterministic (the regexp is a standalone expression) or non-deterministic (the regexp is part of an OR boolean clause that is evaluated only under certain conditions).
The Java Pattern compile family of methods throws a PatternSyntaxException when an invalid expression is compiled. The static helper method getRegexp is called in BuiltinFunctions at runtime, which means people may write JSLT that compiles and that they expect to be valid, but that fails under some circumstances (or always, if the regexp is in a standalone expression).
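The underlying Java behaviour can be reproduced without JSLT. The following is a minimal sketch using only java.util.regex; the class name RegexCheck and the helper syntaxError are made up for illustration and are not part of JSLT.

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexCheck {
    // Hypothetical helper: returns the compiler's error description
    // if the pattern is invalid, or an empty Optional if it compiles.
    static Optional<String> syntaxError(String regex) {
        try {
            Pattern.compile(regex);
            return Optional.empty();
        } catch (PatternSyntaxException e) {
            return Optional.of(e.getDescription());
        }
    }

    public static void main(String[] args) {
        // A well-formed pattern compiles without error.
        System.out.println(syntaxError("[a-z]+").isPresent()); // prints "false"
        // An unclosed character class is rejected at compile time.
        System.out.println(syntaxError("[a-z").isPresent());   // prints "true"
    }
}
```

The check needs only the pattern string, not the input it will later be matched against, which is why it could in principle run when the JSLT is parsed.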
Expected behaviour
Incorrect regexp patterns in functions like test fail at JSLT parsing time, surfacing the error raised by the underlying regexp engine, since the validity of a pattern is independent of the input and can potentially be known at compile time. This behaviour is expected even when regexp expressions are nested within boolean expressions.
Actual behaviour
Parsing does not fail. Running the JSLT fails at runtime when the relevant boolean branch is evaluated, or every time when the expression is standalone.
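The input-dependent failure can be illustrated with a plain Java stand-in. This sketch does not use JSLT: evaluate is a hypothetical method that mimics an OR clause whose right-hand operand contains a broken regexp, and it assumes the right-hand side is only evaluated when the left-hand side is false.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class LazyRegexFailure {
    // Stand-in for an OR clause with a broken regexp on the right-hand side.
    // The pattern is compiled only if the right-hand operand is evaluated.
    static boolean evaluate(boolean enabled, String name) {
        return enabled || Pattern.compile("(unclosed").matcher(name).find();
    }

    // Reports whether evaluation succeeded or hit the regexp error.
    static String outcome(boolean enabled) {
        try {
            return "ok: " + evaluate(enabled, "abc");
        } catch (PatternSyntaxException e) {
            return "failed: " + e.getDescription();
        }
    }

    public static void main(String[] args) {
        // Left operand is true, so the broken regexp is never compiled.
        System.out.println(outcome(true));
        // Left operand is false, so the regexp is compiled and throws.
        System.out.println(outcome(false));
    }
}
```

With enabled set to true the bug stays hidden, so a test suite that never exercises the false branch would not catch it.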
Workarounds
Partial: The Expression public interface returned by the public API Parser allows you to call apply. You could build a tester that parses any JSLT input code and runs it on a static JSON input, say an empty JSON doc. This would deterministically catch invalid regexes that are standalone and not part of a nested boolean.
Complete: To catch nested regexes in nested booleans, there would need to be a test document in sync with the JSLT code that triggers all the necessary boolean expressions and checks the regexp. This is equivalent to writing comprehensive unit tests for all JSLT code.
Complete: use static analysis techniques: obtain the JSLT Abstract Syntax Tree (AST), search for test and other regexp-using expression nodes, and check the correctness of each regexp (by checking those nodes against an empty JSON doc). This is a deterministic method that catches all invalid regexes.
The Expression public interface returned by the public API Parser does not expose useful methods to implement this, like getting child expressions and the names of functions. It can be fairly trivially implemented using the internal ExpressionImpl, which works but is very brittle as it does not rely on public contracts.
Complete: extend Expression to return child expressions and the name/type of the expression. Exposing the name is not mandatory, but it means the testing can be done only on relevant nodes instead of brute-force testing all JSLT expressions to look for PatternSyntaxExceptions, which is computationally expensive. Once that interface is exposed, it is fairly trivial to implement static code analysis such as testing regexes, and it may be useful for other use cases.
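As a rough sketch of what such an analysis could look like once child expressions and names are exposed, the following walks a toy tree and validates every literal regexp passed to test. The Node class, its kind and value fields, and the RegexLint class are all hypothetical; they only model the idea and do not reflect the actual JSLT API or AST.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexLint {
    // Hypothetical AST node, NOT part of JSLT. kind is "call", "literal" or "or";
    // value holds the function name (for calls) or the string (for literals).
    static final class Node {
        final String kind;
        final String value;
        final List<Node> children;

        Node(String kind, String value, List<Node> children) {
            this.kind = kind;
            this.value = value;
            this.children = children;
        }

        static Node call(String fn, Node... args) { return new Node("call", fn, Arrays.asList(args)); }
        static Node literal(String s) { return new Node("literal", s, Collections.<Node>emptyList()); }
        static Node or(Node left, Node right) { return new Node("or", null, Arrays.asList(left, right)); }
    }

    // Visits every node, including branches a given input would never evaluate.
    static List<String> findInvalidRegexes(Node root) {
        List<String> errors = new ArrayList<>();
        visit(root, errors);
        return errors;
    }

    private static void visit(Node node, List<String> errors) {
        boolean isTestCall = "call".equals(node.kind) && "test".equals(node.value)
                && node.children.size() == 2;
        if (isTestCall && "literal".equals(node.children.get(1).kind)) {
            String regex = node.children.get(1).value;
            try {
                Pattern.compile(regex);
            } catch (PatternSyntaxException e) {
                errors.add(regex + ": " + e.getDescription());
            }
        }
        for (Node child : node.children) {
            visit(child, errors);
        }
    }

    public static void main(String[] args) {
        // Models an OR clause whose right-hand side calls test with a broken regexp.
        Node tree = Node.or(Node.literal("true"),
                Node.call("test", Node.literal("abc"), Node.literal("[a-z")));
        System.out.println(findInvalidRegexes(tree).size()); // prints "1"
    }
}
```

Filtering on the function name is what keeps the check cheap: only test-like nodes are compiled, and no input document is needed to reach the nested branch.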
Issue Analytics
- Created 2 years ago
- Comments:8
Thank you very much, we will test it but looking at the PR and tests I would not expect any problems 😄
thanks 🙏
I’ll add a task to our backlog to do so.