Suggestion for new, less brittle, analysis system.
A little while ago, I posted an issue about supporting macros, and the response made sense. Based on the goals of chevrotain, you weren’t concerned with this idea because other options were available.
But the discussion got me thinking about the current func.toString() analysis system, and whether it could be improved on. This was just an idea I had based on a cursory look over the code, and I won’t be surprised if I’ve missed something important that would make this incompatible with chevrotain.
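To make the brittleness concrete, here is a minimal sketch of the problem. It is not chevrotain code: `RecordingParser`, `commaPair`, and `rule` are hypothetical names made up for illustration. It shows that the source text of a rule says nothing about parser calls made inside a helper function, while actually running the rule against a recorder sees all of them.

```javascript
// Hypothetical mini-parser: it only records which token types get consumed.
class RecordingParser {
  constructor() {
    this.calls = []
  }
  consume(tokenType) {
    this.calls.push(`consume:${tokenType}`)
  }
}

// A plain helper function acting as a "macro".
function commaPair(parser, left, right) {
  parser.consume(left)
  parser.consume('comma')
  parser.consume(right)
}

// The rule body only mentions the helper, never `consume` directly.
const rule = (parser) => commaPair(parser, 'a', 'b')

// Text-based analysis can only look at the source text of `rule` itself.
const ruleSource = rule.toString()
const textSeesConsume = ruleSource.includes('consume')

// Running the rule against a recorder sees every call, however deep.
const recorder = new RecordingParser()
rule(recorder)
const recordedCalls = recorder.calls

console.log(textSeesConsume) // false
console.log(recordedCalls) // [ 'consume:a', 'consume:comma', 'consume:b' ]
```

The string-based view misses all three `consume` calls because they live in `commaPair`, which is exactly the situation a macro creates.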
It seems to me that all the brittle magic of chevrotain happens in Parser.performSelfAnalysis(this), where the GAST is produced based on the string representations of the rule functions, which is a means of understanding what the user is trying to do with the parser so that analysis can be performed. I’ve come up with another way to shine a light into the user’s intentions without a function.toString().
The code below is ready to be run (I’m using node v9.8.0). Note that it was only meant as a proof of concept, and intentionally doesn’t do a lot of things.
Essentially, the static analyze function replaces all the low-level parsing methods (“monkey patching” them) with ones that build up an analysis of what they’re doing instead of actually consuming tokens. All of these replacement functions close over a scope variable, and each of them uses it to “catch” every invocation that happens beneath it. As a result, each method knows which parsing methods were called beneath it, no matter how deep in the call tree they are.
// a convenience logging function that's more clear
const util = require('util')

function log(obj) {
  console.log(util.inspect(obj, { depth: null }))
}

// this will be returned by all the "monkey patched" low level parsing methods
// it's a dummy value that will stand in for real parsing results
const INSPECT = Symbol()

// the variable that will be used to "catch" child parsing calls
let scope = null

// since we're faking real parsing results during the analysis phase,
// and since the user might be performing embedded actions
// we need to give them an easy way to act on something that could be fake
function actOnPossibleInspect(possibleInspect, actionFunction) {
  if (possibleInspect !== INSPECT) return actionFunction(possibleInspect)
  else return INSPECT
}
// this system could be replaced with just an enclosed boolean
// that indicates whether we are in inspection mode or not

// a token matching function that uses the action system
// (tokenType is a plain string such as 'c', the same form consume() receives)
function matchToken(token, tokenType) {
  return actOnPossibleInspect(token, (tok) => tok.tokenType === tokenType)
}
// all the monkey patch functions
function monkeyPatchLook(amount) {
  scope.push(`look:${amount}`)
  return INSPECT
}

function monkeyPatchConsume(tokenType) {
  scope.push(`consume:${tokenType}`)
  return INSPECT
}

// subrule doesn't invoke the subrule, because it might not be analyzed or fulfilled yet
// whatever chevrotain does to resolve the
// inherent recursiveness of rule calls would happen here
function monkeyPatchSubrule(ruleName) {
  scope.push(`subrule:${ruleName}`)
  return INSPECT
}

// I haven't included OR, OPTION, AT_LEAST_ONE, etc.,
// because they would be very similar to this and redundant
function monkeyPatchMany(options) {
  let gate, def
  if (typeof options == 'function') {
    gate = () => {}
    def = options
  }
  else ({ gate, def } = options)

  const oldScope = scope

  scope = []
  gate()
  const gateScope = scope

  scope = []
  def()
  const defScope = scope

  scope = oldScope
  scope.push({ type: 'many', defScope, gateScope })
  return INSPECT
}
class Parser {
  constructor() {
    this._rules = {}
    this._dumbGast = {}
  }

  rule(ruleName, ruleFunction) {
    this._rules[ruleName] = ruleFunction
    this[ruleName] = ruleName
  }

  look(amount) {
    throw new Error("This isn't real, not needed for this proof of concept.")
  }

  consume(tokenType) {
    throw new Error("This isn't real, not needed for this proof of concept.")
  }

  subrule() {
    throw new Error("This isn't real, not needed for this proof of concept.")
  }

  many() {
    throw new Error("This isn't real, not needed for this proof of concept.")
  }

  static analyze(parserInstance) {
    const realLook = parserInstance.look
    const realConsume = parserInstance.consume
    const realSubrule = parserInstance.subrule
    const realMany = parserInstance.many

    // this method swaps out the real methods with the monkey patches
    parserInstance.look = monkeyPatchLook
    parserInstance.consume = monkeyPatchConsume
    parserInstance.subrule = monkeyPatchSubrule
    parserInstance.many = monkeyPatchMany

    for (const [ruleName, rule] of Object.entries(parserInstance._rules)) {
      // it sets up the scope for the first time
      // all the child invocations will end up here
      scope = parserInstance._dumbGast[ruleName] = []
      // it calls all the rules, which will use the monkey patched methods
      rule()
      // then it sets the scope back
      scope = null
    }

    // here they are!
    // this isn't a real or useful data structure,
    // but it demonstrates that you can build up something from
    // the invocations down the call stack
    log(parserInstance._dumbGast)

    // then we put all the real methods back
    parserInstance.look = realLook
    parserInstance.consume = realConsume
    parserInstance.subrule = realSubrule
    parserInstance.many = realMany
  }
}
// here's a parser using this
// the actual grammar here is complete nonsense
// but again a proof of concept
class ConceptParser extends Parser {
  constructor() {
    super()

    // a shorter name for this function
    const act = actOnPossibleInspect

    // since we aren't doing function string analysis anymore,
    // we can just use basic functions to call parser methods
    // this acts as a macro that does the same thing with different arguments
    const macroAlternating = (oneArg, otherArg) => {
      const a = this.consume('a')
      this.subrule(this.manyC)
      const oneAlternation = this.many(() => {
        const one = this.consume(oneArg)
        const other = this.consume(otherArg)
        return [one, other]
      })
      this.subrule(this.manyC)
      const otherAlternation = this.many(() => {
        const other = this.consume(otherArg)
        const one = this.consume(oneArg)
        return [other, one]
      })
      this.subrule(this.manyD)
      const b = this.consume('b')

      // we have to use the act function,
      // since the analysis phase will produce fake INSPECT's
      // that don't have methods like .map
      return act(a, () => {
        return {
          a, b,
          c: otherAlternation,
          // the parentheses make the arrow body an object literal, not a block
          d: oneAlternation.map(([ind, dep]) => ({ ind, dep }))
        }
      })
    }

    this.rule('topLevel', () => {
      // here we are using a plain function that calls parser methods
      const alternating = macroAlternating('e', 'f')
      const cs = this.subrule(this.manyC)
      const ds = this.subrule(this.manyD)
      // parenthesized so the arrow returns an object instead of undefined
      return act(cs, () => ({ alternating, cs, ds }))
    })

    this.rule('manyC', () => this.many({
      gate: () => matchToken(this.look(1), 'c'),
      def: () => {
        return act(this.consume('c'), tok => tok.tokenValue)
      }
    }))

    this.rule('manyD', () => this.many(() => {
      return act(this.consume('d'), tok => tok.tokenValue)
    }))

    Parser.analyze(this)
  }
}

new ConceptParser()
And the output (on my machine):
{ topLevel:
   [ 'consume:a',
     'subrule:manyC',
     { type: 'many',
       defScope: [ 'consume:e', 'consume:f' ],
       gateScope: [] },
     'subrule:manyC',
     { type: 'many',
       defScope: [ 'consume:f', 'consume:e' ],
       gateScope: [] },
     'subrule:manyD',
     'consume:b',
     'subrule:manyC',
     'subrule:manyD' ],
  manyC:
   [ { type: 'many',
       defScope: [ 'consume:c' ],
       gateScope: [ 'look:1' ] } ],
  manyD: [ { type: 'many', defScope: [ 'consume:d' ], gateScope: [] } ] }
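The proof of concept leaves out OR, OPTION, and the other combinators on the grounds that they would look much like the MANY patch. As a rough sketch of what that could mean, the snippet below factors the save/swap/restore of the scope variable into a helper and builds two recorders on top of it. The names `captureScope`, `recordOr`, and `recordOption`, and the shape of the recorded objects, are assumptions made for illustration and are not part of chevrotain or of the code above.

```javascript
// Shared pointer that recorders push into, as in the proof of concept.
let scope = null
const INSPECT = Symbol('inspect')

function recordConsume(tokenType) {
  scope.push(`consume:${tokenType}`)
  return INSPECT
}

// Run `fn` with a fresh scope and return whatever it recorded.
// The outer scope is restored even if `fn` throws.
function captureScope(fn) {
  const outerScope = scope
  scope = []
  try {
    fn()
    return scope
  } finally {
    scope = outerScope
  }
}

// Hypothetical OR recorder: one captured scope per alternative.
function recordOr(alternatives) {
  const altScopes = alternatives.map((alt) => captureScope(alt))
  scope.push({ type: 'or', altScopes })
  return INSPECT
}

// Hypothetical OPTION recorder: a single captured scope.
function recordOption(def) {
  const defScope = captureScope(def)
  scope.push({ type: 'option', defScope })
  return INSPECT
}

// Record a nonsense rule: 'a' followed by either 'b' or ('c' with an optional 'd').
const recorded = captureScope(() => {
  recordConsume('a')
  recordOr([
    () => recordConsume('b'),
    () => {
      recordConsume('c')
      recordOption(() => recordConsume('d'))
    },
  ])
})

console.log(JSON.stringify(recorded))
```

Because every combinator goes through `captureScope`, nesting works to any depth, and the `finally` block means a throwing rule body cannot leave the scope pointing at a stale array.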
The possible objection I can see to this method?
It’s gross
I can see the objection that swapping out methods, using an enclosed pointer variable to catch data, and producing fake output that the user could see is sort of hacky. But is it considerably worse than analyzing the string representations of functions? Especially given that it opens the door to the massive convenience of using plain functions that call parser methods? I think the trade-off is well worth it.
The biggest inconvenience introduced is in making embedded actions more cluttered with the introduction of INSPECT results, but again I think it’s a small price to pay.
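One way the clutter might be reduced is the alternative mentioned in the code comments: an enclosed boolean that says whether the parser is in inspection mode, instead of checking each value against INSPECT. The sketch below assumes a hypothetical `action` wrapper and a stand-in `rule` function that receives its `consume` implementation as an argument; neither is part of chevrotain.

```javascript
// True only while the grammar is being recorded rather than actually run.
let inspecting = false

// Hypothetical helper: skip the embedded action entirely while recording.
function action(fn) {
  return inspecting ? undefined : fn()
}

// Stand-in for a rule body. `consume` is passed in so the same body can be
// driven by a recorder or by a "real" implementation.
function rule(consume) {
  const tok = consume('number')
  // Anything that touches the token goes inside action().
  return action(() => Number(tok.value) * 2)
}

// Recording pass: consume returns nothing useful, and the action never runs.
const recordedCalls = []
inspecting = true
const recordingResult = rule((tokenType) => {
  recordedCalls.push(`consume:${tokenType}`)
  return undefined
})
inspecting = false

// Real pass: consume returns a token-like object, and the action runs.
const realResult = rule(() => ({ value: '21' }))

console.log(recordedCalls, recordingResult, realResult)
```

The user still has to wrap anything that touches parse results, but no longer has to pick which value to test, which was the awkward part of calling `act(a, ...)` when several results are in play.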
Would this work well as a separate API?
If the chevrotain maintainers aren’t interested in pursuing this direction, I could imagine taking this on as a new project using the chevrotain engine (or at least the underlying “auto lookahead” algorithm).
Thoughts?
Issue Analytics
- Created 6 years ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
The macros would be expanded into an equivalent GAST representation, yes. And it only appears that I wasn’t using MANY/OR/etc. because I was quickly typing up a proof of concept and named them differently.
I’m going to go down the path of seeing how this would work, and reopen this issue if I need any help.
Hello @blainehansen
A similar approach was once again suggested in #992, and I’ve started implementing it as part of a major version change in #998, while hopefully limiting the number of breaking changes somewhat…
I guess I have suffered enough from the brittleness of Function.toString and it’s time to move on 😄. This will also enable implementing new capabilities/features (e.g. macros).
Cheers.