Add the possibility to jump to a state and push a new state onto the stack
In the current implementation, a rule in a stateful lexer can only use one of `pop: true`, `push: "someState"`, or `next: "someOtherState"`.
Imagine you are in state `"currentState"` and a rule could both set the state with `next: "continueHereState"` and, at the same time, `push: "parseSomething"`. The next time you `pop`, you would end up in `"continueHereState"` instead of going back to `"currentState"`.
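The difference is easiest to see on the state stack itself. The sketch below is a hypothetical stand-in (`StateStack` is not part of moo); it assumes that pushing saves the current state before switching and that popping restores the most recently saved one, which is how the proposal describes the stack behaving.

```javascript
// Hypothetical stand-in for a stateful lexer's state handling (not moo's code):
// it tracks only the current state and a stack of saved states.
class StateStack {
  constructor(initial) {
    this.state = initial;
    this.stack = [];
  }
  setState(state) {
    this.state = state; // replace the current state; the stack is untouched
  }
  pushState(state) {
    this.stack.push(this.state); // save the state to return to
    this.state = state;
  }
  popState() {
    this.state = this.stack.pop(); // return to the most recently saved state
  }
}

// push alone: popping returns to the state we pushed from.
const pushOnly = new StateStack("currentState");
pushOnly.pushState("parseSomething");
pushOnly.popState();
console.log(pushOnly.state); // currentState

// Proposed next + push: set the continuation first, then push.
const nextAndPush = new StateStack("currentState");
nextAndPush.setState("continueHereState");
nextAndPush.pushState("parseSomething");
nextAndPush.popState();
console.log(nextAndPush.state); // continueHereState
```

The order matters: calling `setState` before `pushState` is what puts `"continueHereState"` on the stack instead of `"currentState"`.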
For me, this was useful for parsing, for example, JavaScript function calls with nested arrays and objects, such as:
identifier(["some", "array", {}, 123], {"object": {"values": ["a", "b"]}, "whatever": false})
I’ve just tweaked these lines https://github.com/no-context/moo/blob/24b23ca961232df15f870f9c8db1c933f2a31e21/moo.js#L484-L486 to this:
if (group.pop) {
  this.popState()
} else if (group.push && group.next) {
  this.setState(group.next)
  this.pushState(group.push)
} else if (group.push) {
  this.pushState(group.push)
} else if (group.next) {
  this.setState(group.next)
}
Is this something you might want a PR for? Would it make sense to allow `next` inside a `pop` as well (resulting in `setState` directly after the pop)?
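The `pop` plus `next` variant would amount to a `setState` right after the `popState`. A sketch of the combined branch logic follows, assuming a minimal stand-in object in place of moo's lexer; `makeLexer` and `applyTransition` are hypothetical names, not moo API.

```javascript
// Hypothetical stand-in for the lexer's state handling: a current state plus a stack.
function makeLexer(initial) {
  const stack = [];
  return {
    state: initial,
    setState(s) { this.state = s; },
    pushState(s) { stack.push(this.state); this.state = s; },
    popState() { this.state = stack.pop(); },
  };
}

// The patched branch logic from the issue, plus the hypothetical pop + next case.
function applyTransition(lexer, group) {
  if (group.pop) {
    lexer.popState();
    // Hypothetical: after returning to the saved state, replace it with `next`.
    if (group.next) lexer.setState(group.next);
  } else if (group.push) {
    // `next` (if any) becomes the state the matching pop returns to.
    if (group.next) lexer.setState(group.next);
    lexer.pushState(group.push);
  } else if (group.next) {
    lexer.setState(group.next);
  }
}

const lx = makeLexer("main");
applyTransition(lx, { push: "inner", next: "afterInner" });
console.log(lx.state); // inner
applyTransition(lx, { pop: true, next: "done" });
console.log(lx.state); // done
```

One consequence of this design: with `pop` plus `next`, the state saved on the stack is overwritten immediately, so the stack only determines how many levels are unwound, not where lexing continues.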
Issue Analytics
- Created 5 years ago
- Comments: 8
This is a fairly common problem people have when writing lexers and parsers. You generally want your lexer to be as dumb and permissive as possible, i.e., it should know nothing about syntax except what the tokens are and the absolute minimum necessary to distinguish among them (in your example, the ability to distinguish between regular text and JavaScript code). I’d recommend writing your lexer like this:
That gives you a token stream like this:
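For illustration only, here is a flat, state-free tokenizer in plain JavaScript (not moo) for the function-call example from the issue. The token types, the `RULES` table, and the `tokenize` helper are assumptions made for this sketch; it covers only the code part of the input and is not a full JavaScript lexer (no comments, template literals, or regex literals).

```javascript
// Hypothetical flat token rules: no states, one sticky regex per token type.
// The sticky flag (y) makes each regex match only at its lastIndex.
// Order matters: the first rule that matches at the current position wins.
const RULES = [
  ["ws", /\s+/y],
  ["string", /"(?:[^"\\]|\\.)*"/y],
  ["number", /-?\d+(?:\.\d+)?/y],
  ["keyword", /(?:true|false|null)\b/y],
  ["identifier", /[A-Za-z_$][\w$]*/y],
  ["punct", /[()[\]{},:]/y],
];

// Try each rule exactly at `pos`; return the first match as a token, or null.
function matchAt(input, pos) {
  for (const [type, re] of RULES) {
    re.lastIndex = pos;
    const m = re.exec(input);
    if (m !== null) return { type, value: m[0] };
  }
  return null;
}

function tokenize(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    const tok = matchAt(input, pos);
    if (tok === null) {
      throw new SyntaxError(`unexpected character ${JSON.stringify(input[pos])} at offset ${pos}`);
    }
    if (tok.type !== "ws") tokens.push(tok); // whitespace is dropped
    pos += tok.value.length; // every rule consumes at least one character
  }
  return tokens;
}

const example = 'identifier(["some", "array", {}, 123], {"object": {"values": ["a", "b"]}, "whatever": false})';
console.log(tokenize(example).map((t) => `${t.type}:${t.value}`).join(" "));
```

The result is one flat list of tokens with no notion of nesting; deciding that `{` opens an object or that `]` closes an argument is left entirely to the parser.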
The reason your lexer should be permissive and un-clever is twofold:
1. Your parser already has to know the syntax of the language (e.g., `{` must be matched by a `}` and contain key-value pairs), so there’s no reason your lexer needs to know that too, and you can save yourself some time and maintenance effort by not writing the language syntax out twice. Also, parsers are designed to encode structural information (whereas lexers are designed to encode character-based information), so you’ll find it much easier to describe the structural features of the language in a parser (e.g., `lbrace (string colon value (comma string colon value)*)? rbrace` instead of every state that starts with `object` in your example).
2. Your parser can give much better error messages (e.g., “expected `,` or `)` after argument, but got number” instead of “unexpected `4`”).

That’s right: I don’t think we want to add a feature like this at this time. Moo is intended to be used with a parser of some kind, and we’re not planning to add parser-like features to it. (If someone was using Moo with a parser, and could demonstrate a use case that required this, then we’d think about it again.)
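To make the lexer/parser split concrete, here is a minimal sketch of the parser side for the function-call example. It is a hypothetical hand-written recursive-descent parser (`parseCall` is an illustrative name, not moo or any library's API) that consumes a flat `{ type, value }` token array; the token type names are assumptions, and leaf token text is assumed to be JSON-compatible. It reports errors in the “expected … but got …” style described above.

```javascript
// Hypothetical recursive-descent parser over flat { type, value } tokens.
// Grammar it accepts:
//   call   := identifier "(" (value ("," value)*)? ")"
//   value  := string | number | keyword | array | object
//   array  := "[" (value ("," value)*)? "]"
//   object := "{" (string ":" value ("," string ":" value)*)? "}"
function parseCall(tokens) {
  let i = 0;
  const peek = () => tokens[i];
  const isPunct = (t, ch) => t !== undefined && t.type === "punct" && t.value === ch;
  // Describe a token for error messages: punctuation by its text, others by type.
  const describe = (t) => (t === undefined ? "end of input" : t.type === "punct" ? t.value : t.type);

  function expectPunct(ch, context) {
    if (!isPunct(peek(), ch)) throw new SyntaxError(`expected ${ch} ${context}, but got ${describe(peek())}`);
    i++;
  }

  // Comma-separated items up to `close`; the error names what was just parsed.
  function list(close, item, what) {
    const items = [];
    if (isPunct(peek(), close)) { i++; return items; }
    for (;;) {
      items.push(item());
      if (isPunct(peek(), ",")) { i++; continue; }
      if (isPunct(peek(), close)) { i++; return items; }
      throw new SyntaxError(`expected , or ${close} after ${what}, but got ${describe(peek())}`);
    }
  }

  function value() {
    const t = peek();
    if (t !== undefined && ["string", "number", "keyword"].includes(t.type)) {
      i++;
      return JSON.parse(t.value); // leaf text assumed to be JSON-compatible
    }
    if (isPunct(t, "[")) { i++; return list("]", value, "array element"); }
    if (isPunct(t, "{")) { i++; return Object.fromEntries(list("}", entry, "object entry")); }
    throw new SyntaxError(`expected a value, but got ${describe(t)}`);
  }

  function entry() {
    const key = peek();
    if (key === undefined || key.type !== "string") throw new SyntaxError(`expected a string key, but got ${describe(key)}`);
    i++;
    expectPunct(":", "after object key");
    return [JSON.parse(key.value), value()];
  }

  const name = peek();
  if (name === undefined || name.type !== "identifier") throw new SyntaxError(`expected a function name, but got ${describe(name)}`);
  i++;
  expectPunct("(", "after function name");
  const args = list(")", value, "argument");
  if (i < tokens.length) throw new SyntaxError(`expected end of input, but got ${describe(peek())}`);
  return { callee: name.value, args };
}

// f(["a"], {"k": false})
const good = [
  { type: "identifier", value: "f" }, { type: "punct", value: "(" },
  { type: "punct", value: "[" }, { type: "string", value: '"a"' }, { type: "punct", value: "]" },
  { type: "punct", value: "," },
  { type: "punct", value: "{" }, { type: "string", value: '"k"' }, { type: "punct", value: ":" },
  { type: "keyword", value: "false" }, { type: "punct", value: "}" },
  { type: "punct", value: ")" },
];
// f(1 4): a missing comma between arguments
const bad = [
  { type: "identifier", value: "f" }, { type: "punct", value: "(" },
  { type: "number", value: "1" }, { type: "number", value: "4" }, { type: "punct", value: ")" },
];

console.log(JSON.stringify(parseCall(good))); // {"callee":"f","args":[["a"],{"k":false}]}
try {
  parseCall(bad);
} catch (err) {
  console.log(err.message); // expected , or ) after argument, but got number
}
```

Nesting is handled by the recursion in `value`, so the lexer never needs states for arrays or objects, and the parser knows enough context to say what it expected.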
Good luck with your project! 😊