Construct a Lark grammar from ABNF format (RFC 5234)
RFC 5234 describes the standard grammar format for internet standards, such as the notoriously-hard-to-validate email addresses.
Because this standard is a dialect of EBNF and does not allow for embedded code, it should be relatively easy to construct a Lark object for a given ABNF grammar - at least easier than converting from Nearley! Hopefully it’s easy enough that runtime conversion in a new Lark.from_abnf method (or group of methods) would be practical.
This feature request is based on HypothesisWorks/hypothesis#170, where I eventually realized that parsing ABNF was going to be easier as well as more widely useful upstream. I’d be happy to work on this with some guidance about where to start, and have already translated the grammar of ABNF from ABNF to Lark’s format.
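To give a feel for how close the two notations are, here is a rough sketch of a text-level translation for a tiny subset of ABNF: rule names, quoted strings, `/` alternation, `[ ]` and `( )` grouping, and a repetition prefix on a single element. The function names are made up for illustration and are not part of Lark, and this is not the translation mentioned above. It assumes one rule per line, and it leaves out comments, `=/`, numeric values such as `%x41`, and repetition of groups.

```python
import re

# One token of an ABNF rule body (subset only; see the assumptions above).
_TOKEN = re.compile(r'''
    (?P<repeat>\d*\*\d*)                 # repetition prefix: *, 1*, 2*4
  | (?P<string>"[^"]*")                  # quoted string
  | (?P<name>[A-Za-z][A-Za-z0-9-]*)      # rule name
  | (?P<punct>[/()\[\]])                 # alternation and grouping
  | (?P<space>\s+)
''', re.VERBOSE)


def _lark_name(abnf_name):
    # ABNF names are case-insensitive and may contain hyphens;
    # Lark rule names are lowercase with underscores.
    return abnf_name.lower().replace("-", "_")


def _lark_repeat(prefix):
    # ABNF puts the repetition before the element, Lark puts it after.
    low, _, high = prefix.partition("*")
    if high:
        return f"~{low or 0}..{high}"
    if low in ("", "0"):
        return "*"
    if low == "1":
        return "+"
    raise NotImplementedError(f"no direct Lark operator for {prefix!r}")


def abnf_rule_to_lark(rule):
    """Translate one ABNF rule (hypothetical helper, subset only)."""
    name, sep, body = rule.partition("=")
    body = body.strip()
    if not sep:
        raise ValueError("expected 'name = elements'")
    if body.startswith("/"):
        raise NotImplementedError("incremental alternatives (=/)")
    out, pending, pos = [], None, 0
    while pos < len(body):
        m = _TOKEN.match(body, pos)
        if m is None:
            raise ValueError(f"unsupported ABNF syntax at: {body[pos:]!r}")
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "space":
            continue
        if kind == "repeat":
            pending = _lark_repeat(text)
            continue
        if kind == "string":
            piece = text + "i"  # ABNF quoted strings are case-insensitive
        elif kind == "name":
            piece = _lark_name(text)
        else:
            if pending is not None:
                raise NotImplementedError("repetition of groups")
            piece = "|" if text == "/" else text
        if pending is not None:
            piece, pending = piece + pending, None
        out.append(piece)
    if pending is not None:
        raise ValueError("repetition prefix without an element")
    return f"{_lark_name(name.strip())}: {' '.join(out)}"


print(abnf_rule_to_lark('greeting = "hello" 1*SP *word-char / "bye"'))
```

The printed line should be usable as a rule in a Lark grammar once `sp` and `word_char` are defined. A real implementation would more likely parse the ABNF properly than tokenize it with a regex.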
Issue Analytics
- Created 5 years ago
- Comments: 5 (5 by maintainers)
Top GitHub Comments
I think it’s a nice idea. I think your best bet is to create a Lark parser that reproduces the output of the GrammarLoader parser: https://github.com/lark-parser/lark/blob/master/lark/load_grammar.py#L663
You might have to do some post-processing (for example, in a Transformer) to make them a perfect fit.
Once you have that working, I’ll add an interface to plug it in there when called with ABNF grammar.
While using Lark.from_abnf isn’t bad (and can be complemented with Lark.open_abnf), how about doing this instead?
That opens the door to adding other formats in the future.
In that situation, it’s common to add new language features that don’t break the old language. For example, a new operator that works in Lark but not in ABNF, and that you don’t have to use.
That seems a bit cumbersome. Why not just have a translator from Extended-ABNF into ABNF? It should be fairly easy: just remove and canonicalize some nodes, then write the result back out. It’s simple enough that Lark’s reconstructor might even be able to handle it.