Non-RegExp patterns
Motivation
Regexes are easy to use and powerful enough for most of the highlighting problems we face. However, sometimes regexes are not powerful enough.
Since some tokens can only be matched by a context-free grammar, we have usually resorted to a number of tricks, ranging from supporting a finite number of recursion steps to relying on the usual surroundings of the token. We use these tricks out of necessity, but what we really need in those cases is something more powerful than regexes.
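Here is a minimal sketch of the "finite number of recursion steps" trick. It builds a regex for balanced parentheses by unrolling the nesting a fixed number of times. The helper name `nestedParens` is hypothetical and not part of Prism. Any group nested deeper than `depth` is not matched as a whole, and that limit is exactly what a context-free matcher would remove.

```typescript
// Hypothetical helper: builds a regex matching balanced parentheses nested
// at most `depth` levels deep (depth >= 1) by unrolling the recursion.
function nestedParens(depth: number): RegExp {
  // Innermost level: any character except parentheses.
  let inner = "[^()]";
  for (let level = 1; level < depth; level++) {
    // Each step allows one more level: either a plain character or a
    // parenthesized group built from the previous level.
    inner = `(?:[^()]|\\(${inner}*\\))`;
  }
  return new RegExp(`\\(${inner}*\\)`);
}

const twoLevels = nestedParens(2);
console.log(twoLevels.exec("(a(b))")?.[0]);    // "(a(b))"
console.log(twoLevels.exec("(a(b(c)))")?.[0]); // "(b(c))": the outer group is too deep
```

Each extra level makes the pattern longer, so this approach stops scaling quickly.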
Description
I want to propose that we relax grammars to allow regex-like objects. The basic idea is that Prism’s matching algorithm only uses the exec and lastIndex properties of RegExp objects, so any object that implements those properties can be used.
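The following is a minimal sketch of what relying only on exec and lastIndex looks like in a single tokenizer step. `matchFrom` is a hypothetical name, and Prism's real matching code does considerably more (lookbehind, greedy matching, and so on). The second call shows why the global flag matters: a RegExp without the `g` (or `y`) flag ignores lastIndex and always searches from the start.

```typescript
interface RegExpLike {
  lastIndex: number;
  exec(value: string): RegExpExecArray | null;
}

// Hypothetical tokenizer step: it only writes `lastIndex` and calls `exec`,
// so any object with those two members can act as a pattern.
function matchFrom(
  pattern: RegExpLike,
  text: string,
  pos: number,
): { index: number; text: string } | null {
  pattern.lastIndex = pos;
  const match = pattern.exec(text);
  return match === null ? null : { index: match.index, text: match[0] };
}

console.log(matchFrom(/\d+/g, "ab 12 cd 345", 5)); // index 9, text "345"
console.log(matchFrom(/\d+/, "ab 12 cd 345", 5));  // index 3, text "12": no g flag, lastIndex ignored
```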
In terms of types, I want to change:
interface GrammarToken {
    pattern: RegExp;
    // ...
}
interface Grammar {
    [name: string]: RegExp | GrammarToken | Array<RegExp | GrammarToken>;
}
to:
interface RegExpLike { // RegExp trivially implements this interface
    lastIndex: number;
    exec(value: string): RegExpExecArray | null;
}
interface GrammarToken {
    pattern: RegExpLike;
    // ...
}
interface Grammar {
    [name: string]: RegExpLike | GrammarToken | Array<RegExpLike | GrammarToken>;
}
This will allow us to implement custom matchers that can be more powerful than regexes.
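As an illustration of such a custom matcher, here is a sketch of a RegExpLike that finds balanced parentheses of any depth using a simple counter, which no single regex can do. The class name `BalancedParens` is hypothetical. The sketch assumes that matchers should mimic a global RegExp: search from lastIndex, advance lastIndex past the match, and reset it to 0 on failure.

```typescript
interface RegExpLike {
  lastIndex: number;
  exec(value: string): RegExpExecArray | null;
}

// Builds a RegExpExecArray-shaped result for a match of `text` at `index`.
function makeMatch(text: string, index: number, input: string): RegExpExecArray {
  const match = [text] as unknown as RegExpExecArray;
  match.index = index;
  match.input = input;
  return match;
}

// Hypothetical custom matcher: balanced parentheses of unbounded depth.
class BalancedParens implements RegExpLike {
  lastIndex = 0;
  // Behaves like a global regex, so advertise it the same way a RegExp would.
  readonly global = true;

  exec(value: string): RegExpExecArray | null {
    // Like a global RegExp, start searching at lastIndex.
    for (let start = this.lastIndex; start < value.length; start++) {
      if (value[start] !== "(") continue;
      let depth = 0;
      for (let i = start; i < value.length; i++) {
        if (value[i] === "(") depth++;
        else if (value[i] === ")" && --depth === 0) {
          this.lastIndex = i + 1; // continue after this match next time
          return makeMatch(value.slice(start, i + 1), start, value);
        }
      }
      // This "(" is never closed; try the next candidate.
    }
    this.lastIndex = 0; // a failed RegExp#exec also resets lastIndex
    return null;
  }
}

const parens = new BalancedParens();
console.log(parens.exec("f(a, (b), c) + g(d)")?.[0]); // "(a, (b), c)"
console.log(parens.exec("f(a, (b), c) + g(d)")?.[0]); // "(d)"
```

This version is quadratic when there are many unclosed parentheses. A production matcher would want to avoid rescanning from every candidate.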
Required changes to Prism Core
~The only thing we have to change is how we try to enable the global flag here.~
Edit: This idea requires no changes to Core if we require a global: true property in the RegExpLike interface.
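The Edit above hinges on Core's step that enables the global flag: a RegExp without `g` gets rebuilt from its source and flags, which only works for real RegExp objects. The following is a sketch of that idea. `ensureGlobal` is a hypothetical name, and Core's actual code may be structured differently. It shows why a custom matcher that reports `global: true` never reaches the rebuild step.

```typescript
interface RegExpLike {
  readonly global: boolean;
  lastIndex: number;
  exec(value: string): RegExpExecArray | null;
}

// Hypothetical helper loosely modeled on Core's "enable the global flag" step.
function ensureGlobal(pattern: RegExpLike): RegExpLike {
  // Already global (including custom matchers declaring global: true): use as-is.
  if (pattern.global) return pattern;
  // Rebuilding via source and flags only works for real RegExp objects.
  if (!(pattern instanceof RegExp)) {
    throw new TypeError("non-RegExp patterns must declare global: true");
  }
  return new RegExp(pattern.source, pattern.flags + "g");
}

const custom: RegExpLike = { global: true, lastIndex: 0, exec: () => null };
console.log(ensureGlobal(custom) === custom); // true: passed through untouched
console.log((ensureGlobal(/a/i) as RegExp).flags); // "gi"
```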
Created 3 years ago
@RunDevelopment Duh, yes, I knew I had an issue about that. Let me look into doing that soon. Maybe we should consider migrating from TravisCI to GH Actions in the process.
Yeah. I usually value correctness more than bundle size, so please do hold me back if I go overboard.
That being said, when I propose (or implement) ideas that are motivated by one or a few specific cases (e.g. #2190, #2002), I usually think in terms of how well the idea will scale. For example, if it takes 1 kB to implement logic that saves 100 bytes every time it’s used, it might be worth it.
I also want to say that most of these ideas are motivated by a wish to cut down complexity. This one is too. This might seem contradictory, because implementing these matchers as described above is no easy feat. My end game with this issue is to implement something like an LL parser: we would supply a CF grammar and the matcher would be generated for us, making it as easy to use and as declarative as regexes. Implementing this won’t be easy, but we only have to do it once. I also intend this to replace constructs like this one that I created out of necessity.
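To sketch where this could go: a matcher derived from a CF grammar boils down to a parse function that, given a start position, returns where a match ends. The adapter below wraps such a function in the RegExpLike interface. `fromParser` and the small grammar for nested generic types are hypothetical illustrations. The parser is hand-written recursive descent (the kind of code an LL parser generator could produce), not an actual generator. Because the recursion is real rather than unrolled a fixed number of times, nesting depth is unbounded.

```typescript
interface RegExpLike {
  readonly global: boolean;
  lastIndex: number;
  exec(value: string): RegExpExecArray | null;
}

// A parser tries to match at `pos` and returns the end index, or -1 on failure.
type Parser = (text: string, pos: number) => number;

// Hypothetical adapter: scans forward from lastIndex like a global RegExp and
// reports the first position where `parse` consumes at least one character.
function fromParser(parse: Parser): RegExpLike {
  return {
    global: true,
    lastIndex: 0,
    exec(value: string): RegExpExecArray | null {
      for (let start = this.lastIndex; start < value.length; start++) {
        const end = parse(value, start);
        if (end > start) {
          const match = [value.slice(start, end)] as unknown as RegExpExecArray;
          match.index = start;
          match.input = value;
          this.lastIndex = end;
          return match;
        }
      }
      this.lastIndex = 0;
      return null;
    },
  };
}

// Hypothetical grammar:  Type := Ident ( "<" Type ( "," " "* Type )* ">" )?
const ident = /[A-Za-z_]\w*/y; // sticky: only matches exactly at lastIndex

function parseIdent(text: string, pos: number): number {
  ident.lastIndex = pos;
  return ident.test(text) ? ident.lastIndex : -1;
}

function parseType(text: string, pos: number): number {
  const nameEnd = parseIdent(text, pos);
  if (nameEnd < 0 || text[nameEnd] !== "<") return nameEnd;
  let p = parseType(text, nameEnd + 1);
  while (p >= 0 && text[p] === ",") {
    p++;
    while (text[p] === " ") p++;
    p = parseType(text, p);
  }
  // The "<...>" part is optional: if it does not parse, fall back to the bare name.
  return p >= 0 && text[p] === ">" ? p + 1 : nameEnd;
}

const genericType = fromParser(parseType);
console.log(genericType.exec("Map<String, List<Map<K, V>>> m")?.[0]); // "Map<String, List<Map<K, V>>>"
```

Writing one such parser per language quickly becomes error-prone. Generating it from a grammar would keep token definitions as declarative as regexes.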
#1787