Non-RegExp patterns

Motivation

Regexes are easy to use and powerful enough for most of the highlighting problems we face. However, sometimes regexes are not powerful enough.

Since some tokens can only be matched by a context-free grammar, we have usually resorted to a number of tricks, ranging from supporting only a finite number of recursion steps to matching on the text that usually surrounds the token. We use these tricks out of necessity, but what we really need in those cases is something more powerful than regexes.

Description

I want to propose that we relax grammars to allow regex-like objects. The basic idea is that Prism’s matching algorithm only uses the exec and lastIndex properties of RegExp objects, so any object that implements those properties can be used.
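To make the idea concrete, here is a minimal sketch of a matching loop that touches nothing but `lastIndex` and `exec`. It is a simplified illustration, not Prism's actual code; `findAll` is a hypothetical helper, and `RegExpLike` is the interface proposed further down.

```typescript
// The only members the loop below relies on.
interface RegExpLike {
  lastIndex: number;
  exec(value: string): RegExpExecArray | null;
}

// Hypothetical helper (not part of Prism): collect every match of `pattern`
// in `text`, using only `lastIndex` and `exec`.
function findAll(pattern: RegExpLike, text: string): Array<{ index: number; text: string }> {
  const found: Array<{ index: number; text: string }> = [];
  let pos = 0;
  while (pos <= text.length) {
    pattern.lastIndex = pos; // tell the matcher where to start searching
    const m = pattern.exec(text);
    if (m === null) break;
    found.push({ index: m.index, text: m[0] });
    // Continue after the match; advance at least one character so that
    // empty matches cannot cause an infinite loop.
    pos = m.index + Math.max(m[0].length, 1);
  }
  return found;
}

// A global RegExp already satisfies RegExpLike, so it can be passed as-is.
const matches = findAll(/\d+/g, "a1 b22 c333");
```

Because the loop never asks whether `pattern` is a real `RegExp`, any object with the same two members could be substituted.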

Speaking in types, I want to change:

interface GrammarToken {
  pattern: RegExp;
  ...
}
interface Grammar {
  [name: string]: RegExp | GrammarToken | Array<RegExp | GrammarToken>;
}

to:

interface RegExpLike { // RegExp trivially implements this interface
  lastIndex: number;
  exec(value: string): RegExpExecArray | null;
}
interface GrammarToken {
  pattern: RegExpLike;
  ...
}
interface Grammar {
  [name: string]: RegExpLike | GrammarToken | Array<RegExpLike | GrammarToken>;
}

This will allow us to implement custom matchers that can be more powerful than regexes.
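As an illustration of what such a matcher could look like, the sketch below matches a parenthesized group with balanced nesting of arbitrary depth, which a single regex cannot do in general. `BalancedParens` is a hypothetical name, and the way it updates `lastIndex` is an assumption modelled on how a global `RegExp` behaves.

```typescript
interface RegExpLike {
  lastIndex: number;
  exec(value: string): RegExpExecArray | null;
}

// Hypothetical matcher: finds the next parenthesized group whose
// parentheses are balanced, searching from `lastIndex`.
class BalancedParens implements RegExpLike {
  lastIndex = 0;

  exec(value: string): RegExpExecArray | null {
    let start = value.indexOf("(", this.lastIndex);
    while (start !== -1) {
      let depth = 0;
      for (let i = start; i < value.length; i++) {
        if (value[i] === "(") {
          depth++;
        } else if (value[i] === ")" && --depth === 0) {
          const text = value.slice(start, i + 1);
          this.lastIndex = i + 1; // like a global RegExp: resume after the match
          // Build an array shaped like the result of RegExp.prototype.exec.
          return Object.assign([text], { index: start, input: value }) as RegExpExecArray;
        }
      }
      // This "(" is never closed: retry from the next opening parenthesis.
      start = value.indexOf("(", start + 1);
    }
    this.lastIndex = 0; // like a global RegExp: reset when nothing matches
    return null;
  }
}

const matcher = new BalancedParens();
const source = "f(a(b)c) + g(d)";
const first = matcher.exec(source);  // "(a(b)c)" at index 1
const second = matcher.exec(source); // "(d)" at index 12
const third = matcher.exec(source);  // null, and lastIndex is reset to 0
```

The retry loop makes this quadratic in the worst case; a real implementation would probably want something smarter, but the interface stays the same.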

Required changes to Prism Core

~The only thing we have to change is how we try to enable the global flag here.~

Edit: This idea requires no changes to Core if we require a global: true property in the RegExpLike interface.
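A rough sketch of why this works, assuming Core only rebuilds a pattern when its `global` property is falsy (an assumption for illustration; `ensureGlobal` and `todoMatcher` are hypothetical names, not Prism APIs). The interface below types `global` as `boolean` rather than the literal `true`, so that plain `RegExp` objects still satisfy it in TypeScript.

```typescript
interface RegExpLike {
  lastIndex: number;
  readonly global: boolean; // custom matchers would always report true
  exec(value: string): RegExpExecArray | null;
}

// Hypothetical stand-in for the step that enables the global flag.
function ensureGlobal(pattern: RegExpLike): RegExpLike {
  if (pattern.global) return pattern; // /.../g and custom matchers pass through untouched
  if (pattern instanceof RegExp) {
    // Only real RegExp objects can be rebuilt with an extra flag.
    return new RegExp(pattern.source, pattern.flags + "g");
  }
  throw new TypeError("non-RegExp patterns must set global: true");
}

// Hypothetical custom matcher for the literal text "TODO".
const todoMatcher: RegExpLike = {
  lastIndex: 0,
  global: true,
  exec(value: string): RegExpExecArray | null {
    const index = value.indexOf("TODO", this.lastIndex);
    if (index === -1) {
      this.lastIndex = 0;
      return null;
    }
    this.lastIndex = index + 4; // 4 === "TODO".length
    return Object.assign(["TODO"], { index, input: value }) as RegExpExecArray;
  },
};

const untouched = ensureGlobal(todoMatcher); // same object, no rebuild attempted
const rebuilt = ensureGlobal(/a/);           // a new RegExp with the "g" flag
```

With `global: true` in place, the custom matcher never reaches the branch that needs a real `RegExp`.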

Issue Analytics

Created 3 years ago
Reactions: 1
Comments: 7 (7 by maintainers)

Top GitHub Comments

2 reactions

mAAdhaTTah commented, Oct 19, 2020

@RunDevelopment Duh, yes, I knew I had an issue about that. Let me look into doing that soon. Maybe we could consider migrating from TravisCI to GH Actions in the process.

2 reactions

RunDevelopment commented, Oct 18, 2020

> the tradeoff between bundle size & correctness.

Yeah. I usually value correctness more than bundle size, so please do hold me back if I go overboard.

That being said, when I propose (or implement) these ideas that are motivated by one or a few specific cases (e.g. #2190, #2002), I usually think in terms of how well the idea will scale. I.e. if it takes 1 kB to implement logic that saves 100 bytes every time it’s used, then it might be worth it.

I also want to say that most of these ideas are motivated by cutting down complexity. This one is too. This might seem contradictory because implementing these matchers as described above is no easy feat. My end game with this issue is to implement something like an LL parser. We would supply a CF grammar and the matcher would be generated for us. Easy to use and as declarative as regexes. Implementing this won’t be easy, but we only have to do it once. I also intend this to replace constructs like this one, which I created out of necessity.

> add a GH bot that tells us the bundle size impacts of our changes so we can discuss those tradeoffs with data as they come in.

#1787
