Discuss: Sequences

Is your request related to a specific problem you’re having?

I don’t have time to go back and cite all the examples, but see a lot of existing grammars and our lengthy discussions on sequencing in the LaTeX thread. The problem arises when you want to match a specific sequence of modes (each of which may be complex in and of itself). A made-up BNF-like example:

<list>           ::= *special <term> <term> <opt-whitespace> <list>

But now imagine that instead of just <tag>, these might be more HTML-like constructs, with attributes, strings, special characters, etc., so 4 modes start to appear:

key tag (<list ...>)
assignment (::=)
modifiers (*special)
then multiple rule tags

The solution you’d prefer / feature you’d like to see added…

I propose adding a sequence to sit beside contains. While contains is orderless, sequence would be sequential. I’m simplifying the rules below so you can see the bigger picture, but each rule could have its own begin, end, contains, etc. For the first pass I might restrict some things like starts, endsParent, endsWithParent just to shrink the problem space, and then find out later whether those things are truly needed. Otherwise these would be full modes in their own right and have the same output/processing behavior as other modes when they are active.

// rule
{
    begin: /<.*>/,
    sequence: [
        { match: /<.*>/ },
        { match: /::=/ },
        { match: /\*special/, optional: true },
        { match: /<.*>/, multiple: true }
    ],
    end: mode.MATCH_NOTHING_RE,
    illegal: /\S/,
}

- begin could be optional, and if so it could borrow the begin from the first match in the sequence. The first match would obviously then need to be mandatory (not optional).
- end by default would attempt to “immediately terminate” the sequence (as it does with contains), though it’s worth discussing if this is the correct default for sequences and how we might change this without being inconsistent
  - this behavior could be changed just by setting an end rule, or by using MATCH_NOTHING_RE if there is truly no end rule
- if one wants to mandate a full sequence, illegal could be used (such as using illegal above to flag non-spaces). This would cause an illegal error to be thrown for incomplete sequences.
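
The begin default described above could be sketched roughly as follows. This is only an illustration: `normalizeSequenceMode` is a hypothetical helper name, not an existing highlight.js API, and it assumes each sequence item exposes either a `match` or a `begin` regex.

```javascript
// Hypothetical helper (not part of highlight.js): applies the proposed
// default where a missing `begin` is borrowed from the first sequence item.
function normalizeSequenceMode(mode) {
  const first = mode.sequence[0];
  if (!first) {
    throw new Error("sequence must contain at least one item");
  }
  const normalized = { ...mode };
  if (normalized.begin === undefined) {
    // The borrowed item has to be mandatory, otherwise the mode
    // would have no reliable way to start.
    if (first.optional) {
      throw new Error("first sequence item must be mandatory when begin is omitted");
    }
    normalized.begin = first.match ?? first.begin;
  }
  return normalized;
}

const rule = normalizeSequenceMode({
  sequence: [{ match: /<.*>/ }, { match: /::=/ }],
});
console.log(rule.begin.source);
```

An explicit `begin` is left untouched, so a grammar author could still supply a stricter opening regex than the first item uses.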

worth discussing if this is the correct default for sequences

It feels like specifying end above is a bit annoying, but if we did not, then any space we encountered between the tags we care about would cause the sequence to terminate. This type of scenario (spaces or non-content in between the things we care about) is common enough that we should try to find a nice way to handle it without needing tons of additional modes for whitespace. Having separate regex for whitespace is already really bugging me with our new multi-match support.

How would it work in practice

Once a mode with sequence was entered the parser would go into a sequential mode loop:

loop
- look for current item in sequence (or end or illegal)
- if illegal found, raise; if end found, end mode
- when item found, start that mode
  - if item singular, increment position in sequence
- when item not found and optional (or multiple and already matched once), increment position
- if no more items, end mode

This is overly simplified of course, since if you had 2 optionals, for example, the parser would be using a multi-regex that was scanning for the next 3 items in the sequence (plus begin and illegal), because the 3rd rule would be eligible to match at any position due to the first 2 rules being optional.
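
To make the "next 3 items" point concrete, here is a small sketch of how the set of eligible items could be computed. The function name `eligibleItems` and its parameters are assumptions made for illustration; the real implementation would presumably compile these items into one multi-regex rather than return indexes.

```javascript
// Hypothetical helper: returns the indexes of the sequence items that may
// legally match next. `position` is the current item; `matchedCurrent` says
// whether a `multiple` item at that position has already matched once.
function eligibleItems(sequence, position, matchedCurrent = false) {
  const eligible = [];
  for (let i = position; i < sequence.length; i++) {
    eligible.push(i);
    const item = sequence[i];
    const skippable =
      item.optional || (i === position && item.multiple && matchedCurrent);
    if (!skippable) break; // a mandatory item blocks everything after it
  }
  return eligible;
}

const sequence = [
  { match: /a/ },
  { match: /b/, optional: true },
  { match: /c/, optional: true },
  { match: /d/ },
];
console.log(eligibleItems(sequence, 1)); // [ 1, 2, 3 ]
```

With the parser sitting on the first optional, both optionals and the mandatory item after them are in play, which is the three-item scan described above.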

This adds complexity, but I’m not sure a simpler version (one item after another, no repeats, all mandatory) solves a lot of real problems. I remember LaTeX definitely had optionals and such things.

Any alternative solutions you considered…

More sugar on top of our existing starts stuff, but I find it all quite convoluted to think and reason about, and there is some very hard-to-understand behavior with endsParent and starts. Since changing starts would likely break a bunch of existing grammars, I think we need some real new behavior rather than just sugar.

Sugar also becomes incredibly hard to debug since the end user sees only the sugar and doesn’t understand the potentially incredibly complex rules being generated behind the scenes. So far I’ve tried to keep our sugar minimal and doing simple things.

There are other ways of writing it syntactically, such as reusing contains but with a flag to say it’s a sequence. I think I like this less off the top of my head though.

contains: [ ... ],
sequential: true,

Just to name one example.

Additional context…

Note that nowhere are we talking about branching or backtracking. This is not being discussed. Sequences either complete (after matching every item), terminate early with an incomplete sequence (the end matcher is triggered), or raise an error (they hit an illegal). Illegal is how you would specify that “the full sequence is required”.

Once we find the start of a sequence, we are committed to that sequence. This is not tackling problems like “It might be sequence X or sequence Y”, which would require backtracking. For some grammars these situations might be handled with creative use of optionals and multiples.

Also, in simple cases a strong begin regex with a look-ahead could help make sure the right sequence was selected.
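
As an illustration of the look-ahead idea for the BNF-like example, a begin regex could refuse to start the sequence unless the tag is followed by the assignment. The name `ruleBegin` and the exact pattern are assumptions for this sketch, not something taken from an existing grammar.

```javascript
// Only commit to the rule sequence when the tag is actually followed by `::=`.
const ruleBegin = /<[^>]*>(?=\s*::=)/;

console.log(ruleBegin.test("<list> ::= <term>")); // true
console.log(ruleBegin.test("<list> <term>"));     // false
```

Because the look-ahead consumes nothing, the `::=` is still available for the second item in the sequence to match.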

Issue Analytics

State:
Created 2 years ago
Comments:12 (12 by maintainers)

Top GitHub Comments

joshgoebel commented, Apr 19, 2021

but if the value of match is always a regex,

It’s not just regex (they just make the examples easier to read), these can be full modes with 100s of submodes, etc… If it was just regex then someone would use the new multi-match support we just added… this is needed for much more complex rulesets.

sequence: [ PREAMBLE, HEADERS, YAML_FRONT_MATTER, DOCUMENT_BODY ]

schtandard commented, Apr 20, 2021

Can you come up with a more real-life example of this same problem, say perhaps from the LaTeX domain…? That might be helpful to aid in the discussion.

All optional arguments in LaTeX work like this, for example

\section[short title]{long title} This is some text [containing brackets].

Here [short title] is optional and in

\section{title} This is some text [containing brackets].

the fact that \section is not directly (excepting most whitespace) followed by [ signals that the optional argument is not present. In particular, matching [containing brackets] as belonging to \section would be very wrong.

certainly that means that 2 was skipped and the sequence continues with 3?

I agree.

But what about?

{
  sequence: [
    { match: /a/ },
    { match: /b/, optional: true },
    { match: /c/, optional: true },
    { match: /d/, optional: true }
    { match: /e/ },
  ],
  end: /f/
}

Does the same hold true? First found wins? So adcbef would match:

* `a`, `d` and `e`, ending on `f`

* the `cb` content is ignored

Yes. At least from my standpoint (i.e. what do I need for LaTeX), that’s exactly what it should mean (the cb would simply be out of place there).
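
The "first found wins" reading can be checked with a rough simulation over plain regexes. Everything here is an assumption made for illustration: `findFrom` and `runSequence` are hypothetical names, items are reduced to a single `match` regex, and the sketch follows the loop from the issue by ending the mode as soon as the last item has matched, so the `f` is never consumed.

```javascript
// Hypothetical helper: first match of `re` at or after index `from`, or null.
function findFrom(re, text, from) {
  const flags = re.flags.includes("g") ? re.flags : re.flags + "g";
  const scanner = new RegExp(re.source, flags);
  scanner.lastIndex = from;
  return scanner.exec(text);
}

// Hypothetical sketch of the sequential loop; not highlight.js code.
function runSequence(mode, text) {
  const matched = [];
  let cursor = 0;
  let position = 0;
  let matchedCurrent = false;

  while (position < mode.sequence.length) {
    // Candidates: the current item plus anything reachable by skipping
    // optionals (or a `multiple` item that has already matched once).
    const candidates = [];
    let canFinish = true;
    for (let i = position; i < mode.sequence.length; i++) {
      const item = mode.sequence[i];
      candidates.push({ kind: "item", index: i, hit: findFrom(item.match, text, cursor) });
      const skippable =
        item.optional || (i === position && item.multiple && matchedCurrent);
      if (!skippable) { canFinish = false; break; }
    }
    if (mode.end) candidates.push({ kind: "end", hit: findFrom(mode.end, text, cursor) });
    if (mode.illegal) candidates.push({ kind: "illegal", hit: findFrom(mode.illegal, text, cursor) });

    // First found wins: the earliest match in the text is chosen. On a tie,
    // sequence items beat `end`/`illegal` because they come first in the list.
    const found = candidates.filter((c) => c.hit);
    if (found.length === 0) {
      return { matched, status: canFinish ? "complete" : "incomplete" };
    }
    const winner = found.reduce((best, c) => (c.hit.index < best.hit.index ? c : best));

    if (winner.kind === "illegal") {
      throw new Error(`illegal match at index ${winner.hit.index}`);
    }
    if (winner.kind === "end") {
      return { matched, status: canFinish ? "complete" : "ended early" };
    }

    matched.push(winner.hit[0]);
    // Always advance by at least one character to avoid looping on empty matches.
    cursor = winner.hit.index + Math.max(winner.hit[0].length, 1);
    const item = mode.sequence[winner.index];
    if (item.multiple) {
      position = winner.index; // stay on a repeatable item
      matchedCurrent = true;
    } else {
      position = winner.index + 1;
      matchedCurrent = false;
    }
  }
  return { matched, status: "complete" };
}

const result = runSequence(
  {
    sequence: [
      { match: /a/ },
      { match: /b/, optional: true },
      { match: /c/, optional: true },
      { match: /d/, optional: true },
      { match: /e/ },
    ],
    end: /f/,
  },
  "adcbef"
);
console.log(result.matched, result.status); // [ 'a', 'd', 'e' ] complete
```

After `a`, the scan covers `b`, `c`, `d`, `e` and the end matcher at once; `d` appears first, so `b` and `c` are skipped and the later `cb` is passed over as plain content.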
