TextMate lacks means to match a pattern exactly one time

TextMate grammars seem to lack a way to indicate that a pattern may be matched only once, similar to the regex ? quantifier. Without this, some languages are difficult to scope correctly.

Take PowerShell, for instance, which doesn’t really have any reserved keywords. PowerShell supports the if, elseif and else keywords for flow control and, as in many languages, else is allowed only once after an if statement. However, because keywords are not reserved, elseif and else can also be used as command names, and only context determines which meaning applies. Within the context of an if statement, elseif and else serve as keywords, and the context is terminated by else or by anything that is not elseif. Additionally, a () condition group and a {} statement block are required for if and elseif, and a {} block is required for else (though that matters less, since else has already effectively terminated the if context).

# general syntax (not including comments and line breaks)
if (condition) {statement} elseif (condition-elseif) {statement-elseif} else {statement-else}

# demonstration
# `else another-command` is actually another command (native executable or user defined function)
if(1){do-something}else{do-something else}else another-command

To further complicate things, the state in which this statement is reached is unknown, as there are multiple ways to arrive at it. This rules out a forced-exit strategy, whereby the grammar forces a specific rule to keep going until it is safe to back out of the current stack.

Example

# hashtable using `if` in assignment
@{
    key = if (cond) {statement}
    key2 = if (cond) {statement} else {statement else}
    else = 'const' # else here is valid hashtable literal key name, because `if` context ended above.
    key3 = if (condition) {statement}
    else {another statement} # `if` context was not closed in previous line so it continued to this line
}

(Note the grammar that GitHub uses only differentiates the else hash key above based on the presence of an =, not on the context of the if statement.)

I cannot find a way to formulate a TextMate grammar that properly describes this situation. I think this is due to the lack of a property on each begin or match rule, such as applyPatternOnce, which would limit the pattern to matching only once in the current stack scope.

  • else should only be allowed once per if context.
  • () condition should be required but only once per if or elseif subcontext.
  • {} statement block should be required, but only once per if, elseif or else subcontext and, for if and elseif, only after the () condition.
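The three rules above can be sketched as a small state machine. This is only an illustration of the kind of context tracking being asked for, not part of TextMate or of the PowerShell grammar: the names tokenize and classify, the state names, and the token kinds are all hypothetical, the token model is heavily simplified (balanced () and {} spans are kept as single opaque tokens, with no comments, strings or line continuations), and if is treated as a keyword wherever it appears outside an if context.

```python
from typing import List, Tuple

PAIRS = {"(": ")", "{": "}"}


def tokenize(src: str) -> List[str]:
    """Split into words, keeping balanced (...) and {...} spans as one token each."""
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch in PAIRS:
            # Scan to the matching close, counting only brackets of the same kind.
            depth, j = 0, i
            while j < len(src):
                if src[j] == ch:
                    depth += 1
                elif src[j] == PAIRS[ch]:
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            tokens.append(src[i:j + 1])
            i = j + 1
        else:
            # A word: always consume at least one character, stop at space or bracket.
            j = i + 1
            while j < len(src) and not src[j].isspace() and src[j] not in "(){}":
                j += 1
            tokens.append(src[i:j])
            i = j
    return tokens


def classify(src: str) -> List[Tuple[str, str]]:
    """Label each token; else/elseif are keywords only inside an open if context."""
    out = []
    state = "none"  # hypothetical state names, for illustration only
    for tok in tokenize(src):
        low = tok.lower()
        if state == "expect_cond" and tok.startswith("("):
            out.append((tok, "condition"))   # () allowed once per if/elseif
            state = "expect_block"
        elif state == "expect_block" and tok.startswith("{"):
            out.append((tok, "block"))       # {} only after the () condition
            state = "after_block"
        elif state == "expect_else_block" and tok.startswith("{"):
            out.append((tok, "block"))       # else block closes the if context
            state = "none"
        elif state == "after_block" and low == "elseif":
            out.append((tok, "keyword"))
            state = "expect_cond"
        elif state == "after_block" and low == "else":
            out.append((tok, "keyword"))     # else allowed once per if context
            state = "expect_else_block"
        elif low == "if":
            out.append((tok, "keyword"))
            state = "expect_cond"
        else:
            out.append((tok, "other"))       # anything else ends the if context
            state = "none"
    return out


demo = "if(1){do-something}else{do-something else}else another-command"
for token, kind in classify(demo):
    print(f"{kind:10} {token}")
```

Run on the demonstration line from earlier in this issue, the first else is labelled a keyword and the second else is labelled other, because the if context has already been closed by the first else block. The sketch gets this right only because it carries state from one token to the next, which is the capability the bullets above are asking for.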

Grammar constructed so far (partial file only). If testing, use only empty conditions and empty statement blocks and no comments; I am not including those subsections, as they are not relevant to this issue.

{
	"patterns": [
		{
			"comment": "else,elseif: only after if,elseif",
			"begin": "(?i)(?=if[\\s{(,;&|)}])",
			"end": "(?!\\G)",
			"patterns": [
				{
					"include": "#ifStatement"
				}
			]
		}
	],
	"repository": {
		"ifStatement": {
			"comment": "else,elseif: only after if,elseif",
			"begin": "\\G(?i:(if)|(elseif))(?=[\\s{(,;&|)}])",
			"beginCaptures": {
				"1": {
					"name": "keyword.control.if.powershell"
				},
				"2": {
					"name": "keyword.control.if-elseif.powershell"
				}
			},
			"end": "(?=.|$)",
			"applyEndPatternLast": true,
			"patterns": [
				{
					"include": "#advanceToToken"
				},
				{
					"begin": "(?<![)}])(?=\\()",
					"end": "(?=.|$)",
					"applyEndPatternLast": true,
					"patterns": [
						{
							"begin": "\\G\\(",
							"beginCaptures": {
								"0": {
									"name": "punctuation.section.group.begin.powershell"
								}
							},
							"end": "\\)",
							"endCaptures": {
								"0": {
									"name": "punctuation.section.group.end.powershell"
								}
							},
							"name": "meta.if-condition.powershell",
							"patterns": [
								{
									"comment": "`;` not permitted here",
									"match": ";",
									"name": "invalid.source.powershell"
								},
								{
									"include": "#command_mode"
								}
							]
						},
						{
							"begin": "(?<=\\))(?=[\\s#]|<#|`\\s|{)",
							"end": "(?=.|$)",
							"applyEndPatternLast": true,
							"patterns": [
								{
									"include": "#advanceToToken"
								},
								{
									"begin": "(?<!})(?={)",
									"end": "(?=.|$)",
									"applyEndPatternLast": true,
									"patterns": [
										{
											"begin": "\\G\\{",
											"beginCaptures": {
												"0": {
													"name": "punctuation.section.braces.begin.powershell"
												}
											},
											"end": "}",
											"endCaptures": {
												"0": {
													"name": "punctuation.section.braces.end.powershell"
												}
											},
											"name": "meta.statements.if-condition.powershell",
											"patterns": [
												{
													"include": "$self"
												}
											]
										},
										{
											"begin": "(?<=})(?=[\\s#]|<#|`\\s)",
											"end": "(?=.|$)",
											"applyEndPatternLast": true,
											"patterns": [
												{
													"include": "#advanceToToken"
												}
											]
										}
									]
								}
							]
						}
					]
				},
				{
					"begin": "(?i:else)(?=[\\s{(,;&|)}])",
					"beginCaptures": {
						"0": {
							"name": "keyword.control.if-else.powershell"
						}
					},
					"end": "(?=.|$)",
					"applyEndPatternLast": true,
					"patterns": [
						{
							"include": "#advanceToToken"
						},
						{
							"begin": "(?<!}){",
							"beginCaptures": {
								"0": {
									"name": "punctuation.section.braces.begin.powershell"
								}
							},
							"end": "}",
							"endCaptures": {
								"0": {
									"name": "punctuation.section.braces.end.powershell"
								}
							},
							"name": "meta.statements.if-else-condition.powershell",
							"patterns": [
								{
									"include": "$self"
								}
							]
						}
					]
				},
				{
					"begin": "(?i)(?=elseif[\\s{(,;&|)}])",
					"end": "(?!\\G)",
					"patterns": [
						{
							"include": "#ifStatement"
						}
					]
				}
			]
		},
		"advanceToToken": {
			"comment": "consume spaces and comments and line ends until the next token appears",
			"begin": "\\G(?=[\\s#]|<#|`\\s)",
			"end": "(?!\\s)(?!$)",
			"applyEndPatternLast": true,
			"patterns": [
				{
					"comment": "useless escape, and doesn't count as a token",
					"match": "`\\s",
					"name": "invalid.character.escape.powershell"
				},
				{
					"include": "#commentLine"
				},
				{
					"include": "#commentBlock"
				}
			]
		}
	}
}

For reference, the complete grammar: https://github.com/msftrncs/PowerShell.tmLanguage/blob/argumentmode_2ndtry/powershell.tmLanguage.json

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 2
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

Alphish commented, Feb 19, 2020 (1 reaction)

Never mind, I’ve found the related issue, which leads to more info on the matter, and it looks promising, both the API itself and how it works in the recent feature preview: https://github.com/microsoft/vscode/issues/86415

It’s a good time to be a grammar extension author, I suppose.

alexdima commented, Nov 20, 2019 (1 reaction)

This is very well written, and describes a hard limitation of TextMate grammars. This limitation appears to be solvable via some new TM construct, but IMHO there is an entire class of cases that TM cannot handle, simply because it is a top-down single-pass parser.

There are cases where some bit of information lower in the file ends up influencing a coloring decision made at the beginning of the file; context-sensitive keywords are a great example of that.

We do not have any plans to expand or diverge from the TextMate grammar implementation, and we try as much as possible to align with TextMate… We plan to solve context-sensitive coloring via a special-purpose semantic coloring API in VS Code.
