TextMate lacks a means to match a pattern exactly one time
TextMate seems to lack a means to indicate that a pattern is to be matched only one time, similar to the regex ? quantifier. Without this, some languages are difficult to scope correctly.
Take PowerShell, for instance, which doesn’t really have any reserved keywords. PowerShell does support the if, elseif and else keywords for flow control and, as in many languages, else is allowed only once after an if statement. However, because keywords are not reserved, elseif and else can be reused as command names, and the only thing that decides which role they play is context. Within the context of an if statement, elseif and else serve as keywords, and else, or anything that is not elseif, terminates the if context. Additionally, a () condition group and a {} statement block are required for if and elseif, and a {} block is required for else (though that matters less, since the if context has effectively already terminated).
# general syntax (not including comments and line breaks)
if (condition) {statement} elseif (condition-elseif) {statement-elseif} else {statement-else}
# demonstration
# `else another-command` is actually another command (native executable or user defined function)
if(1){do-something}else{do-something else}else another-command
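The context rule described above can be sketched outside of TextMate as a small hand-written scanner. This is only an illustration in Python, not a grammar and not how any PowerShell tooling works: the names `classify`, `skip_group` and `skip_statement` are hypothetical, groups are treated as opaque, and strings, comments and nested statements are ignored. It assumes the input starts at a statement boundary and that a new `if` directly after a block simply opens its own context.

```python
import re

WORD = re.compile(r"[A-Za-z][\w-]*")
PAIRS = {"(": ")", "{": "}"}

# State reached after a balanced group, keyed by (current state, opening char).
AFTER_GROUP = {
    ("need_condition", "("): "need_block",
    ("need_block", "{"): "after_block",
    ("need_else_block", "{"): "none",
}


def skip_group(text, pos):
    """Return the index just past the balanced group that opens at text[pos]."""
    stack = [PAIRS[text[pos]]]
    pos += 1
    while stack and pos < len(text):
        ch = text[pos]
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif ch == stack[-1]:
            stack.pop()
        pos += 1
    return pos


def skip_statement(text, pos):
    """Skip the rest of a command, stopping at the next `;` or line end."""
    while pos < len(text) and text[pos] not in ";\n":
        pos = skip_group(text, pos) if text[pos] in PAIRS else pos + 1
    return pos


def classify(text):
    """List (word, kind) for each top-level if/elseif/else found in text."""
    results, state, pos = [], "none", 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1  # spaces and line breaks do not end the if context
            continue
        if ch in PAIRS:
            # A group either advances the if context or ends it.
            state = AFTER_GROUP.get((state, ch), "none")
            pos = skip_group(text, pos)
            continue
        m = WORD.match(text, pos)
        if not m:
            state, pos = "none", pos + 1  # `;`, `=` and the like end the context
            continue
        word = m.group().lower()
        if state == "after_block" and word in ("elseif", "else"):
            results.append((m.group(), "keyword"))
            state = "need_condition" if word == "elseif" else "need_else_block"
            pos = m.end()
        elif state in ("none", "after_block") and word == "if":
            results.append((m.group(), "keyword"))
            state = "need_condition"
            pos = m.end()
        else:
            # Any other word starts a command; its arguments are skipped.
            if word in ("if", "elseif", "else"):
                results.append((m.group(), "command"))
            state = "none"
            pos = skip_statement(text, m.end())
    return results
```

Run on the demonstration line, `classify` reports the first `else` as a keyword and the second as a command. The point is that the decision depends on state carried from earlier tokens, which is exactly what a single repeatable TextMate pattern cannot express.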
To further complicate things, the state in which this statement is reached is unknown, as there are multiple ways to arrive at it. This rules out a forceful exit strategy whereby the grammar forces a specific rule to progress until it is safe to back out of the current stack.
Example
# hashtable using `if` in assignment
@{
    key = if (cond) {statement}
    key2 = if (cond) {statement} else {statement else}
    else = 'const' # else here is a valid hashtable literal key name, because the `if` context ended above.
    key3 = if (condition) {statement}
    else {another statement} # the `if` context was not closed on the previous line, so it continues onto this line
}
(Note that the grammar GitHub uses differentiates the else hash key above only by the presence of an =, not by the context of the if statement.)
I cannot find a way to formulate a TextMate grammar that can properly describe this situation. I think this is due to the lack of a property on each begin or match rule, such as applyPatternOnce, that would limit the pattern to matching only once in the current stack scope.
- else should only be allowed once per if context.
- The () condition should be required, but only once per if or elseif subcontext.
- The {} statement block should be required, but only once per if, elseif or else subcontext, and only after the () condition for if and elseif.
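To make the requested behaviour concrete, here is a minimal Python sketch of what a match-once property could mean. It is not TextMate's engine, and applyPatternOnce does not exist in TextMate; `scan_scope` and its rule tuples are hypothetical names used only for illustration. Rules are tried at the current position, a rule flagged `once` is disabled after its first match, and the scope ends as soon as no remaining rule matches.

```python
import re

WS = re.compile(r"\s*")


def scan_scope(text, rules, pos=0):
    """Match rules at successive positions until none applies.

    rules is a list of (name, compiled_regex, once) tuples. A rule whose
    `once` flag is set is ignored after its first match, which is the
    behaviour the hypothetical applyPatternOnce property would add.
    Returns (matched_names, end_position).
    """
    used, matched = set(), []
    while True:
        pos = WS.match(text, pos).end()  # skip whitespace between matches
        for name, regex, once in rules:
            if once and name in used:
                continue  # this rule has already had its single match
            m = regex.match(text, pos)
            if m and m.end() > pos:
                matched.append(name)
                used.add(name)
                pos = m.end()
                break
        else:
            return matched, pos  # no rule matched: the scope ends here


def make_rules(once):
    """Rules for one `if` subcontext, using simplified non-nesting regexes."""
    return [
        ("condition", re.compile(r"\([^()]*\)"), once),
        ("block", re.compile(r"\{[^{}]*\}"), once),
    ]


text = "(cond) {stmt} {extra}"
names_once, end_once = scan_scope(text, make_rules(once=True))
names_many, end_many = scan_scope(text, make_rules(once=False))
```

With `once=True` the scope stops in front of `{extra}`, so a following rule could treat it as something other than the statement block. With `once=False` the second block is swallowed by the same rule, which mirrors the problem described in this issue.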
Grammar constructed so far (partial file only; if testing, use only empty conditions and empty statement blocks and no comments, as I am not including those subsections because they are not relevant to this issue):
{
  "patterns": [
    {
      "comment": "else,elseif: only after if,elseif",
      "begin": "(?i)(?=if[\\s{(,;&|)}])",
      "end": "(?!\\G)",
      "patterns": [
        {
          "include": "#ifStatement"
        }
      ]
    }
  ],
  "repository": {
    "ifStatement": {
      "comment": "else,elseif: only after if,elseif",
      "begin": "\\G(?i:(if)|(elseif))(?=[\\s{(,;&|)}])",
      "beginCaptures": {
        "1": {
          "name": "keyword.control.if.powershell"
        },
        "2": {
          "name": "keyword.control.if-elseif.powershell"
        }
      },
      "end": "(?=.|$)",
      "applyEndPatternLast": true,
      "patterns": [
        {
          "include": "#advanceToToken"
        },
        {
          "begin": "(?<![)}])(?=\\()",
          "end": "(?=.|$)",
          "applyEndPatternLast": true,
          "patterns": [
            {
              "begin": "\\G\\(",
              "beginCaptures": {
                "0": {
                  "name": "punctuation.section.group.begin.powershell"
                }
              },
              "end": "\\)",
              "endCaptures": {
                "0": {
                  "name": "punctuation.section.group.end.powershell"
                }
              },
              "name": "meta.if-condition.powershell",
              "patterns": [
                {
                  "comment": "`;` not permitted here",
                  "match": ";",
                  "name": "invalid.source.powershell"
                },
                {
                  "include": "#command_mode"
                }
              ]
            },
            {
              "begin": "(?<=\\))(?=[\\s#]|<#|`\\s|{)",
              "end": "(?=.|$)",
              "applyEndPatternLast": true,
              "patterns": [
                {
                  "include": "#advanceToToken"
                },
                {
                  "begin": "(?<!})(?={)",
                  "end": "(?=.|$)",
                  "applyEndPatternLast": true,
                  "patterns": [
                    {
                      "begin": "\\G\\{",
                      "beginCaptures": {
                        "0": {
                          "name": "punctuation.section.braces.begin.powershell"
                        }
                      },
                      "end": "}",
                      "endCaptures": {
                        "0": {
                          "name": "punctuation.section.braces.end.powershell"
                        }
                      },
                      "name": "meta.statements.if-condition.powershell",
                      "patterns": [
                        {
                          "include": "$self"
                        }
                      ]
                    },
                    {
                      "begin": "(?<=})(?=[\\s#]|<#|`\\s)",
                      "end": "(?=.|$)",
                      "applyEndPatternLast": true,
                      "patterns": [
                        {
                          "include": "#advanceToToken"
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        },
        {
          "begin": "(?i:else)(?=[\\s{(,;&|)}])",
          "beginCaptures": {
            "0": {
              "name": "keyword.control.if-else.powershell"
            }
          },
          "end": "(?=.|$)",
          "applyEndPatternLast": true,
          "patterns": [
            {
              "include": "#advanceToToken"
            },
            {
              "begin": "(?<!}){",
              "beginCaptures": {
                "0": {
                  "name": "punctuation.section.braces.begin.powershell"
                }
              },
              "end": "}",
              "endCaptures": {
                "0": {
                  "name": "punctuation.section.braces.end.powershell"
                }
              },
              "name": "meta.statements.if-else-condition.powershell",
              "patterns": [
                {
                  "include": "$self"
                }
              ]
            }
          ]
        },
        {
          "begin": "(?i)(?=elseif[\\s{(,;&|)}])",
          "end": "(?!\\G)",
          "patterns": [
            {
              "include": "#ifStatement"
            }
          ]
        }
      ]
    },
    "advanceToToken": {
      "comment": "consume spaces and comments and line ends until the next token appears",
      "begin": "\\G(?=[\\s#]|<#|`\\s)",
      "end": "(?!\\s)(?!$)",
      "applyEndPatternLast": true,
      "patterns": [
        {
          "comment": "useless escape, and doesn't count as a token",
          "match": "`\\s",
          "name": "invalid.character.escape.powershell"
        },
        {
          "include": "#commentLine"
        },
        {
          "include": "#commentBlock"
        }
      ]
    }
  }
}
For reference, the complete grammar: https://github.com/msftrncs/PowerShell.tmLanguage/blob/argumentmode_2ndtry/powershell.tmLanguage.json
Issue Analytics
- State:
- Created 4 years ago
- Reactions:2
- Comments:5 (1 by maintainers)

Comments
Never mind, I’ve found the related issue, which leads to more information on the matter, and it looks promising, both the API itself and how it works in the recent feature preview: https://github.com/microsoft/vscode/issues/86415
It’s a good time to be a grammar extension author, I suppose.
This is very well written, and describes a hard limitation of TextMate grammars. This limitation appears to be solvable via some new TM construct, but IMHO there is an entire class of cases that TM cannot handle, simply because it is a top-down single-pass parser.
There are cases where some bit of information lower in the file ends up influencing a coloring decision made at the beginning of the file; context-sensitive keywords are a great example of that.
We do not have any plans to expand or diverge from the TextMate grammar implementation, and we try as much as possible to align to TextMate… We plan to solve context-sensitive coloring via a special-purpose semantic coloring API in VS Code.
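For readers unfamiliar with semantic coloring: a semantic pass reports tokens by position and type, and the editor layers them over the TextMate scopes. VS Code's semantic tokens are transmitted as a flat integer array with five values per token (line delta, start delta, length, token type index, modifier bits). The Python sketch below only illustrates that relative encoding; `encode_semantic_tokens` is a hypothetical helper, not part of any VS Code API, and the token positions in the example are made up.

```python
def encode_semantic_tokens(tokens, legend):
    """Encode (line, start, length, type) tuples as relative 5-integer groups.

    legend is the ordered list of token type names; each token stores the
    index of its type in that list. Modifiers are left at 0 in this sketch.
    """
    data = []
    prev_line, prev_start = 0, 0
    for line, start, length, token_type in sorted(tokens):
        delta_line = line - prev_line
        # The start is relative to the previous token only on the same line.
        delta_start = start - prev_start if delta_line == 0 else start
        data += [delta_line, delta_start, length, legend.index(token_type), 0]
        prev_line, prev_start = line, start
    return data


legend = ["keyword", "function"]
tokens = [
    (0, 0, 2, "keyword"),    # e.g. `if`
    (0, 18, 4, "keyword"),   # e.g. an `else` inside the if context
    (2, 4, 4, "function"),   # e.g. an `else` used as a command name
]
encoded = encode_semantic_tokens(tokens, legend)
```

Because such a pass can inspect the whole document before reporting anything, it can classify the same word differently depending on context, which a top-down single-pass grammar cannot do.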