Use a proper parser for link detection

Matching URLs with a regex will always have gaps, because some cases simply cannot be caught by a regular expression. One example is brackets, which should be included in a URL only when they are opened within the URL. The cases to cover include:

  • Include brackets: http://<domain>.com/foo(bar)
  • Don’t include wrapping brackets: (http://<domain>.com/foobar)
  • Include commas: http://<domain>.com/foo,bar
  • Don’t include trailing commas: http://<domain>.com/foo, other text
  • Detect spaces (paths are ambiguous?): c:\Users\X\My Documents
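
The URL cases in the list above can be sketched with a small scanner that finds a scheme, reads up to the next whitespace, and then trims trailing characters that are more likely punctuation than part of the URL. This is only a heuristic sketch, not the approach the issue settles on: `findUrls`, `trimUrl` and the `example.com` domain are hypothetical names chosen for illustration, and the bracket check simply compares counts of openers and closers.

```typescript
// Hypothetical helpers for illustration; none of these are xterm.js APIs.
interface UrlSpan { start: number; end: number; text: string; } // end is exclusive

const OPENER_FOR = new Map<string, string>([[")", "("], ["]", "["], ["}", "{"]]);

function countChar(text: string, ch: string): number {
  return text.split(ch).length - 1;
}

// Drops trailing punctuation, and trailing closing brackets that have no
// matching opener inside the candidate itself.
function trimUrl(candidate: string): string {
  let end = candidate.length;
  while (end > 0) {
    const ch = candidate.charAt(end - 1);
    const opener = OPENER_FOR.get(ch);
    const body = candidate.slice(0, end);
    if (",.;:!?".includes(ch)) {
      end--;
    } else if (opener !== undefined && countChar(body, ch) > countChar(body, opener)) {
      end--;
    } else {
      break;
    }
  }
  return candidate.slice(0, end);
}

function findUrls(line: string): UrlSpan[] {
  const spans: UrlSpan[] = [];
  const scheme = /https?:\/\//g;
  let m: RegExpExecArray | null;
  while ((m = scheme.exec(line)) !== null) {
    const start = m.index;
    let end = scheme.lastIndex;
    // Read up to the next whitespace character, then trim the tail.
    while (end < line.length && !/\s/.test(line.charAt(end))) end++;
    const text = trimUrl(line.slice(start, end));
    spans.push({ start, end: start + text.length, text });
    scheme.lastIndex = Math.max(scheme.lastIndex, start + text.length);
  }
  return spans;
}
```

Under these assumptions, `http://example.com/foo(bar)` keeps its brackets while `(http://example.com/foobar)` loses the wrapping one, and a comma is kept mid-URL but dropped at the end. Paths with spaces are not handled here at all.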

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Reactions: 11
  • Comments: 21 (18 by maintainers)

Top GitHub Comments

Tyriar commented, Oct 23, 2019 (6 reactions)

Current state

The way links work right now is that after the viewport has stopped scrolling for 200ms, the links are computed for the current viewport. Embedders can provide a validation callback, which lets the embedder validate a link some time after the links are computed; note that links are only shown once they are validated. The end result is pretty nice: we can present underlines for all links even when you need a modifier to execute the link. It does, however, have some fundamental problems:

  • Even though we’re just processing when the viewport stops, that’s still a lot of work that would go unused.
  • Since the entire viewport is processed, it’s impractical/slow to create links for many words, for example files without extensions (https://github.com/microsoft/vscode/issues/22772).
  • Doing this much validation on any text that looks kind of like a file is way too expensive in some environments, as any sequence of characters could be a file. This is particularly bad on remote file systems (https://github.com/microsoft/vscode/issues/79336).
  • The current system is complex; I think there are still race conditions in the linkifier and links disappear if you hide and show the terminal (https://github.com/microsoft/vscode/issues/36072, closed but it’s still an issue).
  • It’s all regex-based and difficult to maintain and extend (at least for me).

For reference, the current API:

class Terminal {
	registerLinkMatcher(regex: RegExp, handler: (event: MouseEvent, uri: string) => void, options?: ILinkMatcherOptions): number;
	deregisterLinkMatcher(matcherId: number): void;
}
interface ILinkMatcherOptions {
	matchIndex?: number;
	validationCallback?: (uri: string, callback: (isValid: boolean) => void) => void;
	tooltipCallback?: (event: MouseEvent, uri: string, location: IViewportRange) => boolean | void;
	leaveCallback?: () => void;
	priority?: number;
	willLinkActivate?: (event: MouseEvent, uri: string) => boolean;
}

Proposal

The long-standing hope was to move to a “parser-based” link system (https://github.com/xtermjs/xterm.js/issues/583), but it was never really clear how exactly an addon would provide a parser. Here’s my very VS Code API-inspired proposal:

class Terminal {
	registerLinkProvider(linkProvider: ILinkProvider): IDisposable;
}

interface ILinkProvider {
	provideLink(position: IBufferCellPosition, callback: (link: ILink | undefined) => void): void;
}

interface ILink {
	range: IBufferRange;
	showTooltip(event: MouseEvent, link: string): void;
	hideTooltip(event: MouseEvent, link: string): void;
	handle(event: MouseEvent, link: string): void;
}

interface IBufferRange {
	start: IBufferCellPosition;
	end: IBufferCellPosition;
}

interface IBufferCellPosition {
	x: number;
	y: number;
}

The basic idea is that instead of evaluating links whenever scrolling has stopped, only evaluate links for the current cursor position when a hover occurs. While file access is slow en masse, single requests just for mouse movement should be reasonable. Some other things of interest:

  • The addon can still use the regex method and translate the results into an IBufferRange, use a parser to check the whole line, or just expand outwards from the cursor.
  • It uses the buffer API to access the line, that’s pretty cool 😎.
  • The link could be cached until a scroll happens for that entire range, meaning no need to recompute just for moving the mouse over the same cells.
  • It allows embedders to use their own link detection mechanism. VS Code has a shared implementation but we’ve been unable to use it in the terminal (https://github.com/microsoft/vscode/issues/83191).
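
As a rough illustration of the "expand outwards from the cursor" option, here is a minimal provider sketch. It uses trimmed-down copies of the proposed interfaces: the event parameters and tooltip methods are left out so it runs outside a browser, and `ILink.text` is an addition for illustration. `createUrlProvider` and `getLine` are hypothetical names (the latter stands in for reading a row through the buffer API), and positions are assumed to be 0-based, which the proposal does not specify.

```typescript
// Simplified copies of the proposed interfaces (assumptions noted above).
interface IBufferCellPosition { x: number; y: number; }
interface IBufferRange { start: IBufferCellPosition; end: IBufferCellPosition; }
interface ILink { range: IBufferRange; text: string; handle(): void; }
interface ILinkProvider {
  provideLink(position: IBufferCellPosition, callback: (link: ILink | undefined) => void): void;
}

// Hypothetical factory: builds a provider that only looks at the word under the cursor.
function createUrlProvider(
  getLine: (y: number) => string | undefined,
  onActivate: (text: string) => void
): ILinkProvider {
  const isSpace = (ch: string): boolean => /\s/.test(ch);
  return {
    provideLink(position, callback) {
      const line = getLine(position.y);
      if (line === undefined || position.x < 0 || position.x >= line.length || isSpace(line.charAt(position.x))) {
        callback(undefined);
        return;
      }
      // Expand left, then right, until whitespace or the edge of the line.
      let start = position.x;
      let end = position.x;
      while (start > 0 && !isSpace(line.charAt(start - 1))) start--;
      while (end < line.length - 1 && !isSpace(line.charAt(end + 1))) end++;
      const text = line.slice(start, end + 1);
      if (!/^https?:\/\//.test(text)) {
        callback(undefined);
        return;
      }
      callback({
        range: { start: { x: start, y: position.y }, end: { x: end, y: position.y } },
        text,
        handle: () => onActivate(text)
      });
    }
  };
}
```

Only the hovered word is inspected, so the cost per hover is bounded by the line length rather than the size of the viewport.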

Going this route would seemingly fix many of the problems VS Code has with links, namely:

  • No more perf issues validating on slow file systems.
  • We can linkify paths without separators since validation will only be triggered under the cursor.
  • By expanding left then right from the cursor we can fix many of the issues with the current link detection (https://github.com/microsoft/vscode/issues/21125), including spaces in Windows paths (this is still a tricky problem but seems more achievable).
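
To make the "expand left then right" idea concrete for paths with spaces, here is a small sketch. It starts from the word under the cursor and tries progressively wider windows of neighbouring words, validating each candidate until one passes. `findPathAt`, `exists` and `maxExtraWords` are hypothetical names, and `exists` is synchronous here purely to keep the example short; real validation against a file system would be asynchronous.

```typescript
// Hypothetical helper for illustration; not an xterm.js or VS Code API.
interface PathMatch { start: number; end: number; text: string; } // end is exclusive

function findPathAt(
  line: string,
  x: number,
  exists: (candidate: string) => boolean,
  maxExtraWords = 3
): PathMatch | undefined {
  // Split the line into whitespace-delimited words, remembering offsets.
  const words: { start: number; end: number }[] = [];
  const wordPattern = /\S+/g;
  let m: RegExpExecArray | null;
  while ((m = wordPattern.exec(line)) !== null) {
    words.push({ start: m.index, end: wordPattern.lastIndex });
  }
  const hit = words.findIndex(w => x >= w.start && x < w.end);
  if (hit < 0) return undefined;

  // Try the hovered word alone, then windows with 1..maxExtraWords extra
  // words, preferring leftward expansion before rightward expansion.
  for (let extra = 0; extra <= maxExtraWords; extra++) {
    for (let left = extra; left >= 0; left--) {
      const first = words[hit - left];
      const last = words[hit + (extra - left)];
      if (first === undefined || last === undefined) continue;
      const text = line.slice(first.start, last.end);
      if (exists(text)) return { start: first.start, end: last.end, text };
    }
  }
  return undefined;
}
```

Because the search is anchored at the cursor and capped by `maxExtraWords`, a hover triggers only a handful of validation calls, which is what makes per-hover validation plausible on slow file systems. Returning the first candidate that validates is a design choice of this sketch; preferring the longest valid candidate would be an equally reasonable variant.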

The only downside is a slight delay before the link appears, since computation and validation occur at hover time.
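
The caching idea mentioned earlier (keep a link until a scroll happens) would avoid paying that delay again for the same cells. A minimal sketch, assuming a simplified link shape and a hypothetical `invalidate()` hook that the embedder would call on scroll; none of these names come from the proposal:

```typescript
// Simplified shapes for illustration only.
interface IBufferCellPosition { x: number; y: number; }
interface ICachedLink { startX: number; endX: number; y: number; text: string; } // endX inclusive
type ProvideFn = (position: IBufferCellPosition, callback: (link: ICachedLink | undefined) => void) => void;

// Wraps a provider function so repeated hovers over the same cells reuse the
// previously computed link instead of recomputing and revalidating it.
function createCachingProvider(inner: ProvideFn): { provide: ProvideFn; invalidate(): void } {
  let cached: ICachedLink[] = [];
  return {
    provide(position, callback) {
      const hit = cached.find(l => l.y === position.y && position.x >= l.startX && position.x <= l.endX);
      if (hit !== undefined) {
        callback(hit);
        return;
      }
      inner(position, link => {
        if (link !== undefined) cached.push(link);
        callback(link);
      });
    },
    // Hypothetical hook: call this when the viewport scrolls or the buffer changes.
    invalidate() {
      cached = [];
    }
  };
}
```

Only successful lookups are cached in this sketch, so a cell with no link is re-evaluated on every hover; whether negative results are worth caching is left open.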

Open questions

  • Do we need an ILinkMatcherOptions.priority equivalent?
  • This would be an ideal time to introduce Promise into the API. Working with promises is lovely, but we will always have callbacks in the parser for performance reasons. Should we just stick to callbacks everywhere for consistency?
  • We could use markers for the range instead of numbers, that would allow the link to be cached after a scroll occurs. It probably isn’t worth caching like this imo but it does bring up another interesting question in the https://github.com/xtermjs/xterm.js/issues/2480 discussion.
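
On the Promise question, the two styles can be bridged in either direction with a few lines, so the choice mostly affects API consistency rather than capability. A minimal sketch, using a generic callback-based provider shape modelled on the proposed `provideLink`; `provideLinkAsync` and `fromAsync` are hypothetical helper names:

```typescript
interface IBufferCellPosition { x: number; y: number; }
type LinkCallback<T> = (link: T | undefined) => void;

// Generic stand-in for the proposed callback-based ILinkProvider.
interface ICallbackProvider<T> {
  provideLink(position: IBufferCellPosition, callback: LinkCallback<T>): void;
}

// Callback style -> Promise style.
function provideLinkAsync<T>(
  provider: ICallbackProvider<T>,
  position: IBufferCellPosition
): Promise<T | undefined> {
  return new Promise<T | undefined>(resolve => provider.provideLink(position, resolve));
}

// Promise style -> callback style. A rejected promise is reported as "no link".
function fromAsync<T>(
  fn: (position: IBufferCellPosition) => Promise<T | undefined>
): ICallbackProvider<T> {
  return {
    provideLink(position, callback) {
      fn(position).then(callback, () => callback(undefined));
    }
  };
}
```

Treating a rejection as "no link" is an assumption of this sketch; an API that adopted promises would need to decide how errors are surfaced.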

Tyriar commented, Oct 25, 2019 (1 reaction)

Not sure if I’ll have time to get to this in the next couple of months, so this is open to PRs if someone wants to have a go at implementing a proof of concept. I think we’ll want to support both link types until the next major version, at which point registerLinkMatcher will be removed.
