URL match patterns

Implement user-friendly URL glob patterns that support negation and some special behavior, similar to globby, extension match patterns, etc.

How it should work:

  • * matches everything
  • */*.pdf matches PDF file extension
  • google.com matches www.google.com, mail.google.com, google.com/search etc.
  • google.* matches google.com, google.by etc.
  • *.google.com matches mail.google.com, inbox.google.com etc.
  • google.com/mail/* matches google.com/mail/inbox etc.
  • google.com/*.pdf matches google.com PDF files.
  • localhost:* should match localhost and any port
  • ftp://* should match FTP protocol only
  • /^.google\.com\/mail/ should behave as a regular expression (but should not be implemented yet; maybe there is no need for it).
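The matching rules above can be sketched as a small matcher. This is only one possible reading of the issue, not the project's implementation: the names `splitUrl`, `matchParts` and `matchesPattern` are hypothetical, regular-expression patterns are left out, and it is an assumption that a `*.ext` pattern with no directory segments accepts files at any depth.

```typescript
interface UrlParts {
  protocol: string | null;
  host: string[];
  port: string | null;
  path: string[];
}

// Splits a pattern or URL into protocol, host labels, port and path segments.
function splitUrl(input: string): UrlParts {
  const proto = /^([a-z][a-z0-9+.-]*):\/\//i.exec(input);
  const rest = proto ? input.slice(proto[0].length) : input;
  const slash = rest.indexOf("/");
  const hostPort = slash < 0 ? rest : rest.slice(0, slash);
  const rawPath = slash < 0 ? "" : rest.slice(slash + 1);
  const colon = hostPort.lastIndexOf(":");
  return {
    protocol: proto ? proto[1].toLowerCase() : null,
    host: (colon < 0 ? hostPort : hostPort.slice(0, colon)).toLowerCase().split("."),
    port: colon < 0 ? null : hostPort.slice(colon + 1),
    // Queries and fragments are dropped, since patterns may not contain them.
    path: rawPath.split(/[?#]/)[0].split("/").filter((s) => s !== ""),
  };
}

// Matches a list of parts. A "*" at either end consumes one or more parts;
// a "*" in the middle consumes exactly one.
function matchParts(pattern: string[], parts: string[], pi = 0, ui = 0): boolean {
  if (pi === pattern.length) return ui === parts.length;
  if (ui === parts.length) return false;
  if (pattern[pi] !== "*") {
    return pattern[pi] === parts[ui] && matchParts(pattern, parts, pi + 1, ui + 1);
  }
  const atEdge = pi === 0 || pi === pattern.length - 1;
  const maxTake = atEdge ? parts.length - ui : 1;
  for (let take = 1; take <= maxTake; take++) {
    if (matchParts(pattern, parts, pi + 1, ui + take)) return true;
  }
  return false;
}

function matchesPattern(url: string, pattern: string): boolean {
  const u = splitUrl(url);
  const p = splitUrl(pattern);
  if (p.protocol !== null && p.protocol !== u.protocol) return false;
  if (p.port !== null && p.port !== "*" && p.port !== u.port) return false;

  // Host: without a leading "*", subdomains match implicitly
  // (google.com matches mail.google.com), so try every suffix of the URL host.
  let hostOk = false;
  const starts = p.host[0] === "*" ? 1 : u.host.length;
  for (let i = 0; i < starts && !hostOk; i++) {
    hostOk = matchParts(p.host, u.host.slice(i));
  }
  if (!hostOk) return false;

  // Path: an empty pattern path accepts any URL path.
  if (p.path.length === 0) return true;
  const last = p.path[p.path.length - 1];
  if (last.slice(0, 2) === "*.") {
    // "*.ext": the URL file name must end with ".ext". Directories must match
    // the preceding pattern segments, or may be arbitrary if there are none.
    const file = u.path.length > 0 ? u.path[u.path.length - 1] : "";
    const ext = last.slice(1);
    if (file.length <= ext.length || file.slice(-ext.length) !== ext) return false;
    const dirs = p.path.slice(0, -1);
    return dirs.length === 0 || matchParts(dirs, u.path.slice(0, -1));
  }
  return matchParts(p.path, u.path);
}
```

For example, `matchesPattern("mail.google.com/inbox", "google.com")` returns true under these assumptions, while `matchesPattern("google.com", "*.google.com")` returns false because the leading * must consume at least one host label.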

In the host part, * can only be surrounded by dots and the first /. In the path part, it can only be surrounded by slashes, or sit between the last / and the file extension dot. A * corresponds to exactly one host part or one path part, or to many parts if placed at the start or end. Queries should not be allowed.

  • *.google.com OK
  • goo*.com error
  • google.com/blog/* OK
  • google.com/blog* error
  • google.com/*/blog OK
  • google.com/*.pdf OK
  • google.com/2018-07-*.jpg error
  • google.com/search?q=cat&p=dog error: we won’t investigate whether it should match search?p=dog&p=cat; use regular expressions for that
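A validator for these placement rules might look like the sketch below. The function name `validatePattern` and the error messages are hypothetical, and it assumes that `*.ext` is only allowed in the last path segment.

```typescript
// Returns null when the pattern is valid, otherwise a short error message.
function validatePattern(pattern: string): string | null {
  if (pattern.indexOf("?") >= 0) return "queries are not allowed";

  // Drop an optional protocol such as "ftp://", then split host from path.
  const rest = pattern.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "");
  const slash = rest.indexOf("/");
  const hostPort = slash < 0 ? rest : rest.slice(0, slash);
  const path = slash < 0 ? "" : rest.slice(slash + 1);

  // Drop an optional port, which may be a number or "*".
  const host = hostPort.replace(/:(\*|\d+)$/, "");

  // In the host, "*" must be a whole dot-separated label.
  const labels = host.split(".");
  for (let i = 0; i < labels.length; i++) {
    if (labels[i].indexOf("*") >= 0 && labels[i] !== "*") {
      return "'*' must be a whole host part: " + labels[i];
    }
  }

  // In the path, "*" must be a whole segment, or the file name in "*.ext"
  // when it appears in the last segment.
  const segments = path === "" ? [] : path.split("/");
  for (let i = 0; i < segments.length; i++) {
    const segment = segments[i];
    if (segment.indexOf("*") < 0 || segment === "*") continue;
    const isLast = i === segments.length - 1;
    if (!(isLast && /^\*\.[^*]+$/.test(segment))) {
      return "'*' must be a whole path part or a file name: " + segment;
    }
  }
  return null;
}
```

With this sketch, `validatePattern("google.com/*/blog")` returns null, while `validatePattern("goo*.com")` and `validatePattern("google.com/blog*")` both return an error message.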

URL lists

URL lists are used in the user’s Site List config and in fix configurations (each record can have multiple URLs). A leading ! should reverse the pattern result.

  • google.com, !*.google.com should match google.com except its subdomains.
  • google.com, !mail.google.com should match everything that matches google.com except everything that matches mail.google.com
  • google.com, !mail.google.com, mail.google.com/compose should match everything that matches google.com, excluding everything that matches mail.google.com, but still including everything that matches mail.google.com/compose
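One way to evaluate such a list is to let the last matching pattern decide. The issue does not state the resolution rule, so this is an assumption that happens to reproduce the three examples above. `matchesList` and `toyMatch` are hypothetical names, and `toyMatch` is a deliberately minimal stand-in for a real pattern matcher: it only understands plain hosts, a `*.` prefix, and literal path prefixes.

```typescript
type Matcher = (url: string, pattern: string) => boolean;

// Assumption: patterns are evaluated in order and the last one that matches
// the URL decides the result; a leading "!" turns a match into an exclusion.
function matchesList(url: string, list: string[], matches: Matcher): boolean {
  let result = false;
  for (let i = 0; i < list.length; i++) {
    const negated = list[i].charAt(0) === "!";
    const pattern = negated ? list[i].slice(1) : list[i];
    if (matches(url, pattern)) result = !negated;
  }
  return result;
}

// Toy matcher for the demo only. URLs are given without protocol,
// e.g. "mail.google.com/inbox".
function toyMatch(url: string, pattern: string): boolean {
  const u = url.split("/");
  const p = pattern.split("/");
  const uHost = u[0];
  const strictSub = p[0].slice(0, 2) === "*.";
  const pHost = strictSub ? p[0].slice(2) : p[0];
  const isSub =
    uHost.length > pHost.length &&
    uHost.slice(-(pHost.length + 1)) === "." + pHost;
  // "*.host" needs a real subdomain; a plain host also matches itself.
  if (!(isSub || (!strictSub && uHost === pHost))) return false;
  // Every pattern path segment must equal the URL segment at that position.
  for (let i = 1; i < p.length; i++) {
    if (p[i] !== u[i]) return false;
  }
  return true;
}

const siteList = ["google.com", "!mail.google.com", "mail.google.com/compose"];
console.log(matchesList("www.google.com/search", siteList, toyMatch));
console.log(matchesList("mail.google.com/inbox", siteList, toyMatch));
console.log(matchesList("mail.google.com/compose", siteList, toyMatch));
```

Passing the matcher as a parameter keeps the list logic independent of how individual patterns are matched, so the toy matcher can be swapped for a complete one.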

Pattern specificity

Pattern specificity is used to determine which exact match from the config file to use. It should behave similarly to CSS specificity.

  • * 0.1.0.0
  • google.* 1.1.0.0
  • google.com 2.0.0.0
  • *.google.com 2.1.0.0
  • mail.google.com 3.0.0.0
  • mail.google.com/* 3.0.0.1
  • mail.google.com/mail 3.0.1.0
  • mail.google.com/mail/* 3.0.1.1
  • mail.google.com/mail/compose 3.0.2.0

The result is an array of four counts:

  1. Host parts exact matches (plus protocol and port).
  2. Host * matches (plus * protocol and port matches).
  3. Path parts exact matches (plus file extension).
  4. Path * matches (plus * file extension).
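The four counts can be computed by splitting the pattern and tallying exact and * parts. The sketch below reproduces the values in the table above. The names `getSpecificity` and `compareSpecificity` are hypothetical, and treating `*.ext` as one * part plus one exact part is an assumption about how the file extension is meant to be counted.

```typescript
type Specificity = [number, number, number, number];

function getSpecificity(pattern: string): Specificity {
  const spec: Specificity = [0, 0, 0, 0];
  const proto = /^([a-z*][a-z0-9+.-]*):\/\//i.exec(pattern);
  const rest = proto ? pattern.slice(proto[0].length) : pattern;
  const slash = rest.indexOf("/");
  const hostPort = slash < 0 ? rest : rest.slice(0, slash);
  const path = slash < 0 ? "" : rest.slice(slash + 1);
  const colon = hostPort.lastIndexOf(":");
  const host = colon < 0 ? hostPort : hostPort.slice(0, colon);

  // Host side: each host label, the protocol and the port count as one part.
  const hostSide = host.split(".");
  if (proto) hostSide.push(proto[1]);
  if (colon >= 0) hostSide.push(hostPort.slice(colon + 1));
  for (let i = 0; i < hostSide.length; i++) {
    spec[hostSide[i] === "*" ? 1 : 0]++;
  }

  // Path side: "*" is a wildcard part; "*.ext" is a wildcard file name plus
  // an exact extension (assumption); anything else is an exact part.
  const segments = path.split("/").filter((s) => s !== "");
  for (let i = 0; i < segments.length; i++) {
    if (segments[i] === "*") {
      spec[3]++;
    } else if (segments[i].slice(0, 2) === "*.") {
      spec[3]++;
      spec[2]++;
    } else {
      spec[2]++;
    }
  }
  return spec;
}

// Compares position by position, like CSS specificity: the first differing
// count decides. Negative means a is less specific than b.
function compareSpecificity(a: Specificity, b: Specificity): number {
  for (let i = 0; i < 4; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

console.log(getSpecificity("mail.google.com/mail/*").join("."));
```

Comparing left to right means that any additional exact host part outweighs every path-level difference.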

URL list sorting

Alphabetical order in configuration files needs to be validated so that they stay maintainable. Configuration records are compared by their first URL. The prefix ‘*.’ should be skipped; the resulting repeated records should be sorted by specificity. Regular expressions should not be used in comparison.

bing.com
google.*
google.com
*.google.com
google.com/*
google.com/mail
google.com/search
wikipedia.org
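The ordering above can be checked with a comparator that strips the ‘*.’ prefix, compares the remaining text character by character, and breaks ties by specificity. This is a sketch under assumptions: `comparePatterns` and `isSorted` are hypothetical names, the compact `specificity` helper ignores protocols, ports and file extensions, and regular-expression patterns are assumed to be filtered out before comparison.

```typescript
// Compact specificity: [host exact, host *, path exact, path *].
function specificity(pattern: string): number[] {
  const slash = pattern.indexOf("/");
  const host = slash < 0 ? pattern : pattern.slice(0, slash);
  const path = slash < 0 ? "" : pattern.slice(slash + 1);
  const spec = [0, 0, 0, 0];
  const hostParts = host.split(".");
  for (let i = 0; i < hostParts.length; i++) {
    spec[hostParts[i] === "*" ? 1 : 0]++;
  }
  const pathParts = path.split("/").filter((s) => s !== "");
  for (let i = 0; i < pathParts.length; i++) {
    spec[pathParts[i] === "*" ? 3 : 2]++;
  }
  return spec;
}

// The "*." prefix is skipped for the alphabetical comparison.
function sortKey(pattern: string): string {
  return pattern.slice(0, 2) === "*." ? pattern.slice(2) : pattern;
}

function comparePatterns(a: string, b: string): number {
  const ka = sortKey(a);
  const kb = sortKey(b);
  // Plain code-unit comparison, so "*" sorts before letters.
  if (ka !== kb) return ka < kb ? -1 : 1;
  // Same key (e.g. google.com and *.google.com): less specific first.
  const sa = specificity(a);
  const sb = specificity(b);
  for (let i = 0; i < 4; i++) {
    if (sa[i] !== sb[i]) return sa[i] - sb[i];
  }
  return 0;
}

// Validates that the first URLs of the records are already in order.
function isSorted(firstUrls: string[]): boolean {
  for (let i = 1; i < firstUrls.length; i++) {
    if (comparePatterns(firstUrls[i - 1], firstUrls[i]) > 0) return false;
  }
  return true;
}

const unsorted = ["wikipedia.org", "*.google.com", "google.com/mail", "bing.com",
  "google.com", "google.com/search", "google.*", "google.com/*"];
console.log(unsorted.slice().sort(comparePatterns).join("\n"));
```

A plain code-unit comparison is used instead of a locale-aware one so that punctuation such as * and / is ordered predictably.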

UI behavior

  • Clicking the “Toggle” button should add a negation pattern if the URL matches some other pattern in the list.
  • The user should be able to pick from multiple possible patterns for the toggle, e.g. google.com, google.com/maps.

Issue Analytics

  • State: open
  • Created 5 years ago
  • Reactions: 81
  • Comments: 12 (6 by maintainers)

Top GitHub Comments

23 reactions
Fred-Vatin commented, Jun 12, 2018

Eager to be able to exclude sub-domains 👍

5 reactions
Gusted commented, Nov 1, 2020

I believe that this is fundamentally incorrect with how the internet works. No other part of the internet’s infrastructure that I’m aware of treats something.google.com the same as google.com. If my site list has an exception for google.com, then only google.com should be considered as part of the list. Including all subdomains is incorrect behavior, if I want them included then I would use *.google.com to match both google.com and all of its subdomains (which is the behavior specified on the extension match patterns page linked in the issue).

As this opinion is shared by a lot of other users (now including me), I’ve implemented this behavior in the PR https://github.com/darkreader/darkreader/pull/2517#issuecomment-720103367
