URL match patterns
Implement user-friendly URL glob patterns with negation support and some special behavior, similar to globby, extension match patterns, etc.
How it should work:
- `*` matches everything.
- `*/*.pdf` matches the PDF file extension.
- `google.com` matches www.google.com, mail.google.com, google.com/search etc.
- `google.*` matches google.com, google.by etc.
- `*.google.com` matches mail.google.com, inbox.google.com etc.
- `google.com/mail/*` matches google.com/mail/inbox etc.
- `google.com/*.pdf` matches google.com PDF files.
- `localhost:*` should match localhost and any port.
- `ftp://*` should match the FTP protocol only.
- `/^.google\.com\/mail/` should behave as a regular expression (but should not be implemented yet, maybe there is no need for it).
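The matching rules above can be sketched by translating a pattern into a regular expression. This is only a minimal illustration, not the project's implementation: `escapeRegExp`, `patternToRegExp` and `matchesPattern` are hypothetical names, the regular-expression form of a pattern is left out, and treating a pattern without a port as "any port" is an assumption.

```typescript
// Escape characters that are special inside a regular expression.
function escapeRegExp(text: string): string {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper (not from the issue): converts one already validated
// URL pattern into a RegExp.
function patternToRegExp(pattern: string): RegExp {
    // Protocol: fixed when the pattern names one, otherwise optional.
    let protocolSource = '(?:[a-z][a-z0-9+.-]*://)?';
    let rest = pattern;
    const protocolEnd = pattern.indexOf('://');
    if (protocolEnd >= 0) {
        protocolSource = escapeRegExp(pattern.slice(0, protocolEnd)) + '://';
        rest = pattern.slice(protocolEnd + 3);
    }

    // The host (with an optional port) ends at the first "/".
    const slash = rest.indexOf('/');
    const hostAndPort = slash < 0 ? rest : rest.slice(0, slash);
    const path = slash < 0 ? '' : rest.slice(slash + 1);
    const colon = hostAndPort.indexOf(':');
    const host = colon < 0 ? hostAndPort : hostAndPort.slice(0, colon);
    const port = colon < 0 ? '*' : hostAndPort.slice(colon + 1);

    // "*" spans several host parts at the start or end, one part elsewhere.
    const hostParts = host.split('.');
    const hostSource = hostParts.map((part, i) => {
        if (part !== '*') return escapeRegExp(part);
        return i === 0 || i === hostParts.length - 1 ? '[^/:]+' : '[^/:.]+';
    }).join('\\.');
    // A host without a leading "*" also matches its subdomains.
    const subdomainSource = hostParts[0] === '*' ? '' : '(?:[^/:]+\\.)?';
    // Assumption: a pattern without a port accepts any port.
    const portSource = port === '*' ? '(?::\\d+)?' : ':' + escapeRegExp(port);

    // Path parts follow the same edge rule; "*.ext" keeps the extension exact.
    let pathSource = '';
    if (path !== '') {
        const pathParts = path.split('/');
        pathSource = '/' + pathParts.map((part, i) => {
            const edge = i === 0 || i === pathParts.length - 1;
            const wildcard = edge ? '[^?#]+' : '[^/?#]+';
            if (part === '*') return wildcard;
            if (part.slice(0, 2) === '*.') return wildcard + escapeRegExp(part.slice(1));
            return escapeRegExp(part);
        }).join('/');
    }
    // Anything deeper in the path, a query or a hash may follow.
    const tailSource = '(?:[/?#].*)?';

    const source = '^' + protocolSource + subdomainSource + hostSource
        + portSource + pathSource + tailSource + '$';
    return new RegExp(source, 'i');
}

function matchesPattern(url: string, pattern: string): boolean {
    return patternToRegExp(pattern).test(url);
}
```

With this sketch, `matchesPattern('mail.google.com', 'google.com')` is true while `matchesPattern('google.com', '*.google.com')` is false, which is the distinction the negation examples rely on.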
`*` can only be surrounded by dots and the first `/` in the host part, and by the last `/` and the file extension dot in the path part. `*` corresponds to one host part or one path part, or to many parts if placed at the start or end. Queries should not be allowed.
- `*.google.com`: OK
- `goo*.com`: error
- `google.com/blog/*`: OK
- `google.com/blog*`: error
- `google.com/*/blog`: OK
- `google.com/*.pdf`: OK
- `google.com/2018-07-*.jpg`: error
- `google.com/search?q=cat&p=dog`: error, we won’t investigate if it should match `search?p=dog&p=cat`, use regular expressions for that
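A validator for these placement rules might look like the sketch below. `isValidPattern` is a hypothetical name, and restricting `*.ext` to the last path part is an assumption drawn from the examples rather than something the issue states outright.

```typescript
// Hypothetical validator (not from the issue): checks where "*" may appear.
function isValidPattern(pattern: string): boolean {
    // Queries are not allowed.
    if (pattern.indexOf('?') >= 0) return false;

    // Drop an optional protocol such as "ftp://".
    const protocolEnd = pattern.indexOf('://');
    const rest = protocolEnd < 0 ? pattern : pattern.slice(protocolEnd + 3);

    // Split into host (without port) and path at the first "/".
    const slash = rest.indexOf('/');
    const hostAndPort = slash < 0 ? rest : rest.slice(0, slash);
    const path = slash < 0 ? '' : rest.slice(slash + 1);
    const colon = hostAndPort.indexOf(':');
    const host = colon < 0 ? hostAndPort : hostAndPort.slice(0, colon);
    if (host === '') return false;

    // In the host, "*" must be a whole part between dots.
    const hostOk = host.split('.').every(
        (part) => part === '*' || part.indexOf('*') < 0
    );

    // In the path, "*" must be a whole part, or the file name in "*.ext"
    // (assumed to be allowed only in the last part).
    const pathOk = path.split('/').every((part, i, parts) => {
        if (part === '*' || part.indexOf('*') < 0) return true;
        return i === parts.length - 1 && /^\*\.[^*.]+$/.test(part);
    });

    return hostOk && pathOk;
}
```

For example, `isValidPattern('google.com/*.pdf')` returns true, while `isValidPattern('google.com/2018-07-*.jpg')` returns false because `*` is not the whole file name.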
URL lists
URL lists are used in the user’s Site List config and in fixes configurations (each record can have multiple URLs). `!` should reverse the pattern result.
- `google.com, !*.google.com` should match google.com except its subdomains.
- `google.com, !mail.google.com` should match everything that matches `google.com` except everything that matches `mail.google.com`.
- `google.com, !mail.google.com, mail.google.com/compose` should not match everything that matches `mail.google.com` except everything that matches `mail.google.com/compose`.
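One way to read these examples is that the last pattern in the list that matches the URL decides the result, much like ignore files. That reading is an assumption (the issue could also intend the most specific pattern to win), and `matchesUrlList`, `PatternMatcher` and `demoMatcher` are hypothetical names. The demo matcher is deliberately tiny: it only understands plain hosts, a leading `*.` and path prefixes.

```typescript
type PatternMatcher = (url: string, pattern: string) => boolean;

// Assumption: the last matching entry wins; "!" turns a match into an exclusion.
function matchesUrlList(url: string, list: string[], matches: PatternMatcher): boolean {
    let result = false;
    for (const entry of list) {
        const negated = entry.charAt(0) === '!';
        const pattern = negated ? entry.slice(1) : entry;
        if (matches(url, pattern)) {
            result = !negated;
        }
    }
    return result;
}

// Stand-in matcher for the demo only, not a full pattern implementation.
const demoMatcher: PatternMatcher = (url, pattern) => {
    const split = (value: string): [string, string] => {
        const slash = value.indexOf('/');
        return slash < 0 ? [value, ''] : [value.slice(0, slash), value.slice(slash)];
    };
    const [urlHost, urlPath] = split(url);
    const [patternHost, patternPath] = split(pattern);

    // "*.host" matches subdomains only; "host" matches itself and subdomains.
    const subdomainsOnly = patternHost.slice(0, 2) === '*.';
    const baseHost = subdomainsOnly ? patternHost.slice(2) : patternHost;
    const isSubdomain = urlHost.length > baseHost.length
        && urlHost.slice(-(baseHost.length + 1)) === '.' + baseHost;
    const hostMatches = subdomainsOnly ? isSubdomain : urlHost === baseHost || isSubdomain;

    // The pattern path is treated as a prefix of the URL path.
    return hostMatches && urlPath.slice(0, patternPath.length) === patternPath;
};

const list = ['google.com', '!mail.google.com', 'mail.google.com/compose'];
const composeAllowed = matchesUrlList('mail.google.com/compose', list, demoMatcher);
const inboxAllowed = matchesUrlList('mail.google.com/inbox', list, demoMatcher);
```

Here `composeAllowed` is true and `inboxAllowed` is false, matching the third example above.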
Pattern specificity
Pattern specificity is used to determine which exact match from the config file to use. It should behave similarly to CSS specificity.
- `*`: 0.1.0.0
- `google.*`: 1.1.0.0
- `google.com`: 2.0.0.0
- `*.google.com`: 2.1.0.0
- `mail.google.com`: 3.0.0.0
- `mail.google.com/*`: 3.0.0.1
- `mail.google.com/mail`: 3.0.1.0
- `mail.google.com/mail/*`: 3.0.1.1
- `mail.google.com/mail/compose`: 3.0.2.0
Result is an array of:
- Host parts exact matches (plus protocol and port).
- Host `*` matches (plus `*` protocol and port matches).
- Path parts exact matches (plus file extension).
- Path `*` matches (plus `*` file extension).
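The four counters can be computed with a short function, sketched below. `Specificity`, `getSpecificity` and `compareSpecificity` are hypothetical names, and counting a file extension as its own part (split at the last dot of a path part) is one interpretation of "plus file extension", not something the issue spells out.

```typescript
// [host exact, host "*", path exact, path "*"]
type Specificity = [number, number, number, number];

// Hypothetical helper (not from the issue): counts exact and "*" parts.
function getSpecificity(pattern: string): Specificity {
    const result: Specificity = [0, 0, 0, 0];
    let rest = pattern;

    // The protocol counts toward the host counters.
    const protocolEnd = rest.indexOf('://');
    if (protocolEnd >= 0) {
        result[rest.slice(0, protocolEnd) === '*' ? 1 : 0]++;
        rest = rest.slice(protocolEnd + 3);
    }

    const slash = rest.indexOf('/');
    let host = slash < 0 ? rest : rest.slice(0, slash);
    const path = slash < 0 ? '' : rest.slice(slash + 1);

    // The port counts toward the host counters as well.
    const colon = host.indexOf(':');
    if (colon >= 0) {
        result[host.slice(colon + 1) === '*' ? 1 : 0]++;
        host = host.slice(0, colon);
    }

    for (const part of host.split('.')) {
        result[part === '*' ? 1 : 0]++;
    }

    for (const part of path.split('/')) {
        if (part === '') continue; // no path, or a trailing slash
        // Assumption: the extension is counted separately from the file name.
        const dot = part.lastIndexOf('.');
        const pieces = dot > 0 ? [part.slice(0, dot), part.slice(dot + 1)] : [part];
        for (const piece of pieces) {
            result[piece === '*' ? 3 : 2]++;
        }
    }
    return result;
}

// Compares counters left to right, as CSS specificity does.
function compareSpecificity(a: Specificity, b: Specificity): number {
    for (let i = 0; i < 4; i++) {
        if (a[i] !== b[i]) return a[i] - b[i];
    }
    return 0;
}
```

Under these assumptions `getSpecificity('mail.google.com/mail/*').join('.')` gives `3.0.1.1`, in line with the list above.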
URL list sorting
Alphabetical order in configuration files needs to be validated so that they stay maintainable. Configuration records are compared by their first URL. The prefix `*.` should be skipped; records that become identical as a result should be sorted by specificity. Regular expressions should not be used in comparison.
bing.com
google.*
google.com
*.google.com
google.com/*
google.com/mail
google.com/search
wikipedia.org
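The ordering above can be reproduced with a small comparator, sketched here under a simplifying assumption: once the `*.` prefix is skipped, two different URLs can only collide as `X` and `*.X`, so the specificity tie-break reduces to placing the unprefixed form first. `sortKey`, `compareUrlPatterns` and `isSortedByFirstUrl` are hypothetical names.

```typescript
// Skip the "*." prefix before comparing alphabetically.
function sortKey(url: string): string {
    return url.slice(0, 2) === '*.' ? url.slice(2) : url;
}

// Hypothetical comparator (not from the issue) for two first URLs.
function compareUrlPatterns(a: string, b: string): number {
    const keyA = sortKey(a);
    const keyB = sortKey(b);
    if (keyA !== keyB) {
        // Plain code unit comparison, so "google.*" sorts before "google.com".
        return keyA < keyB ? -1 : 1;
    }
    // Same key: the less specific "X" goes before "*.X".
    const wildcardA = a === keyA ? 0 : 1;
    const wildcardB = b === keyB ? 0 : 1;
    return wildcardA - wildcardB;
}

// Each record may list several URLs; only the first one is compared.
function isSortedByFirstUrl(records: string[][]): boolean {
    for (let i = 1; i < records.length; i++) {
        if (compareUrlPatterns(records[i - 1][0], records[i][0]) > 0) {
            return false;
        }
    }
    return true;
}
```

A configuration linter could call `isSortedByFirstUrl` on the parsed records and report the first pair that is out of order.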
UI behavior
- Clicking the “Toggle” button should add a negation pattern if the URL matches some other pattern in the list.
- The user should be able to pick among multiple possible patterns for the toggle, e.g. `google.com`, `google.com/maps`.
Top GitHub Comments
Eager to be able to exclude sub-domains 👍
As this opinion is shared by a lot of other users (now including me), I’ve implemented this behavior in the PR https://github.com/darkreader/darkreader/pull/2517#issuecomment-720103367