Add ability to debounce URLs based on a regex applied to the path
See original GitHub issueSummary of feature
To expand the existing debounce capability in Brave, we’d like to able to specify a generic regex pattern for picking out destination URLs from a given URL. Note that for security reasons, we will only apply this regex pattern on the URL’s path for now.
The motivating use-case is to debounce AMP cache URLs to the canonical URLs. These URLs look something like https://www-theverge-com.cdn.ampproject.org/c/s/www.theverge.com/platform/amp/2018/9/20/17881766/bing-google-amp-support-mobile-news – we would want to debounce that URL to https://www.theverge.com/platform/amp/2018/9/20/17881766/bing-google-amp-support-mobile-news. We also need the ability to predicate a debounce rule on a user preference, to support use-cases like De-AMP where a user turning off the preference should turn off the relevant debouncing rule.
There are several other examples captured in this issue: https://github.com/brave/brave-browser/issues/22429. Note that given that we’re limited to applying the regex to the URL’s path, we can capture only a subset of those.
Description of changes
- This would be a new
action
:regex-path
supported in debounce.json. - The
include
andexclude
pattern would function as before. - The
param
would be a regular expression that has a capture group that should pick out one (and only one) well-formed URL on evaluation when applied on the original URL’spath
. - The evaluation of the
param
expression will be done when applying the rule, because that’s when we have the original URL. - We should URL unescape the target URL.
- There will be a new key called
prepend_scheme
:http|https
that, only if specified, will add the specified scheme (http or https) to the captured string value. Note that as a safety check, if the captured string value is already a valid GURL ANDprepend_scheme
is specified, then we error out. Once we’ve prepended the scheme, re-try parsing the captured string value as a GURL.prepend_scheme
helps us capture the case in AMP cache URLs where the scheme is not specified in the original URL and would thus never be parsed into a valid GURL. - NOTE: There should be support for a new
pref
key in the debounce.json JSON that will be optionally specified. If it exists, then the value should be abrave-core
supported user preference, for e.g."pref" : "brave.de_amp.enabled"
(as specified in https://github.com/brave/brave-core/blob/master/components/de_amp/common/pref_names.cc#L12).
debounce rules file
debounce.json is the file that contains all the debounce rules. It’s located locally at:
~/Library/Application Support/BraveSoftware/Brave-Browser-Nightly/afalakplffnnnlkncjhbmahjfjhmlkal/[latest version]/1/debounce.json
if you’re on a Mac,/c/Users/.../AppData/Local/BraveSoftware/Brave-Browser-Nightly/User Data/afalakplffnnnlkncjhbmahjfjhmlkal/[latest version]/1/debounce.json
on Windows,~/.config/BraveSoftware/Brave-Browser-Nightly/afalakplffnnnlkncjhbmahjfjhmlkal/[latest version]/1/debounce.json
on Linux.- Not sure where the file lives on iOS:
/ios/debounce.json
? @cuba - Not sure if debouncing was even tested on Android: https://github.com/brave/brave-browser/issues/15090
The file is updated as part of the Brave Local Data Updater
component. After editing it you need to restart your browser; make sure you’re editing the latest version of the file. It looks something like this:
[
{
"include": [
"*://out.reddit.com/*"
],
"exclude": [
],
"action": "redirect",
"param": "url"
},
...
]
After the changes mentioned above, the following rule would be supported:
[
{
"include": [
"*://brave.com/*"
],
"pref": "brave.de_amp.enabled",
"exclude": [
],
"prepend_scheme": "https",
"action": "regex-path",
"param": "^/(.*)$"
},
...
]
With this rule, the URL https://brave.com/https://braveattentiontoken.com would be debounced to https://braveattentiontoken.com if the user preference brave.de_amp.enabled
(the De-AMP pref) is switched on. Note that https://brave.com/xyz would not be debounced despite matching the param
pattern because xyz
is not a valid URL even after prepending https
scheme to it.
Test Cases
Note: if debouncing flag is off in brave://flags, none of these should work.
@fmarier added some test rules to the production debounce.json
file, so the tests on https://dev-pages.brave.software/navigation-tracking/debouncing.html should work even without modifying the debounce.json
file.
In order to comprehensively test this feature, we’ll need to make edits to the local debounce.json
file as mentioned above. Given that this is an enhancement of the debouncing feature, it would be good to add these test case rules to the existing debounce.json
file and make sure to regression test the feature using https://dev-pages.brave.software/navigation-tracking/debouncing.html.
- If
action
is notregex-path
, andparam
is a regex, fail i.e. don’t debounce based on that rule but other rules work. Go to https://brave.com/blog/, should not get debounced. Sample rule. param
is a malformed regex, fail. Go to https://brave.com/blog/, should not get debounced. Sample rule.param
captures no strings, fail. Go to https://test.com/https://brave.com, should not load (because it doesn’t exist). Sample rule. Now, replace theparam
with^/(.*)$
- https://test.com/https://brave.com should now redirect to https://brave.com.param
captures > 1 strings, fail. Go to https://test.com/https://brave.com, should not get debounced. Sample rule.param
captures a string that is not a URL, fail. Go to https://test.com/brave.com, should not get debounced. Sample rule.pref
is a pref that does not exist, fail. Go to https://test.com/https://brave.com/, should not load. Now, changepref
tobrave.de_amp.enabled
and make sure De-AMP is enabled in brave://settings/shields. Now https://test.com/https://brave.com/ redirects to https://brave.com. Sample rule.pref
is not specified (should work by default), pass.pref
is a valid pref, should correctly gate rule application (both true and false). Have pref bebrave.de_amp.enabled
and toggle the De-AMP setting off and on and make sure we only debounce https://test.com/https://brave.com/ when it’s on. Sample rule.prepend_scheme
is specified and original URL is a valid URL without the scheme. Success. Go to https://test.com/brave.com/ - should redirect to https://brave.com. Sample rule.prepend_scheme
is specified and original URL is a valid URL with the scheme. Failure. Go to https://test.com/https://brave.com/ - should not load. Sample rule.
Issue Analytics
- State:
- Created a year ago
- Comments:14 (13 by maintainers)
Top GitHub Comments
@LaurenWags i think just the one in https://github.com/brave/brave-browser/issues/23121#issuecomment-1182526462 should be sufficient
@LaurenWags @kjozwiak you should be able to use the third test here: https://dev-pages.brave.software/navigation-tracking/debouncing.html