
Ignore URL query parameters when caching


What’s the problem this feature will solve?

Azure Artifacts feeds return authenticated blob storage URLs. These URLs include query parameters that carry time-bounded authorization values.

An example URL: https://storagesamples.blob.core.windows.net/sample-container/blob1.txt?se=2019-08-03&sp=rw&sv=2018-11-09&sr=b&skoid=<skoid>&sktid=<sktid>&skt=2019-08-02T22%3A32%3A01Z&ske=2019-08-03T00%3A00%3A00Z&sks=b&skv=2018-11-09&sig=<signature>

However, these parameters are included in the key for pip’s cache, and their time-bounded values change from one authorized URL to the next. As a result, the files are never cached locally.
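To see why this defeats caching, here is a minimal sketch of a URL-keyed cache, in the spirit of pip's wheel cache. The helper name `url_cache_key` and the exact key derivation (JSON-encode the URL without its fragment, then hash with SHA-224) are assumptions for illustration, not pip's actual implementation. The point is only that the query string is part of what gets hashed.

```python
import hashlib
import json
from urllib.parse import urldefrag


def url_cache_key(url: str) -> str:
    """Hash a download URL into a cache key (hypothetical helper).

    The #fragment is dropped, but the query string is kept, so every
    new time-bounded signature produces a brand-new key.
    """
    url_without_fragment = urldefrag(url).url
    key_parts = {"url": url_without_fragment}
    payload = json.dumps(key_parts, sort_keys=True, separators=(",", ":"))
    return hashlib.sha224(payload.encode("utf-8")).hexdigest()


base = "https://storagesamples.blob.core.windows.net/sample-container/blob1.txt"
first = url_cache_key(base + "?se=2019-08-03&sig=AAAA")
second = url_cache_key(base + "?se=2019-08-04&sig=BBBB")
print(first == second)  # False: same blob, two separate cache entries
```

Two requests for the same blob with different expiry and signature values land in different cache slots, so the second download never finds the first one.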

Describe the solution you’d like

At its simplest, excluding query parameters from the cache key would be fine from my POV.

But I expect there are feeds out there where the query parameters actually matter. I believe the full set of these authorization parameters is (currently) se, sp, sv, sr, skoid, sktid, skt, ske, sks, skv and sig, though not all of them will always be present.
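A middle ground between "ignore the whole query" and "keep everything" is to drop only the parameters named above and keep the rest. The sketch below assumes those names are always volatile authorization values for this kind of feed; the helper `strip_sas_params` and the `version` parameter in the example are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameter names listed in the issue. Treating all of them as volatile
# authorization values is an assumption about the feed, not a general rule.
SAS_PARAMS = frozenset(
    {"se", "sp", "sv", "sr", "skoid", "sktid", "skt", "ske", "sks", "skv", "sig"}
)


def strip_sas_params(url: str) -> str:
    """Return `url` without the listed SAS parameters (hypothetical helper).

    Other query parameters are kept, because they may select content.
    Kept parameters are re-encoded, which is fine for a cache key but means
    the result should not be used as the download URL itself.
    """
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in SAS_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


base = "https://storagesamples.blob.core.windows.net/sample-container/blob1.txt"
print(strip_sas_params(base + "?se=2019-08-03&sp=rw&sig=abc&version=2"))
# https://storagesamples.blob.core.windows.net/sample-container/blob1.txt?version=2
```

Using a fixed denylist keeps any unrelated query parameters in the key, at the cost of having to update the list if the service adds new authorization parameters.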

Alternative Solutions

Some successful workarounds have included writing proxy apps that essentially MitM access to the feed and hide the URL parameters, and pre-downloading wheels to cache them manually. I haven’t checked what other installers do, because my users aren’t going to switch to anything more heavyweight than pip just because of this.

Additional context

The relevant code seems to be https://github.com/pypa/pip/blob/main/src/pip/_internal/cache.py#L60 and https://github.com/pypa/pip/blob/main/src/pip/_internal/models/link.py#L150

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

5 reactions
uranusjr commented, Jun 16, 2021

But I think query strings can technically change page contents, and we can’t just safely ignore them. PEP 503 does not use query strings, but it also doesn’t say an implementation can’t use query strings to dynamically serve different distributions (not to mention pip supports a lot more ad-hoc solutions that can do basically anything).

I guess this either needs a PEP to outline which query strings a dependency resolver is allowed to ignore, or some sort of plugin infrastructure in pip that allows users to swap out the default cache backend (so you can implement whatever optimisations you need; you know your servers best).

0 reactions
zooba commented, Jul 1, 2021

Appreciate the suggestion, but services that expect a query string aren’t going to use a fragment instead, at least not if they’ve got a halfway decent parser (as the ones I’m dealing with do).
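The plugin idea from the first comment can be sketched as a cache whose key derivation is supplied by the caller. Nothing here is a pip API: `WheelCache`, `KeyPolicy`, and both policy functions are hypothetical. The default policy keeps the query string, matching the concern that queries may change what a server returns, while the opt-in policy drops it for servers the user knows only use queries for authorization.

```python
from __future__ import annotations

import hashlib
from typing import Callable, Dict, Optional
from urllib.parse import urldefrag, urlsplit, urlunsplit

# Hypothetical plug-in point: maps a download URL to the text that gets hashed.
KeyPolicy = Callable[[str], str]


def full_url_policy(url: str) -> str:
    # Conservative default: query strings may select different content, so keep them.
    return urldefrag(url).url


def ignore_query_policy(url: str) -> str:
    # Opt-in: only safe for servers that use the query purely for authorization.
    parts = urlsplit(url)
    return urlunsplit(parts._replace(query="", fragment=""))


class WheelCache:
    """Toy in-memory cache whose key derivation is chosen by the user."""

    def __init__(self, key_policy: KeyPolicy = full_url_policy) -> None:
        self._key_policy = key_policy
        self._store: Dict[str, bytes] = {}

    def _key(self, url: str) -> str:
        return hashlib.sha224(self._key_policy(url).encode("utf-8")).hexdigest()

    def get(self, url: str) -> Optional[bytes]:
        return self._store.get(self._key(url))

    def put(self, url: str, data: bytes) -> None:
        self._store[self._key(url)] = data


base = "https://storagesamples.blob.core.windows.net/sample-container/blob1.txt"
cache = WheelCache(key_policy=ignore_query_policy)
cache.put(base + "?sig=old", b"wheel bytes")
print(cache.get(base + "?sig=new"))  # b'wheel bytes'
```

With the default `full_url_policy`, the same lookup returns `None`, because the changed signature produces a different key. Keeping the safe behaviour as the default and making the relaxed policy explicit mirrors the "you know your servers best" argument.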
