Ignore URL query parameters when caching
What’s the problem this feature will solve?
Azure Artifacts feeds return authenticated blob storage URLs. These URLs include query parameters with time-bounded authorization values.
An example URL: https://storagesamples.blob.core.windows.net/sample-container/blob1.txt?se=2019-08-03&sp=rw&sv=2018-11-09&sr=b&skoid=<skoid>&sktid=<sktid>&skt=2019-08-02T22%3A32%3A01Z&ske=2019-08-03T00%3A00%3A00Z&sks=b&skv=2018-11-09&sig=<signature>
However, because these parameters are part of the key for pip’s cache and their values change whenever a new authorization is issued, a previously downloaded file is never found in the local cache.
Describe the solution you’d like
At its simplest, not including query parameters in the cache key would be fine from my POV.
But I expect there are likely feeds out there where the query parameters actually matter. I believe the full set of parameters is (currently) se, sp, sv, sr, skoid, sktid, skt, ske, sks, skv and sig, though not all of them will always be present.
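To make the request concrete, here is a minimal sketch of the narrower option: drop only the parameters listed above before deriving a cache key, and keep everything else. This is not pip's code; the function name `cache_key_url` and the constant `SAS_PARAMS` are made up for illustration, and the parameter set is just the list from this issue.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: the parameter names listed in this issue. Hypothetical constant,
# not something pip defines.
SAS_PARAMS = frozenset(
    {"se", "sp", "sv", "sr", "skoid", "sktid", "skt", "ske", "sks", "skv", "sig"}
)


def cache_key_url(url: str) -> str:
    """Return the URL with the authorization parameters removed.

    Other query parameters and the fragment are kept. The remaining query is
    re-encoded, so the result is meant as a cache key, not as a URL to fetch.
    """
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in SAS_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    base = "https://storagesamples.blob.core.windows.net/sample-container/blob1.txt"
    first = base + "?se=2019-08-03&sp=rw&sig=aaa"
    second = base + "?se=2019-08-04&sp=rw&sig=bbb"
    # Both signed URLs reduce to the same key.
    print(cache_key_url(first) == cache_key_url(second))
```

With this approach, two downloads of the same blob with different signatures map to one key, while a feed that uses some other query parameter to select content still gets distinct keys.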
Alternative Solutions
Some successful workarounds have included writing proxy apps to essentially MitM access to the feed and hide URL parameters, and also pre-downloading wheels to cache them manually. I haven’t checked what other installers do, because my users aren’t going to switch to anything more heavy-weight than pip just because of this.
Additional context
The relevant code seems to be https://github.com/pypa/pip/blob/main/src/pip/_internal/cache.py#L60 and https://github.com/pypa/pip/blob/main/src/pip/_internal/models/link.py#L150
Issue Analytics
- Created 2 years ago
- Comments: 6 (6 by maintainers)
Top GitHub Comments
But I think query strings can technically change page contents, and we can’t just safely ignore them. PEP 503 does not use query strings, but it also doesn’t say an implementation can’t use query strings to dynamically serve different distributions (not to mention pip supports a lot more ad-hoc solutions that can do basically anything).
I guess this either needs a PEP to outline which query strings a dependency resolver is allowed to ignore, or some sort of plugin infrastructure in pip that allows users to swap out the default cache backend (so you can implement whatever optimisations you need; you know your servers best).
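As a rough illustration of the second idea, a swappable backend could be as small as a cache that accepts a user-supplied key function. Everything below is hypothetical: pip has no such hook, and `UrlCache`, `identity` and `drop_query` are names invented for this sketch, with an in-memory dict standing in for real storage.

```python
import hashlib
from typing import Callable, Dict, Optional
from urllib.parse import urlsplit, urlunsplit

# Hypothetical hook type: maps a download URL to the string that gets hashed.
KeyNormalizer = Callable[[str], str]


def identity(url: str) -> str:
    """Default behaviour: the full URL, query string included, is the key."""
    return url


def drop_query(url: str) -> str:
    """Opt-in behaviour for servers whose query strings never select content."""
    return urlunsplit(urlsplit(url)._replace(query=""))


class UrlCache:
    """Toy in-memory cache keyed by a hash of the normalized URL."""

    def __init__(self, normalize: KeyNormalizer = identity) -> None:
        self._normalize = normalize
        self._store: Dict[str, bytes] = {}

    def _key(self, url: str) -> str:
        return hashlib.sha256(self._normalize(url).encode("utf-8")).hexdigest()

    def get(self, url: str) -> Optional[bytes]:
        return self._store.get(self._key(url))

    def set(self, url: str, data: bytes) -> None:
        self._store[self._key(url)] = data


if __name__ == "__main__":
    cache = UrlCache(normalize=drop_query)
    cache.set("https://example.invalid/pkg.whl?sig=aaa", b"wheel bytes")
    # A later request with a fresh signature still hits the cache.
    print(cache.get("https://example.invalid/pkg.whl?sig=bbb") is not None)
```

The default stays safe for servers where query strings matter, and only users who know their server opt in to the looser key.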
Appreciate the suggestion, but services that expect a query string aren’t going to use a fragment instead, at least not if they’ve got a halfway decent parser (as the ones I’m dealing with do).