Potentially better and faster pip caching
I’ve followed the instructions to get pip cached successfully.
It’s important to understand that the pip cache only affects downloading of packages, not the actual installation. My app is not huge by any means, but still, even with cached downloads, installation takes about a minute or more.
In fact, without pip cache at all, downloads are actually pretty quick.
I had much better performance with this hack:
Notice that I’m caching the actual pip directories. I could hard code the value (I only test on ubuntu), but wanted to stick to the principles of getting the site path.
Still hacky, because of binaries. I need to be able to run pytest, so just caching site-packages was not enough. getsitepackages() will give you something like /opt/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/. And if you restore that directory, pip will know not to reinstall stuff.
However, your binaries will be missing from the path. So I traverse out a few dirs and cache the whole thing.
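A minimal sketch of the path traversal described above, in Python. The helper names cache_root and current_cache_root, and the default of three levels, are assumptions for illustration; the original workflow only says "a few dirs". Three levels happens to take the example path from site-packages up to the x64 directory, which also contains the installed console scripts such as pytest.

```python
import site
from pathlib import Path


def cache_root(site_packages: str, levels_up: int = 3) -> Path:
    """Return the directory `levels_up` levels above a site-packages path.

    `levels_up` is an assumed value (must be >= 1); adjust it to the
    layout of the runner you actually use.
    """
    return Path(site_packages).parents[levels_up - 1]


def current_cache_root(levels_up: int = 3) -> Path:
    """Same as cache_root, but for the interpreter running this code."""
    # site.getsitepackages() returns a list; the first entry is used here.
    return cache_root(site.getsitepackages()[0], levels_up)
```

Calling cache_root on the example path yields /opt/hostedtoolcache/Python/3.7.6/x64, the directory that would be handed to the cache step instead of site-packages alone.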
Not sure if it’s worth documenting this approach, or if there’s a better layer-based cache of some sort coming that will make this moot.
But it cut down my pip setup from about 1 minute to 1 second.
Issue Analytics
- Created 4 years ago
- Reactions: 1
- Comments: 14 (2 by maintainers)
Top GitHub Comments
Please don’t cache site-packages or entire interpreter trees. That is fragile (sensitive to the Python patch version and the OS), and pip should be pretty fast as long as its cache is populated.
If there are instances where pip is still slow, please file an issue on pip’s issue tracker (look for an existing one before filing a new one), and we should put in the work to enhance pip. It’s not gonna magically happen on its own, but I don’t think pushing for fragile optimizations on CI providers is a good alternative approach.
Users who are willing to deal with the consequences of such a caching strategy can do so in their specific CI pipelines themselves. I would be very concerned if this became something that GitHub Actions or Azure Pipelines started suggesting to users, or made easier by providing a mechanism to do so via an official-esque package.
It’s “kinda” valid. But it is safe to use only if the OS and Python version are exactly the same, so this has to be accounted for in the cache key.
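One way to account for that, sketched in Python under some assumptions: the function name cache_key, the "site-packages" prefix, and hashing the requirements text are all illustrative choices, not something from the thread. The key only captures the OS name, the machine architecture, and the full Python version (including the patch level); it does not capture the runner image version, so it is not a complete guarantee.

```python
import hashlib
import platform


def cache_key(requirements_text: str, prefix: str = "site-packages") -> str:
    """Build a cache key that changes with the OS, the architecture,
    the exact Python version, or the pinned requirements."""
    # Short digest of the requirements so a dependency change busts the cache.
    digest = hashlib.sha256(requirements_text.encode("utf-8")).hexdigest()[:16]
    parts = [
        prefix,
        platform.system(),          # e.g. "Linux"
        platform.machine(),         # e.g. "x86_64"
        platform.python_version(),  # full version, patch level included
        digest,
    ]
    return "-".join(parts)
```

A pipeline could compute this once per job and pass it to whatever cache step it uses, so a restored tree is only reused by a matching interpreter.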