Cache compression - cross OS support
Problems
There have been multiple reported issues related to cache compression. The issue we are looking to solve:
- Cross-OS compatibility: Currently, the cache on Windows uses a different compression algorithm (gzip) than on Linux and macOS (zstd). This leads to different cache versions for caches created on different platforms, so caches created on Windows might not be restorable on Linux or macOS. For more details on cache version see this.
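To illustrate why the compression method affects cache lookup, here is a minimal sketch, assuming the cache version is a hash over the cached paths plus the name of the compression method. The function name cache_version, the "|" separator, and the choice of SHA-256 are assumptions made for illustration and may not match what the action actually does.

```python
import hashlib


def cache_version(paths, compression_method):
    """Hypothetical version hash: cached paths plus the compression method name."""
    components = list(paths) + [compression_method]
    return hashlib.sha256("|".join(components).encode("utf-8")).hexdigest()


paths = ["node_modules"]
windows_version = cache_version(paths, "gzip")  # Windows currently uses gzip
linux_version = cache_version(paths, "zstd")    # Linux and macOS use zstd

# Same paths, different compression method: the two versions do not match,
# so a lookup from one platform would miss a cache saved on the other.
print(windows_version == linux_version)
```

Under this assumption, using the same compression method on every OS is what makes the versions line up.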
Proposal
We are looking to solve these problems as follows:
- Change the default tar used on Windows runners to GNU tar. This is already suggested as a workaround for people facing these problems. Using the same tooling will ensure that a cache can be reused across all three OSes.
- Fall back to BSD tar with zstd on Windows. BSD tar is already present on Windows runners by default, but it does not use zstd due to the issue of compression hanging with large caches. In our testing, we found that performing archiving and compression as separate processes (instead of calling tar --use-compress-program) does not have the hang problem for caches of size up to 2 GB.
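The difference between the two approaches in the fallback can be sketched as follows. This is only an illustration, not the action's actual implementation: the helper names, the intermediate file name cache.tar, and the exact tar and zstd flags are assumptions. The functions only build the command lines; run_all is an optional helper that would execute them on a machine with tar and zstd installed.

```python
import subprocess


def single_process_command(source_dir, archive_path):
    """One tar invocation that drives zstd itself (the variant reported to hang)."""
    return [[
        "tar", "--use-compress-program", "zstd -T0",
        "-cf", archive_path, "-C", source_dir, ".",
    ]]


def two_step_commands(source_dir, archive_path, tar_path="cache.tar"):
    """Archive first, then compress the finished tar file in a separate process."""
    return [
        ["tar", "-cf", tar_path, "-C", source_dir, "."],
        ["zstd", "-T0", "-f", "-o", archive_path, tar_path],
    ]


def run_all(commands):
    """Run each command in order, raising on the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


for argv in two_step_commands("workspace", "cache.tzst"):
    print(" ".join(argv))
```

The two-step variant needs temporary disk space for the uncompressed tar file, which is the trade-off for avoiding the piped compression program.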
Reasons for choosing GNU tar over BSD tar as the default
BSD tar has some implementation problems, which is why our action stopped using it on macOS. For more details see https://github.com/actions/toolkit/issues/552.
Related issues which should get fixed with this proposal
We have consolidated issues related to the above problems here. Feel free to provide feedback regarding these in this issue itself.
- https://github.com/actions/cache/issues/330
- https://github.com/actions/toolkit/issues/543
- https://github.com/actions/cache/issues/301
- https://github.com/actions/cache/issues/1010
- https://github.com/actions/cache/issues/198 (Should be fixed with the use of GNU tar)
- https://github.com/actions/cache/issues/125 (Won’t fix)
- https://github.com/actions/cache/issues/389 (Not fixing as of today)
Issue Analytics
- State:
- Created 10 months ago
- Reactions:2
- Comments:9 (5 by maintainers)
Top GitHub Comments
BTW - 7-Zip does support tar and tar.gz formats. And like I said, it’s pre-installed on both windows-2019 and windows-2022 runner images. I’m not certain if it would be faster than BSD or GNU tar or not. Perhaps you could experiment and see?
I’m glad this is being addressed, but I’m not sure the stated proposal is the right path forward. Or rather, it may not be the right path for all users.
I don’t believe that cross-OS compatibility of a single cache archive should be the primary driver, as it’s quite common to put the runner.os in the cache key, as seen in the basic example in the docs. In my case, we need a different set of packages cached and restored for each OS, so we wouldn’t use one common cache anyway.
Also, it sounds as if the only plan for speeding things up on Windows is to switch to GNU tar. As I stated previously, and as reported by others here and here, this has not resolved the problem. Cache restore is still very slow, even with GNU tar.
I think the part that is missing from the proposal is an option to choose the archive format, and that option should be exposed all the way up to the cache action. Personally, I would like to use .7z as the format on Windows. Or perhaps just .zip, but still using 7-Zip to do the work because it’s much faster than anything that ships with Windows itself, and it’s already pre-installed on the GitHub Actions runner images.