Cache compression - cross OS support

Problems

There have been multiple reported issues related to compression of caches. The issue we are looking to solve:

  • Cross-OS compatibility: Currently, the cache on Windows uses a different compression algorithm (gzip) than the cache on Linux and macOS (zstd). This leads to different cache versions for caches created on different platforms. As a result, caches created on Windows might not be recoverable on Linux or macOS. For more details on cache version see this.
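
To illustrate why a different compression algorithm produces a different cache version, here is a minimal sketch. It assumes, purely for illustration, that the version is a hash over the cached paths plus the compression method; the function name `cache_version` and the exact hashing scheme are hypothetical and are not taken from the action's source.

```python
import hashlib


def cache_version(paths, compression_method):
    # Hypothetical scheme: hash the cached paths together with the
    # compression method, so a change in either changes the version.
    components = list(paths) + [compression_method]
    return hashlib.sha256("|".join(components).encode("utf-8")).hexdigest()


paths = ["node_modules"]
windows_version = cache_version(paths, "gzip")  # gzip, as on Windows today
linux_version = cache_version(paths, "zstd")    # zstd, as on Linux and macOS

# Same paths, different compression method: the versions do not match,
# so a lookup from one platform would miss the other platform's cache.
print(windows_version == linux_version)  # False
```

Under this assumption, using the same compression method on every OS is what would make the versions line up.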

Proposal

We are looking to address these problems as follows:

  • Change the default tar used on Windows runners to GNUtar. This is already suggested as a workaround for people hitting these problems. Using the same tooling will ensure that a cache can be reused across all three OSes.
  • Fall back to BSDtar with zstd on Windows. BSDtar is already present on Windows runners by default, but it does not use zstd because compression can hang with large caches. In our testing, we found that performing archiving and compression as separate processes (instead of calling tar --use-compress-program) does not have the hang problem for caches of up to 2GB.
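
The "separate processes" approach in the second bullet can be sketched as follows. This is an illustration under assumptions, not the action's implementation: the helper names, the file names `cache.tar` and `cache.tar.zst`, and the specific zstd options are all chosen for the example. It assumes `tar` and `zstd` are on the PATH and that the working directory is outside the directory being archived.

```python
import subprocess
from pathlib import Path


def build_commands(source_dir, tar_path, compressed_path):
    # Step 1: archive only, with no compression program attached to tar.
    archive_cmd = ["tar", "-cf", str(tar_path), "-C", str(source_dir), "."]
    # Step 2: compress the finished tar file in a separate zstd process.
    # -T0 uses all cores, -f overwrites an existing output file.
    compress_cmd = ["zstd", "-T0", "-f", "-o", str(compressed_path), str(tar_path)]
    return archive_cmd, compress_cmd


def create_cache_archive(source_dir, work_dir):
    # work_dir is assumed to live outside source_dir, so the archive
    # does not end up containing itself.
    work_dir = Path(work_dir)
    tar_path = work_dir / "cache.tar"
    compressed_path = work_dir / "cache.tar.zst"
    archive_cmd, compress_cmd = build_commands(source_dir, tar_path, compressed_path)
    subprocess.run(archive_cmd, check=True)
    subprocess.run(compress_cmd, check=True)
    tar_path.unlink()  # remove the intermediate, uncompressed archive
    return compressed_path
```

The trade-off of this design is extra disk usage for the intermediate tar file, in exchange for never piping tar's output through a compression program.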

Reasoning to choose GNUtar over BSDtar as default

BSDtar has some implementation problems, which is why our action stopped using it on macOS. For more details see https://github.com/actions/toolkit/issues/552.
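
Since the choice between GNUtar and BSDtar matters here, a quick way to check which implementation a given `tar` executable is could look like the sketch below. It relies on the first line of `tar --version` output, which names "GNU tar" for GNU tar and starts with "bsdtar" for BSDtar; the function names are hypothetical.

```python
import subprocess


def classify_tar(version_output):
    # Inspect only the first line of the `tar --version` output.
    lines = version_output.strip().splitlines()
    first_line = lines[0].lower() if lines else ""
    if "gnu tar" in first_line:
        return "gnu"
    if "bsdtar" in first_line:
        return "bsd"
    return "unknown"


def detect_tar(executable="tar"):
    # Runs the executable found on PATH (or at the given path).
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=True
    )
    return classify_tar(result.stdout)
```

For example, `classify_tar("tar (GNU tar) 1.34")` returns `"gnu"`, while `classify_tar("bsdtar 3.5.2 - libarchive 3.5.2")` returns `"bsd"`.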

Related issues which should get fixed with this proposal

We have consolidated issues related to the above problems here. Feel free to provide feedback regarding these in this issue itself.

Issue Analytics

  • State: open
  • Created 10 months ago
  • Reactions: 2
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

2 reactions
mattjohnsonpint commented, Nov 16, 2022

BTW - 7-Zip does support tar and tar.gz formats. And like I said, it’s pre-installed on both windows-2019 and windows-2022 runner images. I’m not certain if it would be faster than BSD or GNU tar or not. Perhaps you could experiment and see?

2 reactions
mattjohnsonpint commented, Nov 16, 2022

I’m glad this is being addressed, but I’m not sure the stated proposal is the right path forward. Or rather, it may not be the right path for all users.

I don’t believe that cross-OS compatibility of a single cache archive should be the primary driver, as it’s quite common to put the runner.os in the cache key, as seen in the basic example in the docs. In my case, we need a different set of packages cached and restored for each OS, so we wouldn’t use one common cache anyway.

Also, it sounds as if the only plan for speeding things up on Windows is to switch to GNU tar. As I stated previously, and reported by others here and here - this has not resolved the problem. Cache restore is still very slow, even with GNU tar.

I think the part that is missing from the proposal is that I believe there should be an option to choose the archive format, and that should be exposed all the way up to the cache action. Personally, I would like to use .7z as the format on Windows. Or perhaps just .zip, but still using 7-Zip to do the work because it’s much faster than anything that ships with Windows itself, and it’s already pre-installed on the GitHub Actions runner images.
