Primary cache should be re-evaluated when deciding whether to update the cache in post-job phase

I have the following step in my Maven build job to cache the CVE database created by the dependency-check plugin as it can take a while to download from scratch:

      - name: Cache CVE database for OWASP dependency-check
        uses: actions/cache@v3
        with:
          path: dependency-check-data
          key: "${{ runner.os }}-dependency-check-data-${{ hashFiles('**/nvdcve-1.1-modified.meta') }}"
          restore-keys: |
            ${{ runner.os }}-dependency-check-data-${{ hashFiles('**/nvdcve-1.1-modified.meta') }}
            ${{ runner.os }}-dependency-check-data-

The file I use to determine whether the cache should be updated is dependency-check-data/nvdcve-1.1-modified.meta. This file is only created during the build job, i.e. it is not part of the repository.
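One way to see the consequence of this is to print the expression before and after the build. The steps below are only a diagnostic sketch: the step names are made up, and `mvn -B verify` is a placeholder for whatever command actually runs the dependency-check plugin. Since `hashFiles` returns an empty string when no file matches the pattern, the first step should print an empty hash on a fresh checkout (before any cache is restored), and the last step should print a real one.

```yaml
      # Hypothetical diagnostic steps; names and the build command are placeholders.
      - name: Print hash before the build
        run: echo "hash before build = '${{ hashFiles('**/nvdcve-1.1-modified.meta') }}'"

      - name: Build (placeholder command, substitute your own)
        run: mvn -B verify

      - name: Print hash after the build
        run: echo "hash after build = '${{ hashFiles('**/nvdcve-1.1-modified.meta') }}'"
```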

First run

Obviously no cache hit.

In the post-build step the cache is created correctly, but GitHub Actions reports:

Cache saved with key: Linux-dependency-check-database-

That’s not correct: it should save the cache with key Linux-dependency-check-database-e6dd7c26af2b2d399adc976c9e67d9ae0e4013e1ef99f753a41fea199a11109e.

Full debug output of the post-build step: https://gist.github.com/fransf-wtax/4f1c5c2aa5a153807bcd145aee281b33

Second run

During cache restore, i.e. before the build, the key evaluates to Linux-dependency-check-data- because the .meta file does not exist yet. That is fine: the action should match on prefixes, so the most recent cache created by a previous build would be used.

Full debug output of the build step: https://gist.github.com/fransf-wtax/58e6f867603f076f874b2b4ceff7c25f

But after the build, when the cache is updated, GitHub Actions says:

##[debug]Cache state/key: Linux-dependency-check-data-
Cache hit occurred on the primary key Linux-dependency-check-data-, not saving cache.

This is not correct. A file matching **/nvdcve-1.1-modified.meta now exists, so the action should re-evaluate the key "${{ runner.os }}-dependency-check-data-${{ hashFiles('**/nvdcve-1.1-modified.meta') }}", which would yield Linux-dependency-check-data-a65b573939924c996fc1da62edc8e7c7c4cdf4ea9ec5b7a3c2b495ff37333213. That is NOT the key of the cache that was restored, and therefore not a cache hit.

Full output of the post-build step: https://gist.github.com/fransf-wtax/f69804e2f5ec2dd0c3b7577b618be3be

Conclusion

It looks like GitHub either does not re-evaluate the cache key expression in the post-build step or evaluates it incorrectly. If it did re-evaluate the expression, I believe the cache action would work as expected.
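A possible workaround, sketched here under assumptions rather than taken from the issue itself, is to split restoring and saving into two separate steps so that the save key is evaluated after the build has produced the .meta file. This assumes a release of actions/cache v3 that ships the `restore` and `save` sub-actions; the build step and its `mvn -B verify` command are placeholders.

```yaml
      - name: Restore CVE database for OWASP dependency-check
        uses: actions/cache/restore@v3
        with:
          path: dependency-check-data
          # On a fresh runner the .meta file is absent, so hashFiles yields "".
          key: "${{ runner.os }}-dependency-check-data-${{ hashFiles('**/nvdcve-1.1-modified.meta') }}"
          restore-keys: |
            ${{ runner.os }}-dependency-check-data-

      # Placeholder build step; substitute the real Maven invocation.
      - name: Build
        run: mvn -B verify

      - name: Save CVE database for OWASP dependency-check
        uses: actions/cache/save@v3
        with:
          path: dependency-check-data
          # This expression is evaluated when this step starts, i.e. after
          # the build, so the hash reflects the freshly written .meta file.
          key: "${{ runner.os }}-dependency-check-data-${{ hashFiles('**/nvdcve-1.1-modified.meta') }}"
```

Because an existing cache entry cannot be overwritten, the save step is expected to store nothing new when the .meta file, and therefore the key, is unchanged since the previous run.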

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 8

Top GitHub Comments

1 reaction
prncevince commented, Jun 2, 2022

Adding a simple boolean input to the action to re-evaluate the key in the post step could be very helpful, e.g. with the current behaviour as the default: inputs: reevaluate: default: false.

I’m interested from the R programming community’s point of view when it comes to using code chunk caching within the {rmarkdown} package & specifically {knitr} chunk options.

{knitr} creates and invalidates its cache for each code chunk while the document is being rendered. One could invalidate the cache by simply hashing the file that contains the knitr code chunks, but this does not detect that individual code chunks within the file have changed or that variable inputs to the chunks have changed (i.e. the actual reasons to invalidate/bust the cache). A much simpler solution would be to hash all files within the knitr cache's output directory (most people will not version this, as it is binary data).

A long way of saying: Time to learn some TypeScript 😉

1 reaction
fransf-wtax commented, Apr 12, 2022

That’s disappointing but consistent with the general feel I’m getting around GitHub Actions – that it’s an unfinished product that is really an internal tool for Microsoft or GitHub that was made public too quickly – there’s too much undocumented behaviour and several ways of achieving common tasks seem hackish. Too bad, it has potential, maybe we’ll check on it again in a few years.
