Primary cache should be re-evaluated when deciding whether to update the cache in post-job phase

I have the following step in my Maven build job to cache the CVE database created by the dependency-check plugin as it can take a while to download from scratch:

      - name: Cache CVE database for OWASP dependency-check
        uses: actions/cache@v3
        with:
          path: dependency-check-data
          key: "${{ runner.os }}-dependency-check-data-${{ hashFiles('**/nvdcve-1.1-modified.meta') }}"
          restore-keys: |
            ${{ runner.os }}-dependency-check-data-${{ hashFiles('**/nvdcve-1.1-modified.meta') }}
            ${{ runner.os }}-dependency-check-data-

The file I use to determine whether the cache should be updated is dependency-check-data/nvdcve-1.1-modified.meta. This file is only created during the build job, i.e. it is not part of the repository.

First run

Obviously no cache hit.

In the post-build step the cache is created correctly, but GitHub Actions reports:

Cache saved with key: Linux-dependency-check-database-

That’s not correct: it should save the cache with key Linux-dependency-check-database-e6dd7c26af2b2d399adc976c9e67d9ae0e4013e1ef99f753a41fea199a11109e.

Full debug output of the post-build step: https://gist.github.com/fransf-wtax/4f1c5c2aa5a153807bcd145aee281b33

Second run

During cache restore, i.e. before the build, the key evaluates to Linux-dependency-check-data-. That is fine, because the action should match on prefixes, so the most recent cache created by a previous build would be used.

Full debug output of the build step: https://gist.github.com/fransf-wtax/58e6f867603f076f874b2b4ceff7c25f

But after the build, when the cache is updated, GitHub Actions says:

##[debug]Cache state/key: Linux-dependency-check-data-
Cache hit occurred on the primary key Linux-dependency-check-data-, not saving cache.

This is not correct. A file matching **/nvdcve-1.1-modified.meta now exists, so the key expression "${{ runner.os }}-dependency-check-data-${{ hashFiles('**/nvdcve-1.1-modified.meta') }}" should be re-evaluated. That would yield Linux-dependency-check-data-a65b573939924c996fc1da62edc8e7c7c4cdf4ea9ec5b7a3c2b495ff37333213, which is NOT the key of the cache that was restored and therefore not a cache hit.

Full output of the post-build step: https://gist.github.com/fransf-wtax/f69804e2f5ec2dd0c3b7577b618be3be

Conclusion

It looks like GitHub either does not evaluate the cache key expression in the post-build step or evaluates it incorrectly. If the expression were re-evaluated at that point, I believe the cache action would work as expected.
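
One possible way around this, sketched here as an assumption rather than a confirmed fix, is to split the single cache step into separate restore and save steps. The sketch assumes the actions/cache/restore and actions/cache/save sub-actions, which to my knowledge ship with later v3 releases of actions/cache; the step names and the placeholder comment for the Maven build are illustrative. Because the save step is an ordinary step that runs after the build, its key expression is evaluated once the meta file exists.

```yaml
      # Restore the most recent CVE database, if any, before the build.
      # hashFiles() finds no file yet, so the prefix in restore-keys does the matching.
      - name: Restore CVE database for OWASP dependency-check
        uses: actions/cache/restore@v3
        with:
          path: dependency-check-data
          key: ${{ runner.os }}-dependency-check-data-${{ hashFiles('**/nvdcve-1.1-modified.meta') }}
          restore-keys: |
            ${{ runner.os }}-dependency-check-data-

      # ... Maven build step goes here; it runs dependency-check and writes
      # dependency-check-data/nvdcve-1.1-modified.meta ...

      # This key is evaluated when the step starts, i.e. after the build,
      # so hashFiles() can see the freshly written meta file.
      - name: Save CVE database for OWASP dependency-check
        uses: actions/cache/save@v3
        with:
          path: dependency-check-data
          key: ${{ runner.os }}-dependency-check-data-${{ hashFiles('**/nvdcve-1.1-modified.meta') }}
```

Cache entries cannot be overwritten under an existing key, so on runs where the meta file is unchanged the save step should have nothing new to store; it is worth checking how that case shows up in your own logs.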

Top GitHub Comments

prncevince commented, Jun 2, 2022

Adding a simple boolean as an input to the action to re-evaluate the key in the post step could be very helpful, e.g. an input named reevaluate whose default, false, keeps the current behaviour.
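
To make the suggestion concrete, the declaration in the action's metadata file might look roughly like the following. This is purely illustrative: reevaluate is not an existing input of actions/cache, the description text is made up, and how the action would actually re-evaluate the expression is left open.

```yaml
# Hypothetical addition to action.yml; this input does not exist in actions/cache.
inputs:
  reevaluate:
    description: 'Re-evaluate the key in the post step before deciding whether to save the cache'
    required: false
    default: 'false'   # false keeps the current behaviour
```

A workflow would then opt in by adding reevaluate: true under the step's with: block.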

I’m interested from the R programming community’s point of view when it comes to using code chunk caching within the {rmarkdown} package & specifically {knitr} chunk options.

{knitr} creates & invalidates its cache for each code chunk while the document is being rendered. One could invalidate the cache by simply hashing the file that contains the knitr code chunks, but this does not detect whether individual code chunks within the file have changed or whether variable inputs to the chunks have changed (i.e. the actual reasons to invalidate/bust the cache). A much simpler solution would be to hash all files within the knitr cache’s output directory (most people will not version this, since it’s binary data).

A long way of saying: Time to learn some TypeScript 😉

fransf-wtax commented, Apr 12, 2022

That’s disappointing but consistent with the general feel I’m getting around GitHub Actions – that it’s an unfinished product that is really an internal tool for Microsoft or GitHub that was made public too quickly – there’s too much undocumented behaviour and several ways of achieving common tasks seem hackish. Too bad, it has potential, maybe we’ll check on it again in a few years.
