
Flaky builds when using cache actions with Ruby gems


I use the caching action to cache the Ruby gems needed to build this Middleman website. This has greatly sped up the build, from 4-5 minutes without caching down to under a minute.

However, the build is now too flaky, as the commit trail in PR 23 demonstrates.

Bug or no Bug?

After some research and experiments, this is what I think is going on. Maintainers here, please decide what to do with this issue. I’m undecided whether this is an actions/cache issue.

My project had a dependency on ruby-sassc. On installation, this gem compiles native code. Before version 2.2.0, it compiled into a cross-platform gem. With version 2.2.0, this “cross-platformness” (what’s the right word?) was dropped in favour of a speed improvement; see this insightful comment (and, if you’re into it, read the entire comment thread of sassc-ruby issue 146).

I suspect that every once in a while, this job gets executed by a worker whose architecture is incompatible with the compiled gem stored in the cache. Then the build fails. If I re-run the build, it gets picked up by a worker running on a compatible architecture and voilà, it passes again.
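One way to test this suspicion would be to log the runner's CPU in every run and compare passing runs with failing ones. The step below is a minimal sketch, not part of the original workflow: the step name is hypothetical, and it assumes a Linux runner where /proc/cpuinfo is available.

```yaml
    # Hypothetical diagnostic step; not part of the original workflow.
    - name: Log runner CPU
      run: |
        # Print the CPU model so passing and failing runs can be compared.
        grep -m1 'model name' /proc/cpuinfo
        # Print a short hash of the CPU feature flags for easy comparison.
        grep -m1 '^flags' /proc/cpuinfo | sha256sum | cut -c1-16
```

If failing runs consistently show a different model or flag hash than the run that populated the cache, that would support the explanation above; if they do not, the cause is probably elsewhere.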

My workaround

Pin the ruby-sassc gem to version 2.1.0.

Observations

See the commit trail of PR 23, “Intermittent build failures”.

A good example:

Commit 3ab744a passes - run 461949888

Commit cc4b3ad (same codebase) fails - run 461965463

Failed builds can often be resolved by re-running them once or twice using the “re-run jobs” button.

The build always fails in the Build step, always with a message similar to:

[...]
/home/runner/work/XSCALE-Alliance.github.io/XSCALE-Alliance.github.io/vendor/bundle/ruby/2.5.0/gems/ffi-1.12.2/lib/ffi/library.rb:112: [BUG] Illegal instruction at 0x00007efffd9a5780
ruby 2.5.7p206 (2019-10-01 revision 67816) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0042 p:---- s:0222 e:000221 CFUNC  :open
[...]

I have not observed this build error in builds without the caching step.

How to reproduce

It occurs intermittently: generally about once in ten builds, sometimes more frequently.

I reproduce it by pushing minor changes to this branch, as demonstrated by the commit log of PR 23.

Build configuration

The build configuration is per the Ruby example in this repo.

See verify_pull_request.yml on this PR branch:

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
    - name: Check Ruby Versions
      run: |
        echo "$RUNNER_TOOL_CACHE"
        ls $RUNNER_TOOL_CACHE/Ruby
    - uses: actions/checkout@v1
    - uses: actions/setup-ruby@v1
      with:
        ruby-version: '2.5'
    - name: Cache Ruby Gems
      uses: actions/cache@v1
      with:
        path: vendor/bundle
        key: ${{ runner.os }}-gems2-${{ hashFiles('**/Gemfile.lock') }}
        restore-keys: |
          ${{ runner.os }}-gems2-
    - name: Bootstrap
      run: |
        bundle config path vendor/bundle
        make bootstrap
    - name: Build
      run: make test

Further experiments

I can try the following:

  • Verify the sassc version used. 2.2.0 has this problem; it was reported fixed in 2.2.1, but others have still reported similar issues with 2.2.1. 2.1.x apparently does not have this problem.
  • Run 10+ builds with sassc locked to version 2.1.0.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 10 (2 by maintainers)

Top GitHub Comments

serra commented, Feb 26, 2020 (1 reaction)

Thanks for your answers, @joshmgross. Your pointers are clear and I believe I’m following them already. As I don’t have a clear suggestion on how to better handle this in actions/cache, I’ll close this issue now. Thank you for your help.

joshmgross commented, Feb 25, 2020 (1 reaction)

I think the platform differences can be more subtle than only bitness. But this stuff is a bit out of my league

All hosted runners for a given label will be the same VM image and architecture. We do a monthly update of that image, but it’s primarily software updates.

As a user of actions/cache, how can I implement caching that results in a reliable build?

It’s important to correctly choose a key that uniquely identifies a cache, such as including the runner OS and a hash of any dependency files (such as Gemfile.lock). Additionally, you should be careful with restore keys, as they allow pulling an older version of the cache that doesn’t match your primary key.

You can find more info at https://help.github.com/en/actions/configuring-and-managing-workflows/caching-dependencies-to-speed-up-workflows
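As a rough illustration of that advice, the cache step from the issue could be written with the primary key only, so that a cache is restored only on an exact match of the runner OS and the Gemfile.lock hash. This is a sketch under assumptions: the action version and key prefix are copied from the workflow above and may need adjusting for your setup.

```yaml
    - name: Cache Ruby Gems
      uses: actions/cache@v1  # version as used in the issue
      with:
        path: vendor/bundle
        # Primary key only: runner OS plus a hash of the lockfile.
        key: ${{ runner.os }}-gems2-${{ hashFiles('**/Gemfile.lock') }}
        # No restore-keys: an older cache that does not match the
        # primary key is never pulled in as a fallback.
```

The trade-off is that every change to Gemfile.lock starts from an empty cache, which costs one slow build but avoids reusing gems built for a different lockfile.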

As a user of actions/cache what assumptions do I have to assert myself to guarantee a reliable build?

Depending on your workflow and ecosystem, it’s recommended to still run the dependency install step after caching. This allows the tooling to pull any missing dependencies while benefiting from the cached dependencies already available locally.
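For Bundler, that recommendation might look like the step below, placed after the cache step. It is a minimal sketch: the step name is hypothetical, and it assumes gems are installed directly with `bundle install` rather than through the `make bootstrap` target used in the issue.

```yaml
    # Hypothetical install step; runs whether or not the cache was restored.
    - name: Install gems
      run: |
        # Point Bundler at the cached directory.
        bundle config path vendor/bundle
        # Installs only what is missing from vendor/bundle.
        bundle install --jobs 4 --retry 3
```

Note that this fills in missing gems but does not rebuild gems that are already present in the cache, so on its own it would not fix a cached native extension that was compiled on a different machine.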

identify the work we need to do in actions/cache

I’m open to suggestions for how we can better handle this.
