The linux runner does not gracefully shutdown on SIGINT

Describe the bug

According to https://github.com/actions/runner/issues/2190#issuecomment-1273302389, the runner should wait until the running job has finished and then stop. That does not happen: the listener process keeps running under its old PID, and new jobs are assigned to it.

To Reproduce

Steps to reproduce the behavior:

  1. Go to a runner with a running job and execute pgrep -af Runner.Listener
  2. Note the PID and execute pkill -INT Runner.Listener
  3. Wait until the current job finishes and another one starts on the same runner.
  4. The output of pgrep -af Runner.Listener does not change: the same process is still there
  5. pkill -INT run.sh does not work either
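
The steps above can be sketched as a small shell script. It is only an illustration under assumptions: the helper names `same_listener` and `reproduce`, the `run` argument, and the `WAIT_SECONDS` variable are hypothetical, and the wait time has to be long enough for the current job to finish on your runner.

```shell
#!/bin/sh
# Sketch of the reproduction steps; intended for a runner host with a job in progress.

# Succeeds when the PID list is non-empty and unchanged,
# i.e. the same listener process survived the signal.
same_listener() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

reproduce() {
    before=$(pgrep -f Runner.Listener)
    echo "listener PID(s) before SIGINT: ${before:-none}"

    # Step 2: ask the listener to stop.
    pkill -INT Runner.Listener

    # Step 3: WAIT_SECONDS is a hypothetical knob; pick a value that
    # covers the rest of the running job.
    sleep "${WAIT_SECONDS:-600}"

    # Step 4: compare the PID(s) after the job should have finished.
    after=$(pgrep -f Runner.Listener)
    if same_listener "$before" "$after"; then
        echo "reproduced: listener $after is still running"
    else
        echo "listener exited or was replaced"
    fi
}

# Only touch real processes when explicitly asked to.
if [ "${1:-}" = "run" ]; then
    reproduce
fi
```

Invoked with the `run` argument on a runner host, it reports whether the listener PID survived the SIGINT. Without the argument it only defines the helpers.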

Expected behavior

The runner should shut down gracefully. If SIGINT does not do that, there must be another way to stop the runner right after the job finishes, before a new one is assigned.
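
The expected behavior can be illustrated with a minimal shell sketch. It is a hypothetical stand-in loop, not the actual Runner.Listener code: `run_job`, `MAX_JOBS`, and the timings are invented for illustration. The SIGINT handler only records the request, so the job in progress finishes and no new job is taken.

```shell
#!/bin/sh
# Hypothetical stand-in for a runner loop; NOT the real Runner.Listener logic.

MAX_JOBS=3          # assumption: cap so the demo always terminates
stop_requested=0

# On SIGINT, only record the request; the job in progress is not interrupted.
trap 'stop_requested=1' INT

run_job() {
    echo "job $1 started"
    sleep 2         # placeholder for real work
    echo "job $1 finished"
}

# Simulate an operator sending SIGINT to this shell one second into the first job.
(sleep 1; kill -INT $$) &

job=1
while [ "$stop_requested" -eq 0 ] && [ "$job" -le "$MAX_JOBS" ]; do
    run_job "$job"
    job=$((job + 1))
done

if [ "$stop_requested" -eq 1 ]; then
    echo "stop requested: exiting before job $job"
fi
```

In this sketch the first job runs to completion and the loop exits before a second job starts. Note that the signal is sent to the shell process only, as `pkill -INT` would do. Ctrl-C in a terminal delivers SIGINT to the whole foreground process group, so the placeholder `sleep` would be interrupted as well.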

Runner Version and Platform

Linux 2.298.2 amd64

Issue Analytics

  • State: open
  • Created 5 months ago
  • Reactions: 5
  • Comments: 13 (1 by maintainers)

Top GitHub Comments

1 reaction
Felixoid commented, Jun 19, 2023

The situation with ephemeral runners is a bit better than with normal ones. See the discussion in https://github.com/ClickHouse/ClickHouse/pull/49283. It’s not necessary to shut down the host; the runner process can simply restart.

Unfortunately, that still doesn’t guarantee that you can safely tear down a runner process that has had no jobs assigned for a long time. Imagine we have a pool of 30 runners, and only 24 of them have running jobs. After some period, 60 seconds in our case, we shut down every runner that still has nothing to do. And at that very moment, GH reports that a job was assigned to one of these poor runners.
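
The race described above can be sketched as a check-then-stop sequence. This is a hypothetical illustration, not the commenter’s actual tooling: `idle_too_long`, `stop_if_idle`, `IDLE_LIMIT`, the use of a running `Runner.Worker` process as the "job in progress" signal, and `pkill -INT` as the teardown step are all assumptions.

```shell
#!/bin/sh
# Hypothetical idle-teardown check; names and structure are assumptions.

IDLE_LIMIT=60   # seconds a runner may sit idle, as in the comment above

# True when a runner that has been idle for $1 seconds has reached the limit.
idle_too_long() {
    [ "$1" -ge "$IDLE_LIMIT" ]
}

# Stop the listener if the runner looks idle. $1 = seconds since the last job.
stop_if_idle() {
    if idle_too_long "$1" && ! pgrep Runner.Worker >/dev/null; then
        # Race window: a job can be assigned between the check above and
        # the signal below, and that job is then the one that gets killed.
        # pkill -INT stands in for whatever teardown is actually used.
        pkill -INT Runner.Listener
    fi
}
```

However short the gap between the check and the teardown is, nothing in this scheme prevents a job from being assigned inside it. That is the failure the comment describes.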

It looks like GH assigns jobs to runners, rather than a runner picking up a job by connecting to the API. If so, there will be killed jobs no matter what.

Everything described above is my own conclusion. It’s based on a long time spent trying different schemes to get runners working reliably, and desperately failing again and again.

0 reactions
asos-tommycouzens commented, Jun 15, 2023

Would also love this!

We are in the process of setting up self-hosted runners, and would like a safe way to scale down runners without causing running jobs to fail. Without the SIGTERM functionality requested in this issue, this is not possible unless we build a rather complex and fragile orchestrator.

We similarly do not use ephemeral runners because we want to make use of Docker caching. The availability of Docker caching was the primary reason we chose GitHub Actions over Azure DevOps.
