GitHub Self Hosted Runners randomly stall when communicating with JobServerQueue

Describe the bug

GitHub runners randomly stop communicating with JobServerQueue, leaving jobs hanging and failing to report any logs to the interface until cancelled. Cancellations sometimes take minutes or hours to complete as well.

To Reproduce

  1. Run GitHub Runner 2.283.3 via Docker
  2. Let it auto-upgrade to 2.284.0
  3. Run multiple jobs
  4. Hope for random error

Expected behavior

Runners should report a detailed error when they are unable to connect back to the GitHub servers or JobServerQueue.

Runner Version and Platform

2.283.3 (auto-updates to 2.284.0), AWS EKS Spot Instances

What’s not working?

Jobs run for hours if not cancelled, with no updates or log reporting.

[2021-11-18 20:28:10Z INFO ScriptHandler] Which: 'bash'
[2021-11-18 20:28:10Z INFO ScriptHandler] Location: '/usr/bin/bash'
[2021-11-18 20:28:10Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:28:10Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:28:10Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:28:10Z INFO HostContext] Well known directory 'Temp': '/home/runner/_work/_temp'
[2021-11-18 20:28:10Z INFO ScriptHandler] Which: 'bash'
[2021-11-18 20:28:10Z INFO ScriptHandler] Location: '/usr/bin/bash'
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper] Starting process:
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper]   File name: '/usr/bin/bash'
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper]   Arguments: '--noprofile --norc -e -o pipefail /home/runner/_work/_temp/447c6845-6bc7-4a13-a1df-7dcf68888ff1.sh'
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper]   Working directory: '/home/runner/_work/arni-plugin/arni-plugin'
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper]   Require exit code zero: 'False'
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper]   Encoding web name:  ; code page: ''
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper]   Force kill process on cancellation: 'False'
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper]   Redirected STDIN: 'False'
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper]   Persist current code page: 'False'
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper]   Keep redirected STDIN open: 'False'
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper]   High priority process: 'False'
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper] Updated oom_score_adj to 500 for PID: 4132.
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper] Process started with process id 4132, waiting for process exit.
[2021-11-18 20:28:11Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:12Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:12Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:13Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:14Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:28:14Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:28:14Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:28:14Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:15Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:15Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:16Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:16Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:16Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:24Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:28:24Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:28:24Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:28:31Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:34Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:28:34Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:28:34Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:28:39Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:40Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:40Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:41Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:41Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:43Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:44Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:28:44Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:28:44Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:28:46Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:46Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:47Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:47Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:54Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:28:54Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:28:54Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:28:57Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:00Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:04Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:29:04Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:29:04Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:29:06Z INFO JobServerQueue] Stop aggressive process web console line queue.
[2021-11-18 20:29:07Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:07Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:08Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:11Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:14Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:29:14Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:29:14Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:29:23Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:24Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:29:24Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:29:24Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:29:24Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
....
[2021-11-18 20:29:34Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:29:34Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:29:34Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:29:35Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:35Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:36Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:36Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:42Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:44Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:29:44Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:29:44Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:29:48Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:54Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:29:54Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:29:54Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:29:56Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:30:04Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:30:04Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:30:04Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
......
[2021-11-18 20:34:38Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:34:44Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:34:44Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:34:44Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:34:52Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:34:54Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:34:54Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:34:54Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:35:04Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:35:04Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:35:04Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:35:05Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:35:11Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:35:13Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:35:14Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:35:14Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:35:14Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
....
[2021-11-18 20:35:38Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:35:39Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:35:44Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:35:44Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:35:44Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:35:54Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:35:54Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:35:54Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:36:00Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:36:00Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:36:04Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:36:04Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:36:04Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
.....
[2021-11-18 20:38:44Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:38:44Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:38:44Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:38:54Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:38:54Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:38:54Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
.....
[2021-11-18 21:27:05Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 21:27:05Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 21:27:05Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'

Job Log Output

N/A - logs are not coming back

Runner and Worker’s Diagnostic Logs

Nothing beyond what was posted above. JobServerQueue just dies off while HostContext continues on; no other logs show.
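The pattern described here (JobServerQueue going silent while HostContext keeps logging) can be checked for mechanically. The following is a minimal sketch, not part of the runner itself: the function names `last_seen` and `looks_stalled` and the 10-minute threshold are assumptions chosen for illustration, and the line format is taken from the excerpt above. It assumes the lines are in chronological order.

```python
import re
from datetime import datetime, timedelta

# Matches diagnostic lines shaped like the excerpt above, e.g.
# [2021-11-18 20:28:11Z INFO JobServerQueue] Try to append ...
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z (\w+) (\w+)\]")


def last_seen(lines):
    """Return {component: timestamp of its most recent log line}."""
    seen = {}
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip blank lines and elisions such as "...."
        seen[m.group(3)] = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return seen


def looks_stalled(lines, threshold=timedelta(minutes=10)):
    """True if HostContext kept logging for longer than `threshold`
    after the last JobServerQueue line. The threshold is an arbitrary
    assumption, not a value defined by the runner."""
    seen = last_seen(lines)
    queue = seen.get("JobServerQueue")
    host = seen.get("HostContext")
    if queue is None or host is None:
        return False
    return host - queue > threshold


# Lines copied from the end of the excerpt above.
sample = [
    "[2021-11-18 20:38:54Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.",
    "[2021-11-18 20:38:54Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'",
    ".....",
    "[2021-11-18 21:27:05Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'",
]

seen = last_seen(sample)
gap = seen["HostContext"] - seen["JobServerQueue"]
print(gap)                    # 0:48:11
print(looks_stalled(sample))  # True
```

On the posted excerpt, the last JobServerQueue line is at 20:38:54 and HostContext is still logging at 21:27:05, so a check like this would flag the runner roughly 48 minutes into the silence.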

Further notes

This is completely random and seems to happen on a runner that has had multiple job attempts on it. It hangs until cancelled by an outside entity (user input or EKS scale down).

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
jbkc85 commented, Mar 15, 2022

@nikola-jokic I’m sorry, I lost track of this among the other issues that I’ve raised. I am not sure if it’s necessarily fixed, but it hasn’t shown its ugly face for a while! It could be due to API limitations, honestly; it’s difficult to say. I appreciate the follow-up!

0 reactions
nikola-jokic commented, Mar 14, 2022

I’m going to close out this issue until we hear back from you. Please, try using the newest version of the runner and if you confirm that you are still seeing this issue on the newer version, we will re-open it and we will investigate this further. We would just need more information to reproduce this behaviour 😊
