GitHub Self Hosted Runners randomly stall when communicating with JobServerQueue
Describe the bug
GitHub runners randomly stop communicating with JobServerQueue, hanging the jobs and failing to report any logs to the interface until cancelled. Sometimes cancellations take minutes or hours to complete as well.
To Reproduce
- Run GitHub Runner 2.283.3 via Docker
- Auto upgrade to 2.284.0
- Run multiple jobs
- Hope for random error
Expected behavior
Runners should report an error when they cannot connect back to the GitHub servers or JobServerQueue.
Runner Version and Platform
2.283.3 (auto updates to 2.284.0), AWS EKS Spot Instances
What’s not working?
Jobs run for multiple hours, if not cancelled, with no updates or reporting.
[2021-11-18 20:28:10Z INFO ScriptHandler] Which: 'bash'
[2021-11-18 20:28:10Z INFO ScriptHandler] Location: '/usr/bin/bash'
[2021-11-18 20:28:10Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:28:10Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:28:10Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:28:10Z INFO HostContext] Well known directory 'Temp': '/home/runner/_work/_temp'
[2021-11-18 20:28:10Z INFO ScriptHandler] Which: 'bash'
[2021-11-18 20:28:10Z INFO ScriptHandler] Location: '/usr/bin/bash'
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper] Starting process:
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper] File name: '/usr/bin/bash'
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper] Arguments: '--noprofile --norc -e -o pipefail /home/runner/_work/_temp/447c6845-6bc7-4a13-a1df-7dcf68888ff1.sh'
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper] Working directory: '/home/runner/_work/arni-plugin/arni-plugin'
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper] Require exit code zero: 'False'
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper] Encoding web name: ; code page: ''
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper] Force kill process on cancellation: 'False'
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper] Redirected STDIN: 'False'
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper] Persist current code page: 'False'
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper] Keep redirected STDIN open: 'False'
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper] High priority process: 'False'
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper] Updated oom_score_adj to 500 for PID: 4132.
[2021-11-18 20:28:10Z INFO ProcessInvokerWrapper] Process started with process id 4132, waiting for process exit.
[2021-11-18 20:28:11Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:12Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:12Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:13Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:14Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:28:14Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:28:14Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:28:14Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:15Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:15Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:16Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:16Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:16Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:24Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:28:24Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:28:24Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:28:31Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:34Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:28:34Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:28:34Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:28:39Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:40Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:40Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:41Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:41Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:43Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:44Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:28:44Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:28:44Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:28:46Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:46Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:47Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:47Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:28:54Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:28:54Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:28:54Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:28:57Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:00Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:04Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:29:04Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:29:04Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:29:06Z INFO JobServerQueue] Stop aggressive process web console line queue.
[2021-11-18 20:29:07Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:07Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:08Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:11Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:14Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:29:14Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:29:14Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:29:23Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:24Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:29:24Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:29:24Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:29:24Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
....
[2021-11-18 20:29:34Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:29:34Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:29:34Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:29:35Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:35Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:36Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:36Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:42Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:44Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:29:44Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:29:44Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:29:48Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:29:54Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:29:54Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:29:54Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:29:56Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:30:04Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:30:04Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:30:04Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
......
[2021-11-18 20:34:38Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:34:44Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:34:44Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:34:44Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:34:52Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:34:54Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:34:54Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:34:54Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:35:04Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:35:04Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:35:04Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:35:05Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:35:11Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:35:13Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:35:14Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:35:14Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:35:14Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
....
[2021-11-18 20:35:38Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:35:39Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:35:44Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:35:44Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:35:44Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:35:54Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:35:54Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:35:54Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:36:00Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:36:00Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:36:04Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:36:04Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:36:04Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
.....
[2021-11-18 20:38:44Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 20:38:44Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 20:38:44Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[2021-11-18 20:38:54Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:38:54Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.
[2021-11-18 20:38:54Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
.....
[2021-11-18 21:27:05Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'
[2021-11-18 21:27:05Z INFO HostContext] Well known directory 'Root': '/home/runner'
[2021-11-18 21:27:05Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
Job Log Output
N/A - logs are not coming back
Runner and Worker’s Diagnostic Logs
N/A from what was posted above. JobServerQueue just dies off, and HostContext continues on…no other logs show.
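The pattern described here (JobServerQueue entries stop while HostContext entries keep arriving) can be checked mechanically on a saved diagnostic log. The following is only a minimal sketch and is not part of the runner: the names `LINE_RE`, `last_seen`, and `silence` are hypothetical, and the parsing assumes the line prefix format seen in the excerpt above. A long silence is not proof of a stall on its own, since a step that prints nothing would also produce no JobServerQueue lines.

```python
import re
from datetime import datetime, timedelta

# Matches the prefix of a diagnostic log line as it appears in the excerpt
# above, e.g. "[2021-11-18 20:28:10Z INFO JobServerQueue] ...".
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z\s+\w+\s+(\w+)\]")


def last_seen(lines):
    """Return {component: timestamp of its most recent log line}."""
    seen = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip elisions ("....") and anything unparseable
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        component = match.group(2)
        seen[component] = max(stamp, seen.get(component, stamp))
    return seen


def silence(lines, component="JobServerQueue"):
    """How long `component` has been quiet relative to the newest log line.

    Returns None if the component never appears in the log.
    """
    seen = last_seen(lines)
    if component not in seen:
        return None
    return max(seen.values()) - seen[component]


# A few lines copied from the excerpt above.
SAMPLE = [
    "[2021-11-18 20:38:54Z INFO JobServerQueue] Try to append 1 batches web console lines for record '0e67d3e9-5759-5c4c-bea4-1b40d9526dc5', success rate: 1/1.",
    "[2021-11-18 20:38:54Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin.2.284.0'",
    ".....",
    "[2021-11-18 21:27:05Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'",
]

if __name__ == "__main__":
    print(silence(SAMPLE))  # prints 0:48:11
```

For the lines above, the last JobServerQueue entry is 48 minutes and 11 seconds older than the newest HostContext entry. What counts as too long is a judgment call; any alerting threshold would be an assumption on the operator's side.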
Further notes
This is completely random and seems to happen on a runner that has had multiple job attempts on it. It hangs until cancelled by an outside entity (user input or EKS scale down).
Issue Analytics
- Created 2 years ago
- Reactions:2
- Comments:7 (4 by maintainers)
Top GitHub Comments
@nikola-jokic I’m sorry - I lost track of this among the other issues that I’ve raised. I am not sure if it’s necessarily fixed - but it hasn’t shown its ugly face for a while! It could be due to API limitations honestly, it’s difficult to say. I appreciate the follow up!
I’m going to close out this issue until we hear back from you. Please, try using the newest version of the runner and if you confirm that you are still seeing this issue on the newer version, we will re-open it and we will investigate this further. We would just need more information to reproduce this behaviour 😊