Ephemeral (single use) runner registrations
Describe the bug
When starting a self-hosted runner with ./run.cmd --once, the runner sometimes accepts a second job before shutting down, which causes that second job to fail with the message:
The runner: [runner-name] lost communication with the server. Verify the machine is running and has a healthy network connection.
This looks like the same issue recently fixed here: microsoft/azure-pipelines-agent#2728
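For context, a minimal sketch of how the runner in the report is registered and then started in single-use mode; the URL and registration token are placeholders for the values shown on the repository's runner setup page:

```
# Register the runner against a repository (URL and token are placeholders).
./config.cmd --url https://github.com/<owner>/<repo> --token <registration-token>

# Start the runner in single-use mode: it should pick up one job and then exit.
./run.cmd --once
```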
To Reproduce
Steps to reproduce the behavior:
- Create a repo, enable GitHub Actions, and add a new workflow
- Configure a new runner on your machine
- Run the runner with ./run.cmd --once
- Queue two runs of your workflow
- The first job will run and the runner will go offline
- (Optionally) configure and start a second runner
- The second job will time out after several minutes with the message:
  The runner: [runner-name] lost communication with the server. Verify the machine is running and has a healthy network connection.
  (where [runner-name] is the name of the first runner)
- Also: trying to remove the first runner with ./config.cmd remove --token [token] will fail with the following error until the second job times out (see the sketch after this list):
  Failed: Removing runner from the server
  Runner "[runner-name]" is running a job for pool "Default"
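For reference, the removal attempt from the last step looks like the following; the error text is the one quoted above, and the token placeholder stands for a removal token generated from the repository's runner settings:

```
# Attempt to deregister the first (now offline) runner while the second job
# is still assigned to it.
./config.cmd remove --token <removal-token>

# This keeps failing until the second job times out:
#   Failed: Removing runner from the server
#   Runner "[runner-name]" is running a job for pool "Default"
```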
Expected behavior
The second job should run on (and wait for) any new runner that comes online, rather than trying to run on the now-offline original runner.
Runner Version and Platform
2.262.1 on Windows
Runner and Worker’s Diagnostic Logs
Top GitHub Comments
Help us @bryanmacfarlane, you’re our only hope! 🙏
@rclmenezes ack on #1 and #2. We're currently designing and working on it. The plan is exactly what you laid out: register the runner as ephemeral with the backend service, so the service auto-cleans it up after the job is complete and the runner/container exits.
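A minimal sketch of what the described ephemeral registration could look like from the runner's side, assuming a config-time flag named --ephemeral (the flag name is an assumption here, not something confirmed in this thread); URL and token are placeholders:

```
# Hypothetical ephemeral registration: the backend is told at configure time
# that this runner should be used for exactly one job (--ephemeral is an
# assumed flag name, not confirmed in this thread).
./config.cmd --url https://github.com/<owner>/<repo> --token <registration-token> --ephemeral

# Run without --once; after the single job finishes, the service is expected
# to deregister the runner automatically, so no second job can be routed to it.
./run.cmd
```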