Error when auto-upgrading runner service on macOS: service fails to restart after upgrade
Describe the bug
When attempting to auto-upgrade the runner service on macOS, the runner service repeatedly fails to relaunch, and the machine has to be rebooted before it launches successfully.
To Reproduce
Steps to reproduce the behavior:
- Install the runner (but don’t start it)
- Install the service (but don’t start it) via
./svc.sh install
- Ensure the service runs automatically upon boot without needing to log in (as per https://github.com/actions/runner/issues/349):
sudo cp /Users/dbadmin/Library/LaunchAgents/actions.runner.diffblue-cover.ci-mac-04.plist /Library/LaunchDaemons/actions.runner.diffblue-cover.ci-mac-04.plist
- Reboot the machine and check (via the Actions page) that GitHub shows the runner as online
- Wait for an upgrade: the service then goes offline and does not restart until the machine is rebooted manually.
Expected behavior
The service should upgrade and restart correctly, as I observed when upgrading from 2.273.3 to 2.273.4:
2020-09-23 14:42:59Z: Listening for Jobs
Runner update in progress, do not shutdown runner.
Downloading 2.273.4 runner
Waiting for current job finish running.
Generate and execute update script.
Runner will exit shortly for update, should back online within 10 seconds.
Runner listener exited with error code 3
Runner listener exit because of updating, re-launch runner in 5 seconds.
Starting Runner listener with startup type: service
Started listener process
√ Connected to GitHub
2020-09-23 14:46:49Z: Listening for Jobs
Runner Version and Platform
Upgrading 2.273.4 to 2.273.5 on macOS 10.15.5
Runner and Worker’s Diagnostic Logs
/Users/dbadmin/Library/Logs/actions.runner.diffblue-cover.ci-mac-04/stderr.log is empty.
Output of /Users/dbadmin/Library/Logs/actions.runner.diffblue-cover.ci-mac-04/stdout.log
Runner update in progress, do not shutdown runner.
Downloading 2.273.5 runner
Waiting for current job finish running.
Generate and execute update script.
Runner will exit shortly for update, should back online within 10 seconds.
Runner listener exited with error code 3
Runner listener exit because of updating, re-launch runner in 5 seconds.
Starting Runner listener with startup type: service
Started listener process
Runner listener exited with error code null
Runner listener exit with undefined return code, re-launch runner in 5 seconds.
Starting Runner listener with startup type: service
Started listener process
Runner listener exited with error code null
Runner listener exit with undefined return code, re-launch runner in 5 seconds.
Starting Runner listener with startup type: service
Started listener process
Runner listener exited with error code null
Runner listener exit with undefined return code, re-launch runner in 5 seconds.
Starting Runner listener with startup type: service
Started listener process
Runner listener exited with error code null
Runner listener exit with undefined return code, re-launch runner in 5 seconds.
Starting Runner listener with startup type: service
Started listener process
Runner listener exited with error code null
Runner listener exit with undefined return code, re-launch runner in 5 seconds.
Starting Runner listener with startup type: service
Started listener process
Runner listener exited with error code null
Runner listener exit with undefined return code, re-launch runner in 5 seconds.
This repeats until I reboot the machine, at which point stdout.log changes to:
Shutting down runner listener
Sending SIGINT to runner listener to stop
.path=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
Starting Runner listener with startup type: service
Started listener process
Started running service
2020-10-06 15:30:15Z: Runner connect error: nodename nor servname provided, or not known. Retrying until reconnected.
√ Connected to GitHub
2020-10-06 15:30:47Z: Runner reconnected.
After that, everything works normally.
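The first stdout.log excerpt shows the watchdog relaunching the listener every 5 seconds, indefinitely. The first exit has code 3 (the self-update), and every exit after that reports code null. In Node's child_process API, an exit code of null is reported when the child was terminated by a signal rather than exiting on its own. That suggests the relaunched process is being killed, not failing cleanly.

The sketch below is hypothetical and is not RunnerService.js. It only mirrors the messages seen in that log, and the following parts are assumptions made for illustration:
- start_listener and watchdog are made-up names.
- start_listener returns canned statuses: 3 once, then 137, which is how a shell reports a child killed by SIGKILL (128 + 9).
- The attempt cap exists only so the sketch terminates.

```shell
#!/usr/bin/env bash
# Hypothetical relaunch loop that mirrors the messages in stdout.log.
# Not the runner's code: start_listener is a stand-in returning canned statuses.

calls=0

# start_listener: hypothetical stand-in for spawning Runner.Listener.
start_listener() {
  calls=$((calls + 1))
  if (( calls == 1 )); then
    return 3     # first run exits with 3, as in the self-update
  fi
  return 137     # later runs look killed by SIGKILL (shell: 128 + 9; Node: code null)
}

# watchdog DELAY MAX_ATTEMPTS
# The real service retries forever; MAX_ATTEMPTS only lets this sketch stop.
watchdog() {
  local delay=$1 max_attempts=$2 attempt code
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    echo "Starting Runner listener with startup type: service"
    start_listener
    code=$?
    echo "Runner listener exited with error code ${code}"
    if (( code == 3 )); then
      echo "Runner listener exit because of updating, re-launch runner in ${delay} seconds."
    else
      echo "Runner listener exit with undefined return code, re-launch runner in ${delay} seconds."
    fi
    sleep "$delay"
  done
}
```

Calling watchdog 5 4 prints one "because of updating" cycle followed by three "undefined return code" cycles. Because nothing in the loop distinguishes a permanent failure from a transient one, the service keeps retrying, which matches the log.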
This was the output of the relevant SelfUpdate log in _diag/
cat SelfUpdate-20201005-180911.log.succeed
[2020-10-05 19:09:11-4N] --------whoami--------
dbadmin
[2020-10-05 19:09:11-4N] --------whoami--------
[2020-10-05 19:09:11-4N] Waiting for Runner.Listener (432) to complete
[2020-10-05 19:09:11-4N] Process 432 finished running
[2020-10-05 19:09:11-4N] Sleep 1 more second to make sure process exited
[2020-10-05 19:09:12-4N] Delete existing junction bin folder
[2020-10-05 19:09:12-4N] Delete existing junction externals folder
[2020-10-05 19:09:12-4N] Create junction bin folder
[2020-10-05 19:09:12-4N] Create junction externals folder
[2020-10-05 19:09:12-4N] Update succeed
[2020-10-05 19:09:12-4N] Rename /Users/dbadmin/actions-runner/_diag/SelfUpdate-20201005-180911.log to be /Users/dbadmin/actions-runner/_diag/SelfUpdate-20201005-180911.log.succeed
/Users/dbadmin/actions-runner/_diag/SelfUpdate-20201005-180911.log -> /Users/dbadmin/actions-runner/_diag/SelfUpdate-20201005-180911.log.succeed
Top GitHub Comments
Thanks for the information!
Our team has been able to reproduce this issue and is working on a fix, which we hope to roll out with the next runner release so that this doesn't happen again. I appreciate the detailed writeup @kevindumanoir, it helped us narrow down the issue considerably.
This is a tricky one. We have the same issue at Fabernovel, and here's what we discovered:
- The service runs runsvc.sh, which starts the watchdog, RunnerService.js. It is a node.js script invoked using a specific version of Node.js, located at <YOUR_ACTIONS_RUNNER_PATH>/externals/node12/bin/node. There's one important thing to remember: externals is a symbolic link.
- RunnerService.js fails to spawn its subprocess because the Apple System Policy daemon, syspolicyd, considers <YOUR_ACTIONS_RUNNER_PATH>/externals.X.A/node12/bin/node to be malware. 🤯
- node is considered malware because the binary file loaded in RAM cannot be found on the hard drive.
- node cannot be found because the binary doesn't exist anymore. We did invoke node from the externals folder, but the update re-pointed the symbolic link to the new version of the actions runner, i.e. externals.X.X. The thing is, when the service started the watchdog, it used the node executable from externals.X.A, and that folder no longer seems to exist.
- The folders of the previous version, n - 1, survive one update. When updating to version n+1, the folders externals-(n-1) and bin-(n-1) are deleted.

Conclusion: this issue occurs when the service is launched with runner version N and the runner then auto-updates a second time.
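The folder shuffle described above can be sketched with throwaway directories. This is a hypothetical illustration, not the runner's actual update script, and it rests on these assumptions:
- The names externals.A, externals.B and externals.C and the empty node file are stand-ins.
- As the comment describes, the n - 1 folders are deleted on the following update.

The script records where externals/node12/bin/node really pointed when the watchdog started. It then performs two updates and checks whether that file still exists.

```shell
#!/usr/bin/env bash
# Hypothetical demo of the symlink swap; folder names and the empty "node" file are stand-ins.
set -euo pipefail

# Canonical temp dir (on macOS /var is itself a symlink to /private/var).
root=$(cd -P "$(mktemp -d)" && pwd -P)
trap 'rm -rf "$root"' EXIT
cd "$root"

# make_version NAME: create a fake externals folder containing node12/bin/node.
make_version() {
  mkdir -p "$1/node12/bin"
  touch "$1/node12/bin/node"
}

# Version A is installed and the service starts its watchdog through the symlink.
make_version externals.A
ln -s externals.A externals
started_from="$(cd -P externals/node12/bin && pwd -P)/node"

# First update (A -> B): the symlink is re-pointed; A is kept as the previous version.
make_version externals.B
ln -sfn externals.B externals   # -n: replace the link itself, don't descend into it

# Second update (B -> C): the symlink moves again and A (now n - 1) is deleted.
make_version externals.C
ln -sfn externals.C externals
rm -rf externals.A

echo "watchdog node was started from: $started_from"
echo "externals now points to:        $(cd -P externals && pwd -P)"
if [ -e "$started_from" ]; then
  echo "binary still on disk"
else
  echo "binary missing on disk"
fi
```

Running it reports that externals now resolves to externals.C and that the binary the watchdog was started from is missing on disk. The comment blames exactly this situation for syspolicyd's verdict. After only the first update, externals.A would still exist, which is consistent with a single upgrade (such as 2.273.3 to 2.273.4) working fine.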
So, how can we deal with it ? Obviously, Apple cannot rely on a symbolic path to check malware behavior of a program. Here is some non-exhaustive ideas:
runsvc.sh
externals.X.X
folder for the currently used node process (not really a good idea imo)SuccessfulExit