Self-hosted runners terminated prematurely
To add insult to injury, machines aren’t being terminated after the failure:
$ journalctl --unit cml
-- Logs begin at Sat 2022-10-08 21:59:41 UTC, end at Thu 2022-11-17 14:42:27 UTC. --
Nov 17 10:53:35 ip-172-31-81-177 systemd[1]: Started cml.service.
Nov 17 10:53:35 ip-172-31-81-177 cml.sh[2214]: % Total % Received % Xferd Average Speed Time Time Time Current
Nov 17 10:53:35 ip-172-31-81-177 cml.sh[2214]: Dload Upload Total Spent Left Speed
Nov 17 10:53:35 ip-172-31-81-177 cml.sh[2214]: [158B blob data]
Nov 17 10:53:36 ip-172-31-81-177 cml.sh[2214]: {"level":"warn","message":"Github Actions timeout has been updated from 72h to 35 days. Update your workflow accordingly to be able to restart it automatically."}
Nov 17 10:53:36 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Preparing workdir /tmp/tmp.FFQVuNxrbe/.cml/cml-679w0e9r8o..."}
Nov 17 10:53:36 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Launching github runner"}
Nov 17 10:53:41 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Terraform 1.3.2"}
Nov 17 10:53:41 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Plan: 0 to add, 0 to change, 0 to destroy."}
Nov 17 10:53:41 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Apply complete! Resources: 0 added, 0 changed, 0 destroyed."}
Nov 17 10:53:41 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Outputs: 0"}
Nov 17 10:53:41 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Connected to acpid service."}
Nov 17 10:53:51 ip-172-31-81-177 cml.sh[2214]: {"date":"2022-11-17T10:53:51.690Z","level":"info","message":"runner status","repo":"https://github.com/iterative/cml-textual-inversion","status":"ready"}
Nov 17 10:58:23 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 153 seconds!"}
Nov 17 11:00:56 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 0 seconds!"}
Nov 17 11:00:56 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 0 seconds!"}
Nov 17 11:00:56 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 0 seconds!"}
Nov 17 11:00:56 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 0 seconds!"}
Nov 17 11:04:40 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 3377 seconds!"}
Nov 17 12:00:57 ip-172-31-81-177 cml.sh[2214]: {"date":"Thu Nov 17 2022 12:00:57 GMT+0000 (Coordinated Universal Time)","error":{"name":"HttpError","request":{"headers":{"accept":"application/vnd.github.v3+json","authorization":"to
Nov 17 12:00:57 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Unregistering runner cml-679w0e9r8o..."}
Nov 17 12:00:57 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 0 seconds!"}
Nov 17 12:00:57 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 0 seconds!"}
Nov 17 12:00:57 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 0 seconds!"}
Nov 17 12:00:57 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 0 seconds!"}
Nov 17 12:00:57 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 0 seconds!"}
Nov 17 12:00:58 ip-172-31-81-177 cml.sh[2214]: {"level":"error","message":"\tFailed: Bad request - Runner \"cml-679w0e9r8o\" is still running a job\""}
Nov 17 12:00:58 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Waiting 10 seconds to destroy"}
Nov 17 12:01:00 ip-172-31-81-177 systemd[1]: cml.service: Main process exited, code=exited, status=1/FAILURE
Nov 17 12:01:02 ip-172-31-81-177 systemd[1]: cml.service: Failed with result 'exit-code'.
[^1]: Also pass, for example, --cloud-startup-script=$(echo 'curl https://github.com/0x2b3bfa0.keys >> /home/ubuntu/.ssh/authorized_keys' | base64 -w 0) to ensure SSH access.
Issue Analytics
- State:
- Created 10 months ago
- Reactions:2
- Comments:5 (4 by maintainers)
Top GitHub Comments
Have an update on this yet?
Moreover, rate limits were probably being hit because there is an `await` missing here: https://github.com/iterative/cml/blob/8e26590385fba6aedf9355ab483a43f392ea283e/src/drivers/github.js#L381
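
To illustrate why a missing `await` can lead to rate limiting, here is a minimal, generic sketch. It is not the code at the linked line; every name in it (`sleep`, `backoff`, `retryWithoutAwait`, `retryWithAwait`) is hypothetical. It only shows the general effect: when the promise returned by a delay is not awaited, the loop moves straight on to the next request instead of pausing between attempts.

```javascript
// Hypothetical illustration; none of these names come from the CML source.

// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Waits for the delay, then records that the pause actually happened.
async function backoff(ms, log) {
  await sleep(ms);
  log.push('waited');
}

// Bug: the promise returned by backoff() is dropped, so every
// request is issued back to back with no pause in between.
async function retryWithoutAwait(attempts, delayMs) {
  const log = [];
  for (let i = 0; i < attempts; i++) {
    log.push('request'); // stand-in for a rate-limited API call
    backoff(delayMs, log); // missing await
  }
  return log;
}

// Fix: awaiting the delay spaces the requests out.
async function retryWithAwait(attempts, delayMs) {
  const log = [];
  for (let i = 0; i < attempts; i++) {
    log.push('request');
    await backoff(delayMs, log);
  }
  return log;
}

(async () => {
  console.log(await retryWithoutAwait(3, 10)); // all requests fire before any pause completes
  console.log(await retryWithAwait(3, 10)); // requests and pauses alternate
})();
```

In the first variant the delays still run in the background, but nothing waits for them, so a burst of requests reaches the API at once. Whether this matches the behaviour at the linked line would need to be confirmed against the source.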