GCP cloud runner not terminating
This is a repeat of #661, which was supposedly fixed in #653. Unfortunately, I'm not seeing any change in the shutdown behavior of my GCP compute instances: they keep running past the timeout interval.
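A quick way to double-check for leftover machines is to list the Compute Engine instances in the project the runner deploys into. This is only a diagnostic sketch: `my-gcp-project` is a placeholder project ID, and the `cml` name filter is an assumption about how the runner names the instances it creates.

```bash
# Placeholder project ID; the "cml" name filter is an assumption about
# how the runner names the instances it creates.
gcloud compute instances list \
  --project my-gcp-project \
  --filter="name~cml" \
  --format="table(name, zone, status, creationTimestamp)"

# Any instance still RUNNING long after the workflow finished can be
# removed by hand (the zone must match the one reported above).
gcloud compute instances delete <instance-name> --zone europe-west1-b
```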
I’m using the same workflow as before (in #661):
```yaml
name: 'Train-in-the-cloud-GCP'
on:
  workflow_dispatch:
jobs:
  deploy-runner:
    runs-on: [ubuntu-latest]
    steps:
      - uses: iterative/setup-cml@v1
      - uses: actions/checkout@v2
      - name: 'Deploy runner on GCP'
        shell: bash
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          # Notice use of `GOOGLE_APPLICATION_CREDENTIALS_DATA` instead of
          # `GOOGLE_APPLICATION_CREDENTIALS`. Contrary to what docs suggest,
          # the latter causes problems for terraform.
          GOOGLE_APPLICATION_CREDENTIALS_DATA: ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS }}
        run: |
          cml-runner \
            --cloud gcp \
            --cloud-region europe-west1-b \
            --cloud-type=n1-standard-1 \
            --labels=cml-runner

  model-training:
    needs: deploy-runner
    runs-on: [self-hosted, cml-runner]
    container: docker://dvcorg/cml-py3:latest
    steps:
      - uses: actions/checkout@v2
      - name: 'Train my dummy model'
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
        run: |
          echo "Training a super awesome model"
          sleep 5
          echo "Training complete"
```
Anyway, this seems to contradict the tests, as @DavidGOrtega explains in the comments under #653:
> […] tests with TPI indicates that the instances are disposed after the expected time.
Any idea what I might be doing wrong?
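For completeness, one knob worth ruling out: if I read the runner docs correctly, `cml-runner` accepts an `--idle-timeout` option (seconds to wait for new jobs before shutting down). Passing it explicitly, as sketched below with an example value, would at least remove any ambiguity about which timeout is in play; the rest of the invocation mirrors the workflow above.

```bash
cml-runner \
  --cloud gcp \
  --cloud-region europe-west1-b \
  --cloud-type=n1-standard-1 \
  --labels=cml-runner \
  --idle-timeout=300   # example value: shut down after 5 idle minutes
```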
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @dacbd, sorry to keep you waiting. Been a while since I looked at this.
Anyway, I’m happy to confirm that instances are now indeed stopped and deleted as expected! 😃 That’s using the exact same workflow as above. Great to see you’ve made progress with this. Thanks!
@lemontheme I believe this issue is resolved; can you confirm that your workflow is functional without any workarounds?