GCP cloud runner not terminating

This is a repeat of #661, which was supposedly fixed in #653. Unfortunately, I’m not seeing any changes in the shutdown behavior of my GCP compute instances. That is, they keep running past the timeout interval.

I’m using the same workflow as before (in #661):

name: 'Train-in-the-cloud-GCP'
on: 
  workflow_dispatch:

jobs:
  deploy-runner:
    runs-on: [ubuntu-latest]
    steps:
      - uses: iterative/setup-cml@v1
      - uses: actions/checkout@v2
      - name: 'Deploy runner on GCP'
        shell: bash
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          # Notice use of `GOOGLE_APPLICATION_CREDENTIALS_DATA` instead of
          # `GOOGLE_APPLICATION_CREDENTIALS`. Contrary to what docs suggest, the
          # latter causes problems for terraform.
          GOOGLE_APPLICATION_CREDENTIALS_DATA: ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS }}
        run: |
          cml-runner \
          --cloud gcp \
          --cloud-region europe-west1-b \
          --cloud-type=n1-standard-1 \
          --labels=cml-runner
          
  model-training:
    needs: deploy-runner
    runs-on: [self-hosted, cml-runner]
    container: docker://dvcorg/cml-py3:latest
    steps:
      - uses: actions/checkout@v2
      - name: 'Train my dummy model'
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
        run: |
          echo "Training a super awesome model"
          sleep 5
          echo "Training complete"

Anyway, this seems to contradict the tests, as @DavidGOrtega explains in the comments under #653:

[…] tests with TPI indicates that the instances are disposed after the expected time.

Any idea what I might be doing wrong?
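
For anyone trying to reproduce this, one way to rule out the default is to set the idle timeout explicitly in the deploy step. The fragment below is a minimal sketch, assuming the `--idle-timeout` option of `cml-runner` (a value in seconds); the value `300` is only illustrative and is not taken from the workflow above.

```yaml
      - name: 'Deploy runner on GCP'
        shell: bash
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          GOOGLE_APPLICATION_CREDENTIALS_DATA: ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS }}
        run: |
          # Assumption: --idle-timeout is the number of seconds the runner
          # waits for jobs before shutting down; 300 is an illustrative value.
          cml-runner \
          --cloud gcp \
          --cloud-region europe-west1-b \
          --cloud-type=n1-standard-1 \
          --labels=cml-runner \
          --idle-timeout=300
```

If the instance is still listed as running in the GCP console well after that interval has passed with no queued jobs, the problem is likely in instance disposal rather than in the timeout setting itself.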

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

lemontheme commented, Oct 19, 2021 (3 reactions)

Hi @dacbd, sorry to keep you waiting. Been a while since I looked at this.

Anyway, I’m happy to confirm that instances are now indeed stopped and deleted as expected! 😃 That’s using the exact same workflow as above. Great to see you’ve made progress with this. Thanks!

dacbd commented, Oct 14, 2021 (3 reactions)

@lemontheme I believe this issue is resolved, can you confirm your workflow is functional without any workarounds?
