Deploy waits for 'existing' runner which doesn't exist

I’ve been having an issue where, if I set the --reuse flag, subsequent jobs fail to start because they are waiting for a runner that has already terminated.

Run cml-runner \
Reusing existing runners with the cml-runner labels...

No further CML runs will launch unless I cancel them and then restart them. This is my cml.yaml:

name: Run DVC repro
on: [push]
jobs:
  deploy-runner:
    runs-on: [ubuntu-latest]
    steps:
      - uses: actions/checkout@v2
      - uses: iterative/setup-cml@v1

      - name: deploy
        shell: bash
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          cml-runner \
          --cloud aws \
          --cloud-region eu-west \
          --cloud-type=g4dn.xlarge \
          --cloud-spot true \
          --reuse \
          --labels=cml-runner,reviews-labeler \
          --idle-timeout 300
  run:
    needs: deploy-runner
    runs-on: [self-hosted,cml-runner]
    container: docker://dvcorg/cml:0-dvc2-base1-gpu
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.8.5'
      - name: cml_run
        shell: bash
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          # Install dependencies
          pip install -r requirements.txt
          pip install .

          dvc repro

          # Report metrics
          echo "## Metrics" > report.md
          git fetch --prune

          dvc metrics diff main --show-md | grep "Change\|\-\-\-" >> report.md
          dvc metrics diff main --show-md | grep -v "threshold-" | grep "weighted" | sort >> report.md
          sed "s/results\///g" -i report.md

          cml-send-comment report.md

          dvc push

      - uses: EndBug/add-and-commit@v7
        with:
          add: 'dvc.lock --force'
          pull_strategy: 'NO-PULL'
          message: 'chg: dvc repro'

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
0x2b3bfa0 commented, Jun 26, 2021

You’re welcome, @ivyleavedtoadflax!

Then, it’s not a bug, but a feature! 😉

1 reaction
ivyleavedtoadflax commented, Jun 26, 2021

Ahhh OK, that makes sense. Actually, that is very helpful because AWS seems to have rather tight limits on how many vCPUs are available in the G and P instance ranges. So, queuing up jobs in this way will prevent a job from failing because an instance type is not available.

Thanks, @0x2b3bfa0!
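
For readers comparing the two behaviours, here is a minimal sketch of the deploy step from the question with `--reuse` removed. It uses only the flags already shown in the workflow above. The expectation that each workflow run then provisions its own instance, rather than queueing for a runner with the same labels, is an assumption based on the discussion in this thread, so check it against the CML documentation for your version.

```yaml
# Sketch only: the deploy step from the question, with --reuse removed.
# Assumption: without --reuse, cml-runner launches a new instance for
# each workflow run instead of waiting for an existing labelled runner.
- name: deploy
  shell: bash
  env:
    REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  run: |
    cml-runner \
      --cloud aws \
      --cloud-region eu-west \
      --cloud-type=g4dn.xlarge \
      --cloud-spot true \
      --labels=cml-runner,reviews-labeler \
      --idle-timeout 300
```

The trade-off is the one raised in the comment above: concurrent runs would each request their own g4dn.xlarge instance, which can fail when the account's vCPU quota is tight, whereas `--reuse` lets later jobs queue for the existing runner.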
