Deploy waits for 'existing' runner which doesn't exist
I’ve been having this issue where, if I set the --reuse flag, subsequent jobs fail to start because they are waiting for a runner that has already terminated.
Run cml-runner \
Reusing existing runners with the cml-runner labels...
No further CML runs will launch unless I cancel them and then restart them. This is my cml.yaml:
name: Run DVC repro
on: [push]
jobs:
  deploy-runner:
    runs-on: [ubuntu-latest]
    steps:
      - uses: actions/checkout@v2
      - uses: iterative/setup-cml@v1
      - name: deploy
        shell: bash
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          cml-runner \
            --cloud aws \
            --cloud-region eu-west \
            --cloud-type=g4dn.xlarge \
            --cloud-spot true \
            --reuse \
            --labels=cml-runner,reviews-labeler \
            --idle-timeout 300
  run:
    needs: deploy-runner
    runs-on: [self-hosted,cml-runner]
    container: docker://dvcorg/cml:0-dvc2-base1-gpu
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.8.5'
      - name: cml_run
        shell: bash
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          # Install dependencies
          pip install -r requirements.txt
          pip install .
          dvc repro
          # Report metrics
          echo "## Metrics" > report.md
          git fetch --prune
          dvc metrics diff main --show-md | grep "Change\|\-\-\-" >> report.md
          dvc metrics diff main --show-md | grep -v "threshold-" | grep "weighted" | sort >> report.md
          sed "s/results\///g" -i report.md
          cml-send-comment report.md
          dvc push
      - uses: EndBug/add-and-commit@v7
        with:
          add: 'dvc.lock --force'
          pull_strategy: 'NO-PULL'
          message: 'chg: dvc repro'
Issue Analytics
- Created 2 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
You’re welcome, @ivyleavedtoadflax!
Then, it’s not a bug, but a feature! 😉
ahhh ok, that makes sense. Actually that is very helpful because AWS seem to have rather tight limits on how many vCPUs are available in the G and P ranges. So, queuing up jobs in this way will prevent a job from failing because an instance type is not available.
Thanks @0x2b3bfa0 !
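The point about vCPU limits can be illustrated with a toy model. This is only a sketch of the reasoning in the comment above, not how cml-runner or AWS actually schedule anything: the function name `schedule`, the quota of 8 vCPUs, and the job names are all hypothetical. The only real figure is that a g4dn.xlarge instance has 4 vCPUs.

```python
def schedule(jobs, vcpu_quota, vcpus_per_instance, reuse):
    """Toy model: return (completed, failed) job lists.

    reuse=True  -> all jobs share one instance and run one after another.
    reuse=False -> every job requests its own instance at the same time.
    """
    completed, failed = [], []

    if reuse:
        # A single shared instance must still fit inside the quota.
        if vcpus_per_instance > vcpu_quota:
            return completed, list(jobs)
        # Jobs wait their turn on the shared runner instead of failing.
        completed.extend(jobs)
        return completed, failed

    vcpus_in_use = 0
    for job in jobs:
        if vcpus_in_use + vcpus_per_instance <= vcpu_quota:
            # Enough quota left: this job gets its own instance.
            vcpus_in_use += vcpus_per_instance
            completed.append(job)
        else:
            # Quota exhausted: the instance cannot be launched.
            failed.append(job)
    return completed, failed


if __name__ == "__main__":
    jobs = ["job-a", "job-b", "job-c"]  # hypothetical job names
    # Hypothetical quota of 8 vCPUs; g4dn.xlarge uses 4 vCPUs per instance.
    print(schedule(jobs, vcpu_quota=8, vcpus_per_instance=4, reuse=False))
    print(schedule(jobs, vcpu_quota=8, vcpus_per_instance=4, reuse=True))
```

In this model, three simultaneous jobs without reuse leave the third one without an instance, while with reuse all three complete in sequence on the same runner.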