Spot instances - Runner must be able to restart workflow
Ideally, a third job could help. A workflow for GH and GL would be:
GitLab:

```yaml
stages:
  - ml
  - check

train:
  stage: ml
  tags:
    - gpu
    - check
  cache:
    paths:
      - ./models
  script:
    - echo "setup a pipeline here"

check:
  stage: check
  when: on_failure
  needs:
    - train
  script:
    - echo "Restarting..."
```
GitHub:

```yaml
name: cml
on: [push]
jobs:
  train:
    # needs: deploy
    runs-on: [self-hosted,gpu]
    steps:
      - uses: actions/checkout@v2
      - name: Cache multiple paths
        uses: actions/cache@v2
        with:
          path: |
            ./models
          key: models
      - name: cml_run
        shell: bash
        env:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
        run: |
          echo "setup a pipeline here"
  check:
    if: failure()
    needs: train
    runs-on: [ubuntu-latest]
    steps:
      - name: cml_check
        run: |
          echo "Restarting...."
```
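The `check` jobs above only echo a message. On GitHub, one way to actually restart would be to trigger a new run through the REST API's workflow dispatch endpoint (`POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches`). The Python sketch below only builds that request, and it refuses once an attempt cap is reached, which is a simple guard against restarting in a loop. It assumes the workflow also declares `on: workflow_dispatch` with a hypothetical `attempt` input, and that the token is allowed to dispatch workflows (for `GITHUB_TOKEN`, `actions: write`). `build_restart_request`, `cml.yaml`, and the cap value are illustrative names, not part of CML.

```python
import json
import urllib.request
from typing import Optional

GITHUB_API = "https://api.github.com"


def build_restart_request(
    owner: str,
    repo: str,
    workflow_file: str,
    ref: str,
    attempt: int,
    max_attempts: int,
    token: str,
) -> Optional[urllib.request.Request]:
    """Return a workflow_dispatch request for the next attempt, or None once the cap is hit.

    The cap is what keeps a repeatedly evicted runner from restarting the workflow forever.
    """
    if attempt >= max_attempts:
        return None
    url = f"{GITHUB_API}/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    # Hypothetical "attempt" input: the workflow must declare it under on.workflow_dispatch.inputs.
    body = {"ref": ref, "inputs": {"attempt": str(attempt + 1)}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Inside the `check` job, the request could then be sent with `urllib.request.urlopen(req)`; GitHub answers `204 No Content` when the dispatch is accepted. The attempt counter has to travel with the run (here through the input), because a fresh run otherwise has no way of knowing how many times it has already been restarted.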
However, this approach has two issues:
- In GH, losing the runner is recoverable: the job simply ends as failed. In GL, a job without a valid runner can run forever. I opened a ticket here.
- The biggest drawback is that the workflow could end up restarting in a loop. Giving the runner the ability to listen for the spot instance eviction notice would be a better guarantee that it acts properly.
This implies that we have to provide cleanup scripts when deploying the spot instances; these scripts only need to clean up the runner and restart the workflow.
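Listening for the eviction notice could look roughly like the Python sketch below. It assumes the runner lives on an EC2 Spot instance with the instance metadata service (IMDSv2) reachable: AWS publishes a pending interruption at `/latest/meta-data/spot/instance-action`, which returns HTTP 404 until an interruption is scheduled, roughly two minutes before the instance is reclaimed. `watch_for_interruption` and `cleanup_and_restart` are hypothetical names, and the cleanup hook is only a stub standing in for the runner cleanup and workflow restart described above.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

IMDS = "http://169.254.169.254/latest"


def imds_spot_notice(timeout: float = 2.0) -> Optional[dict]:
    """Return the pending Spot interruption notice, or None if none is scheduled (IMDSv2)."""
    # IMDSv2: obtain a session token first, then send it with every metadata read.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    notice_req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(notice_req, timeout=timeout) as resp:
            return json.loads(resp.read())  # e.g. {"action": "terminate", "time": "..."}
    except urllib.error.HTTPError as err:
        if err.code == 404:  # 404 means no interruption has been scheduled yet
            return None
        raise


def watch_for_interruption(
    fetch_notice: Callable[[], Optional[dict]],
    on_interruption: Callable[[dict], None],
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,
) -> bool:
    """Poll until a notice appears, then run the cleanup hook once. Returns True if it fired."""
    polls = 0
    while max_polls is None or polls < max_polls:
        notice = fetch_notice()
        if notice is not None:
            on_interruption(notice)
            return True
        polls += 1
        sleep(poll_seconds)
    return False


def cleanup_and_restart(notice: dict) -> None:
    # Hypothetical hook: unregister the runner and request a workflow restart here.
    print(f"Spot {notice.get('action')} scheduled at {notice.get('time')}; cleaning up runner")
```

On the instance, the watcher could be started with `watch_for_interruption(imds_spot_notice, cleanup_and_restart)`. AWS suggests checking for the notice about every five seconds, which is the default used here. Passing `fetch_notice` and `sleep` in as parameters keeps the polling loop testable without an EC2 instance.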
Issue Analytics
- Created 3 years ago
- Comments: 14 (8 by maintainers)
Top GitHub Comments
@SebastianCallh We have spent two days discussing this and have built a small prototype. I can't tell you an exact date, but it's close. The trick lies in our runner.
Sure, let me check with the team what the estimate for this is.