Spot instances: Runner must be able to restart workflow

Ideally, a third job could help. A workflow for GitHub (GH) and GitLab (GL) would be:

# GitLab CI pipeline
stages:
  - ml
  - check

train:
  stage: ml
  tags:
    - gpu
    - check

  cache:
    paths:
    - ./models
    
  script:
    -  echo "setup a pipeline here"

check:
  stage: check
  when: on_failure
  needs:
    - train

  script:
    - echo "Restarting..."

# GitHub Actions workflow
name: cml

on: [push]

jobs:
  train:
    # needs: deploy
    runs-on: [self-hosted,gpu]

    steps:
      - uses: actions/checkout@v2

      - name: Cache multiple paths
        uses: actions/cache@v2
        with:
          path: |
            ./models
          key: models

      - name: cml_run
        shell: bash
        env:
          repo_token: ${{ secrets.GITHUB_TOKEN }} 
        run: |
          echo "setup a pipeline here"

  check:
    if: failure()
    needs: train
    runs-on: [ubuntu-latest]
    steps:
      - name: cml_check
        run: |
          echo "Restarting...."

However, this approach has two issues:

  • In GH, the loss of the runner can be recovered from, ending with a failed job, whereas in GL a job without a valid runner can run forever. I opened a ticket here
  • The biggest drawback would be restarting the workflow in a loop. Giving the runner the ability to listen for the spot instance eviction would be a better guarantee that it acts properly.
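
As a rough illustration of what "listening for the spot instance eviction" could look like on AWS, here is a minimal Python sketch that polls the EC2 instance metadata endpoint for an interruption notice. It is not the CML runner's implementation. The names `fetch_spot_action` and `wait_for_eviction`, the 5 second interval, and the plain (non-IMDSv2) metadata request are assumptions made for the example.

```python
import json
import time
import urllib.error
import urllib.request

# EC2 instance metadata path that exposes Spot interruption notices.
SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def fetch_spot_action(url=SPOT_ACTION_URL, timeout=2.0):
    """Return the interruption notice as a dict, or None if none is scheduled.

    Assumption: the instance allows token-less metadata requests. With
    IMDSv2 enforced, a session token header would have to be added.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:  # 404 means no interruption is scheduled
            return None
        raise


def wait_for_eviction(on_evict, fetch=fetch_spot_action, interval=5.0,
                      sleep=time.sleep, max_polls=None):
    """Poll until a notice appears, then call on_evict(notice) exactly once.

    Returns the notice, or None if max_polls is reached without one.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        notice = fetch()
        if notice is not None:
            on_evict(notice)
            return notice
        polls += 1
        sleep(interval)
    return None
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised off-instance, for example `wait_for_eviction(print, fetch=lambda: {"action": "terminate"})`.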

This implies that we have to provide cleanup scripts when deploying the spot instances. These scripts only need to run the runner cleanup and restart the workflow.
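
A cleanup script along those lines might look roughly like the following Python sketch for the GH case. It assumes the GitHub REST API endpoints for cancelling and re-running a workflow run. The `cleanup` hook, the `max_attempts` cap (added to avoid restarting in a loop), and all function names are hypothetical. A real script would probably also need to wait for the cancelled run to finish before the re-run request is accepted.

```python
import urllib.request

GITHUB_API = "https://api.github.com"


def restart_requests(repo, run_id):
    """Return the (method, url) pairs to cancel and then re-run a workflow run."""
    base = f"{GITHUB_API}/repos/{repo}/actions/runs/{run_id}"
    return [("POST", f"{base}/cancel"), ("POST", f"{base}/rerun")]


def should_restart(attempt, max_attempts=3):
    """Guard against restarting in a loop: stop once the cap is reached."""
    return attempt < max_attempts


def send(method, url, token):
    """Send one authenticated request to the GitHub REST API."""
    req = urllib.request.Request(url, method=method, headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    })
    with urllib.request.urlopen(req) as resp:
        return resp.status


def on_eviction(repo, run_id, attempt, token, cleanup, sender=send,
                max_attempts=3):
    """Run the runner cleanup, then restart the workflow unless the cap is hit.

    `cleanup` is a hypothetical hook, e.g. unregistering the self-hosted runner.
    Returns the list of HTTP statuses, or [] if no restart was attempted.
    """
    cleanup()
    if not should_restart(attempt, max_attempts):
        return []
    return [sender(method, url, token)
            for method, url in restart_requests(repo, run_id)]
```

Inside a job, `repo`, `run_id` and `attempt` could be read from the `GITHUB_REPOSITORY`, `GITHUB_RUN_ID` and `GITHUB_RUN_ATTEMPT` environment variables. The GL side would need the equivalent pipeline retry call instead.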

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 14 (8 by maintainers)

Top GitHub Comments

2 reactions
DavidGOrtega commented, May 5, 2021

@SebastianCallh We have been discussing this for two days and have made a small prototype. I can't tell you an exact day, but it's close. The trick resides in our runner.

2 reactions
DavidGOrtega commented, Apr 30, 2021

Sure, let me check with the team what the estimates are for this.
