question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

cml-runner fails to deploy runners on ec2

See original GitHub issue

Hey everyone, A random issue started appearing yesterday and cml-runner now fails to deploy runners. The issue seems to coincide with the release of version 0.7.0 but switching back to 0.6.3 does not seem to sort the problem! Updating to 0.7.1 also didn’t fix the problem!

The command I am running is:

name: Run-Engine-Tests

      - name: "Deploy runner on EC2"
        shell: bash
        env:
          repo_token: ${{ secrets.ACCESS_TOKEN_CML_TESTING }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID_TESTING }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY_TESTING }}
          CASE_NAME: ${{ matrix.case_name }}
          N_RUNNERS: ${{ fromJson(needs.setup_config.outputs.json_string).n_runners }}
          NEPTUNE_CUSTOM_RUN_ID: ${{ needs.setup_neptune_custom_run_id.outputs.neptune_custom_run_id }}

        run: |
          for (( i=1; i<=N_RUNNERS; i++ ))
          do
            echo "Deploying runner ${i}"
            cml-runner \
            --cloud aws \
            --cloud-region eu-west-2 \
            --cloud-type=m \
            --cloud-hdd-size 100 \
            --cloud-spot \
            --labels=cml-runner-${NEPTUNE_CUSTOM_RUN_ID} || exit 1 &
          done
          wait
          echo "Deployed ${N_RUNNERS} runners."
      - run: >-
          cat "$TF_LOG_PATH"

I’ve cut it a bit short so that you can only see the relevant part. I’m also attaching the terraform logs, hopefully it helps!

Looking at the EC2 console on the AWS side, I can see that the EC2 instances spin up properly, but then get shut down after about 30 seconds. On the spot requests tab, the status is displayed as terminated-by-user, so it’s not AWS shutting them down.

Finally, I also noticed that the name of the runners on EC2 is now Hosted Agent, which didn’t use to be the case before. It used to be something like iterative-<random_stirng>. Not sure if it’s relevant but putting it out there just in case!

1_Set up job.txt 2_Run actionscheckout@v2.txt 3_Run iterativesetup-cml@v1.txt 4_Deploy runner on EC2.txt 5_Run cat $TF_LOG_PATH.txt 10_Post Run actionscheckout@v2.txt 11_Complete job.txt

Issue Analytics

  • State:closed
  • Created 2 years ago
  • Comments:6 (4 by maintainers)

github_iconTop GitHub Comments

2reactions
thatGreekGuy96commented, Sep 23, 2021

yup i can confirm this fixes it! Do you mind explaining why 😅 i’m just curious!

2reactions
0x2b3bfa0commented, Sep 23, 2021

@thatGreekGuy96, please run the following to confirm the issue:

      - name: "Deploy runner on EC2"
        shell: bash
        env:
          repo_token: ${{ secrets.ACCESS_TOKEN_CML_TESTING }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID_TESTING }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY_TESTING }}
          CASE_NAME: ${{ matrix.case_name }}
          N_RUNNERS: ${{ fromJson(needs.setup_config.outputs.json_string).n_runners }}
          NEPTUNE_CUSTOM_RUN_ID: ${{ needs.setup_neptune_custom_run_id.outputs.neptune_custom_run_id }}
        run: |
          for (( i=1; i<=N_RUNNERS; i++ ))
          do
            echo "Deploying runner ${i}"
            RUNNER_NAME= cml-runner \
            --cloud aws \
            --cloud-region eu-west-2 \
            --cloud-type=m \
            --cloud-hdd-size 100 \
            --cloud-spot \
            --labels=cml-runner-${NEPTUNE_CUSTOM_RUN_ID} || exit 1 &
          done
          wait
          echo "Deployed ${N_RUNNERS} runners."
      - run: >-
          cat "$TF_LOG_PATH"
Read more comments on GitHub >

github_iconTop Results From Across the Web

runner | CML
The total exceeding 10 minutes is considered a failure, resulting in cml runner terminating the instance and exiting with an error. For example:...
Read more >
Deploy and Manage Gitlab Runners on Amazon EC2 - AWS
This post will guide you through utilizing Infrastructure-as-Code (IaC) to automate Gitlab Runner deployment and administrative tasks on Amazon ...
Read more >
Cml / GitHub Actions / aws - Questions | Data Version Control
I'm trying to run cml-runner in aws via GitHub actions. The script below opens an instance that we can see in the aws...
Read more >
Training and saving models with CML on a self-hosted AWS ...
Training and saving models with CML on a self-hosted AWS EC2 runner (part 1) ... --labels=cml-runner \ --single train-model: needs: deploy-runner runs-on: ...
Read more >
CICD fails to execute jobs on EC2 instances - GitLab Forum
The GitLab runner is configured to spawn a new AWS EC2 instance to run ... image fails to install due to missing plugin,...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found