
Can't use AWS Instance GPU on GITLAB CI and CML-RUNNER


I have this gitlab-ci.yml:

stages:
  - test
  - deploy
  - train

sast:
  stage: test
include:
- template: Security/SAST.gitlab-ci.yml

deploy_job:
  stage: deploy
  when: always
  image: iterativeai/cml:0-dvc2-base1
  script:
    - cml-runner
      --cloud aws
      --cloud-region us-east-1
      --cloud-type g3.4xlarge
      --cloud-hdd-size 64
      --cloud-aws-security-group="cml-runners-sg"
      --labels=cml-runner-gpu
      --idle-timeout=120

train_job:
  stage: train
  when: on_success
  image: iterativeai/cml:0-dvc2-base1-gpu
  tags:
    - cml-runner-gpu
  before_script:
    - pip install poetry
    - poetry --version
    - poetry config virtualenvs.create false
    - poetry install -vv
    - nvdia-smi
  script:
    # DVC Stuff
    - dvc pull
    - dvc repro -m
    - dvc push
    # Report metrics
    - echo "## Metrics" >> report.md
    - echo "\`\`\`json" >> report.md
    - cat metrics/best-meta.json >> report.md
    - echo "\`\`\`" >> report.md
    # Report GPU details
    - echo "## GPU info" >> report.md
    - cat gpu_info.txt >> report.md
    # Send comment
    - cml-send-comment report.md

But the container can't recognize the driver or the GPU; when I run nvidia-smi I get the following error:

/usr/bin/bash: line 133: nvdia-smi: command not found

I realized that iterativeai/cml:0-dvc2-base1-gpu can't use the instance GPU. How can I install the NVIDIA drivers and nvidia-docker, and activate the --gpus option for this container?

Thank you
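
A note on the --gpus part of the question: for a GitLab Runner that uses the Docker executor and is configured by hand (which may or may not match what cml-runner provisions for you), GPU passthrough is controlled by the gpus option in config.toml, available since GitLab Runner 13.9. A minimal sketch, with the runner name purely illustrative:

[[runners]]
  name = "cml-runner-gpu"                      # illustrative name
  executor = "docker"
  [runners.docker]
    image = "iterativeai/cml:0-dvc2-base1-gpu"
    gpus = "all"                               # equivalent of docker run --gpus all

With that in place, jobs picked up by the runner should see the instance GPU, and nvidia-smi (spelled correctly) should work inside the container.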

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 2
  • Comments: 24 (12 by maintainers)

Top GitHub Comments

2 reactions
dacbd commented, Dec 22, 2021

Just adding the CI job log of the deploy_job step: deploy_job.txt

and the train_job step: job_log.txt

I see nvdia-smi at bash line 125? There looks to be a typo in your job.
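
In other words, the "command not found" error in the job log comes from the misspelled command, not from missing GPU support. A minimal, hedged sketch of the corrected before_script, which also writes the output to gpu_info.txt since the later report step reads that file (the exact fix applied in the issue is not shown here):

  before_script:
    - pip install poetry
    - poetry --version
    - poetry config virtualenvs.create false
    - poetry install -vv
    # spell the command correctly and capture the output the report step expects
    - nvidia-smi | tee gpu_info.txt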

2 reactions
0x2b3bfa0 commented, Dec 18, 2021

If nvidia-smi works, these lines won’t run at all.
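
The lines being referred to are not shown in this excerpt; presumably they are a conditional driver-setup step that only runs when nvidia-smi is missing. A sketch of that kind of guard, purely illustrative (install_drivers.sh is a hypothetical placeholder):

    - if command -v nvidia-smi > /dev/null; then echo "GPU drivers already present"; else ./install_drivers.sh; fi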

