
Can't use AWS instance GPU on GitLab CI and cml-runner


I have this .gitlab-ci.yml:

stages:
  - test
  - deploy
  - train

sast:
  stage: test
include:
- template: Security/SAST.gitlab-ci.yml

deploy_job:
  stage: deploy
  when: always
  image: iterativeai/cml:0-dvc2-base1
  script:
    - cml-runner
      --cloud aws
      --cloud-region us-east-1
      --cloud-type g3.4xlarge
      --cloud-hdd-size 64
      --cloud-aws-security-group="cml-runners-sg"
      --labels=cml-runner-gpu
      --idle-timeout=120

train_job:
  stage: train
  when: on_success
  image: iterativeai/cml:0-dvc2-base1-gpu
  tags:
    - cml-runner-gpu
  before_script:
    - pip install poetry
    - poetry --version
    - poetry config virtualenvs.create false
    - poetry install -vv
    - nvdia-smi
  script:
    # DVC Stuff
    - dvc pull
    - dvc repro -m
    - dvc push
    # Report metrics
    - echo "## Metrics" >> report.md
    - echo "\`\`\`json" >> report.md
    - cat metrics/best-meta.json >> report.md
    - echo "\`\`\`" >> report.md
    # Report GPU details
    - echo "## GPU info" >> report.md
    - cat gpu_info.txt >> report.md
    # Send comment
    - cml-send-comment report.md

However, the container can't recognize the driver or the GPU. Running the nvidia-smi command gives the following error:

/usr/bin/bash: line 133: nvdia-smi: command not found
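For context, `command not found` is how bash reports that no executable with the given name exists on PATH; bash then exits with status 127. A small sketch that reproduces the failure with the spelling used in the job (`exit_status_of` is a hypothetical helper written for illustration):

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command name in a child bash and print its exit status.
# A name that is not on PATH makes bash print "<name>: command not found"
# on stderr (discarded here) and exit with status 127.
exit_status_of() {
  bash -c "$1" 2>/dev/null
  echo "$?"
}

exit_status_of nvdia-smi   # the spelling used in before_script
```

A status of 127 only says the name was not found on PATH; on its own it says nothing about whether a GPU or driver is present.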

It seems that the iterativeai/cml:0-dvc2-base1-gpu image can't use the instance GPU. How can I install the NVIDIA drivers and nvidia-docker, and enable the --gpus option for this Docker container?

Thank you

Issue analytics: created 2 years ago; 2 reactions; 24 comments (12 by maintainers).

Top GitHub Comments

dacbd commented, Dec 22, 2021 (2 reactions)

Just adding the CI job log for the deploy_job step: deploy_job.txt

and for the train_job step: job_log.txt

I see nvdia-smi on bash line 125. Is there a typo in your job? It says nvdia-smi rather than nvidia-smi.
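For reference, this is the train_job before_script from the question with only the command name corrected. It should remove the `command not found` error; whether nvidia-smi then actually sees a GPU still depends on how the runner and image are set up.

```yaml
train_job:
  # stage, when, image and tags unchanged from the question
  before_script:
    - pip install poetry
    - poetry --version
    - poetry config virtualenvs.create false
    - poetry install -vv
    - nvidia-smi   # was "nvdia-smi"
```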

0x2b3bfa0 commented, Dec 18, 2021 (2 reactions)

If nvidia-smi works, these lines won’t run at all.
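The comment does not quote the lines it refers to. A minimal sketch of the pattern it implies, where a setup step runs only when nvidia-smi does not work, might look like the following. `setup_gpu_tooling` and `ensure_gpu_tooling` are hypothetical placeholders, not CML or NVIDIA commands.

```shell
#!/usr/bin/env bash
# Hypothetical placeholder for whatever driver/toolkit setup a job might do.
setup_gpu_tooling() {
  echo "running setup"
}

# Run the probe command ($1, default nvidia-smi). If it succeeds,
# the setup is skipped entirely; if it fails for any reason
# (not found, or found but unable to run), the setup runs.
ensure_gpu_tooling() {
  local probe="${1:-nvidia-smi}"
  if "$probe" >/dev/null 2>&1; then
    echo "skipping setup"
  else
    setup_gpu_tooling
  fi
}

ensure_gpu_tooling
```

Running the probe, rather than only checking `command -v`, should also catch the case where the binary exists but cannot talk to a driver.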
