--gpus option not working on recently updated docker image

I have the following YAML workflow:

on:
  push:
    branches:
      - GPU-debug

jobs:
  deploy-runner:
    runs-on: [ubuntu-latest]
    steps:
      - uses: iterative/setup-cml@v1
      - uses: actions/checkout@v2
      - name: Deploy runner on EC2
        env:
          PERSONAL_ACCESS_TOKEN: ${{ secrets.REPO_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-west-1
        run: |
          cml-runner \
              --repo https://github.com/sergeychuvakin/DVC_CML_sanbox \
              --token=$PERSONAL_ACCESS_TOKEN \
              --cloud aws \
              --cloud-region us-west-1 \
              --cloud-type=g3.4xlarge \
              --labels=cml-runner \
              --idle-timeout 30
    
  model-training:
    timeout-minutes: 5000
    needs: [deploy-runner]
    runs-on: [self-hosted, cml-runner]
    container:
      image: docker://dvcorg/cml:0-dvc2-base0-gpu
      options: --gpus all
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.7'
      - name: Train model
        env:
          repo_token: ${{ secrets.REPO_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          nvidia-smi
        shell: bash

I get the following error while the container image is being built:

[Screenshot: 2021-08-17 at 17:47:12]

I tried different images: both docker://dvcorg/cml:0-dvc2-base0-gpu and docker://dvcorg/cml:0-dvc2-base1-gpu gave the same error.

When I removed the --gpus all option, the error went away, but nvidia-smi was then not found:

[Screenshot: 2021-08-17 at 17:50:38]

Thanks in advance!

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
0x2b3bfa0 commented, Aug 19, 2021

You’re welcome! We won’t ever know what the exact issue was, but at least it’s solved. 🙃

0 reactions
sergeychuvakin commented, Aug 19, 2021

@0x2b3bfa0 Yes, indeed, I cannot reproduce it either. Looks like you’re right: the issue was on the AWS side. Now it works as expected. Thank you!
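
The thread attributes the failure to the AWS host rather than to the image. One way to tell the two apart is to run the same checks directly on the runner, without the container block. The fragment below is only a sketch: the job name gpu-check is hypothetical, and it assumes that Docker and the NVIDIA driver are expected to be present on the EC2 instance that cml-runner provisions.

```yaml
  # Hypothetical diagnostic job; place it under the existing `jobs:` key.
  gpu-check:
    needs: [deploy-runner]
    runs-on: [self-hosted, cml-runner]
    steps:
      - name: Check the GPU on the host, then through Docker
        run: |
          # Fails at this line if the host itself has no working NVIDIA driver.
          nvidia-smi
          # Fails at this line if Docker cannot pass the GPU into a container.
          docker run --rm --gpus all dvcorg/cml:0-dvc2-base0-gpu nvidia-smi
        shell: bash
```

If the first command fails, the problem is on the instance itself. If only the second fails, the Docker GPU runtime on the host is the more likely culprit.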
