NVIDIA drivers not installing on Azure cloud runner
Hi everybody
I am trying to use cml-runner on GitLab to deploy a GPU machine on which to run training. The deployment works great, but the Docker container that then runs the training apparently can’t find any NVIDIA drivers, as I can’t run ‘nvidia-smi’.
My .gitlab-ci.yml looks like this (simplified):
stages:
  - deploy
  - train

deploy:
  stage: deploy
  when: always
  image: dvcorg/cml:0-dvc2-base1-gpu
  script:
    - cml-runner
      --cloud azure
      --cloud-region eu-west
      --cloud-type Standard_NC4as_T4_v3
      --cloud-hdd-size 128
      --cloud-gpu v100
      --labels=cml-runner-gpu

train:
  stage: train
  when: on_success
  image: dvcorg/cml:0-dvc2-base1-gpu
  tags:
    - cml-runner-gpu
  script:
    - nvidia-smi
The examples don’t mention that I need to install the drivers myself on the deployed machine; it looks like it should work out of the box. Or am I overlooking something? Is that only the case for AWS? Do I need to pass a script that installs the drivers through --cloud-startup-script?
Cheers
Issue Analytics
- State:
- Created 2 years ago
- Comments: 8 (5 by maintainers)
Top GitHub Comments
Hi, the issue still persists. It does work if I pass a startup script that installs the drivers, though, like this:
I even manually ssh’d to the created machine and could confirm that no NVIDIA drivers were installed if I didn’t pass the above script.
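The startup script mentioned in the comment above is not preserved in this copy of the thread. As a rough sketch of what such a workaround might look like (not the commenter’s actual script): the snippet below assumes the deployed VM runs an Ubuntu-based image, and that --cloud-startup-script expects the script as a Base64-encoded string, as the CML documentation describes. The variable names and the choice of ubuntu-drivers are illustrative assumptions; a reboot or a pinned driver version may also be needed depending on the image.

```shell
# Hypothetical driver-install script; assumes an Ubuntu-based VM image.
# This is NOT the script from the original comment, which was not preserved.
STARTUP_SCRIPT='#!/bin/bash
set -euo pipefail
sudo apt-get update
sudo apt-get install -y ubuntu-drivers-common
sudo ubuntu-drivers autoinstall'

# Encode as a single-line Base64 string (tr strips any line wrapping).
STARTUP_SCRIPT_B64=$(printf '%s' "$STARTUP_SCRIPT" | base64 | tr -d '\n')

# Print the flag as it could be appended to the cml-runner invocation.
echo "--cloud-startup-script $STARTUP_SCRIPT_B64"
```

The printed flag would go alongside the other --cloud-* options in the deploy job. Running ‘nvidia-smi’ in the train job afterwards is a quick way to check whether the drivers were actually installed.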
That’s great! Thanks for the quick fix 😃