Tests fail with CUDA 10.1
I am launching tests on a server with 4 V100 GPUs and CUDA 10.1. The tests fail with the following error:
RunTime Error: The NVIDIA driver on your system is too old (found version 10010). Please update your GPU driver [bla bla]. Alternatively go to Pytorch and install a pytorch version that has been compiled with your version of the CUDA driver.
I also recreated the environment from scratch with conda env create -f environment.yml.
The tests complete successfully on CPU.
Any ideas? How can I specify the CUDA driver when installing pytorch from the environment file?
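For readers hitting the same message: the number in "found version 10010" is CUDA's integer version encoding (1000 * major + 10 * minor), so the driver here supports CUDA 10.1 at most. The sketch below decodes that number and compares it with the CUDA version a PyTorch build was compiled against. The function names are hypothetical, the "10.2" build version is only an assumed example (the thread does not say which build was installed), and the comparison is a simplification of the real driver compatibility rules.

```python
from typing import Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Split CUDA's integer version (1000 * major + 10 * minor) into (major, minor)."""
    major, rest = divmod(encoded, 1000)
    return major, rest // 10


def driver_supports(driver_encoded: int, build_cuda: str) -> bool:
    """Return True if the driver's CUDA version is at least the build's CUDA version.

    build_cuda is a string such as "10.2" (the format of torch.version.cuda).
    """
    build = tuple(int(part) for part in build_cuda.split(".")[:2])
    return decode_cuda_version(driver_encoded) >= build


# 10010 is the value reported in the error message above.
print(decode_cuda_version(10010))
# "10.2" is an assumed build version, used only for illustration.
print(driver_supports(10010, "10.2"))
print(driver_supports(10010, "10.1"))
```

On a machine with PyTorch installed, torch.version.cuda gives the build's CUDA version as a string that can be passed as build_cuda.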
Issue Analytics
- State:
- Created 3 years ago
- Comments: 17 (11 by maintainers)
Top GitHub Comments
I agree with @AntonioCarta. I specified the pytorch version in the .yml (e.g. pytorch::pytorch==1.7.1) but the error remains. I think it's up to the user to specify their CUDA version. Something like

Yes, but it depends on the drivers installed for the GPUs. We cannot control or force the installation of the drivers, so the user should provide the currently installed version of the CUDA toolkit. As an example, one of our servers has NVIDIA driver version 440.100, which supports cudatoolkit only up to 10.2. If you install the conda environment and try to run some tests, the same message appears and the computation is done on the CPU.
Probably we didn’t notice this bug because we have never recreated our environments from scratch and we perform the remote testing on CPU.
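One way to let the user supply the toolkit version, as suggested above, is to read the highest CUDA version the driver supports from the nvidia-smi header and derive a cudatoolkit pin from it. This is a minimal sketch and not the project's actual solution: the helper names are hypothetical, the sample header line is illustrative rather than captured output, and it assumes the header contains a "CUDA Version: X.Y" field, as recent drivers print.

```python
import re
from typing import Optional, Tuple


def max_cuda_from_nvidia_smi(text: str) -> Optional[str]:
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi header text, if present."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", text)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def _as_tuple(version: str) -> Tuple[int, ...]:
    """Turn '10.2' into (10, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def cudatoolkit_pin(max_cuda: str, wanted: str) -> str:
    """Return a conda spec for `wanted`, capped at what the driver supports."""
    chosen = wanted if _as_tuple(wanted) <= _as_tuple(max_cuda) else max_cuda
    return f"cudatoolkit={chosen}"


# Illustrative header line for the 440.100 driver mentioned above (not real captured output).
sample = "| NVIDIA-SMI 440.100    Driver Version: 440.100    CUDA Version: 10.2 |"
supported = max_cuda_from_nvidia_smi(sample)
print(supported)
print(cudatoolkit_pin(supported, "11.0"))
```

The resulting string could replace the cudatoolkit entry in environment.yml before running conda env create. Not every cudatoolkit version has a matching build for a given pytorch release, so the pin still needs checking against the builds actually published.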