Tests fail with CUDA 10.1
I am launching tests on a server with 4 V100 GPUs, CUDA 10.1. Tests fail with the following error:
RuntimeError: The NVIDIA driver on your system is too old (found version 10010). Please update your GPU driver [bla bla]. Alternatively, go to PyTorch and install a PyTorch version that has been compiled with your version of the CUDA driver.
I also recreated the environment from scratch with
conda env create -f environment.yml.
Tests successfully complete with CPU.
Any ideas? How can I specify the CUDA driver when installing pytorch from the environment file?
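One way to pin the CUDA toolkit directly in the environment file is a sketch like the following (the package pins shown here are assumptions for illustration, not the project's actual environment.yml):

```yaml
# Hypothetical environment.yml sketch: pinning cudatoolkit alongside
# pytorch so conda resolves a PyTorch build compiled for that CUDA version.
name: avalanche-env
channels:
  - pytorch
  - defaults
dependencies:
  - python=3.8
  - pytorch::pytorch=1.7.1
  - pytorch::torchvision
  - cudatoolkit=10.1
```

The pinned cudatoolkit must still be one supported by the NVIDIA driver installed on the machine, which is why a single checked-in file cannot work for every server.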
- Created 3 years ago
- Comments: 17 (11 by maintainers)
Top GitHub Comments
I agree with @AntonioCarta. I specified the pytorch version in the environment file (pytorch::pytorch==1.7.1), but the error remains. I think it's up to the user to specify their CUDA version. Something like:

```shell
CUDA_VERSION=$1  # 9.2, 10.1, 10.2, 11.0, cpu
conda env create -f environment.yml
conda activate avalanche-env
if [ "$CUDA_VERSION" == "cpu" ]; then
    conda install pytorch torchvision torchaudio cpuonly -c pytorch
else
    conda install pytorch torchvision torchaudio cudatoolkit=$CUDA_VERSION -c pytorch
fi
```
Yes, but it depends on the drivers installed on the machine. We cannot control or force the installation of the drivers, so the user should provide the cudatoolkit version supported by the currently installed driver. As an example, one of our servers has NVIDIA driver version 440.100, which supports a cudatoolkit only up to 10.2. If you install the conda environment and try to run some tests, the same message appears and the computation falls back to the CPU.
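The driver-to-toolkit constraint above can be sketched as a small lookup. The minimum driver versions are the Linux values from NVIDIA's CUDA release notes; the helper itself is hypothetical, not part of Avalanche:

```python
# Minimum NVIDIA driver version (Linux) required by each cudatoolkit
# release, newest first, per NVIDIA's CUDA Toolkit release notes.
MIN_DRIVER = [
    ("11.0", (450, 36)),
    ("10.2", (440, 33)),
    ("10.1", (418, 39)),
    ("9.2",  (396, 26)),
]


def max_cudatoolkit(driver: str) -> str:
    """Return the newest cudatoolkit this driver supports, or 'cpu'."""
    major, minor = (int(x) for x in driver.split(".")[:2])
    for toolkit, required in MIN_DRIVER:
        if (major, minor) >= required:
            return toolkit
    return "cpu"


# The server mentioned above, with driver 440.100:
print(max_cudatoolkit("440.100"))  # → 10.2
```

Picking the cudatoolkit this way, then passing it to the conda install command, avoids the "driver is too old" error at import time.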
Probably we didn't notice this bug because we have never recreated our environments from scratch, and our remote testing runs on the CPU.