The NVIDIA driver on your system is too old (found version 10020).


Not actually 100% sure that this is a dsub issue, but I’m trying to run a Docker image which is based on gcr.io/deeplearning-platform-release/pytorch-gpu.1-6:latest. When I execute python, I get the following error in dsub:

Failure message: Stopped running "user-command": exit status 1: /site-packages/torch/nn/modules/module.py", line 225, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 247, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 463, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
  File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 150, in _lazy_init
    _check_driver()
  File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 63, in _check_driver
    of the CUDA driver.""".format(str(torch._C._cuda_getDriverVersion())))
AssertionError: 
The NVIDIA driver on your system is too old (found version 10020).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.

I believe this is mapped via dsub and so this isn’t something I can fix on my end. Is that accurate?
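
For context, the "found version 10020" in the message is CUDA's integer encoding of the highest CUDA version the installed driver supports (1000 * major + 10 * minor), not the NVIDIA driver release number. A minimal sketch of decoding it follows; the helper name `decode_cuda_driver_version` is hypothetical and not part of PyTorch or dsub.

```python
from typing import Tuple


def decode_cuda_driver_version(raw: int) -> Tuple[int, int]:
    """Split CUDA's integer version encoding (1000 * major + 10 * minor).

    Hypothetical helper for illustration; `raw` is the number printed in the
    "found version ..." part of the PyTorch error message.
    """
    major, remainder = divmod(raw, 1000)
    minor = remainder // 10
    return major, minor


if __name__ == "__main__":
    major, minor = decode_cuda_driver_version(10020)
    print(f"Driver supports CUDA up to {major}.{minor}")
```

Read this way, 10020 means the driver on the VM supports at most CUDA 10.2, so the error suggests (as an assumption, since the image's CUDA version is not stated here) that the PyTorch build in the image targets a newer CUDA release.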

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

mbookman commented, Dec 4, 2020

Great!

And Tim noted in the gcp-life-sciences-discuss thread:

No, this isn’t a misunderstanding on your part. We encountered some stability issues when using cos-extensions and so have been waiting to migrate until they are resolved. In the meantime, the version used by the Life Sciences API sometimes lags behind the default COS version. Our release next week will bring it back in sync (450.51.06).

mbookman commented, Dec 2, 2020

Thanks @carbocation.

Doing a bit of further digging, it appears from the serial console when booting a Pipelines API VM that the API controller is explicitly installing version 440.64.00:

[   48.011769] exec_start.sh[594]: + NVIDIA_DRIVER_VERSION=440.64.00
[   48.011797] exec_start.sh[594]: + NVIDIA_DRIVER_MD5SUM=
[   48.011826] exec_start.sh[594]: + NVIDIA_INSTALL_DIR_HOST=/var/lib/nvidia
[   48.011858] exec_start.sh[594]: + NVIDIA_INSTALL_DIR_CONTAINER=/usr/local/nvidia
[   48.011887] exec_start.sh[594]: + ROOT_MOUNT_DIR=/root
[   48.011915] exec_start.sh[594]: + CACHE_FILE=/usr/local/nvidia/.cache
[   48.013025] exec_start.sh[594]: + LOCK_FILE=/root/tmp/cos_gpu_installer_lock
[   48.013059] exec_start.sh[594]: + LOCK_FILE_FD=20
[   48.013088] exec_start.sh[594]: + set +x
[   48.018760] exec_start.sh[594]: [INFO    2020-12-02 22:18:26 UTC] PRELOAD: false
[   48.020060] exec_start.sh[594]: [INFO    2020-12-02 22:18:26 UTC] Running on COS build id 13310.1041.24
[   48.020772] exec_start.sh[594]: [INFO    2020-12-02 22:18:26 UTC] Data dependencies (e.g. kernel source) will be fetched from https://storage.googleapis.com/cos-tools/13310.1041.24

That seems inconsistent with what had been posted to GCP Life Sciences Discuss here:

https://groups.google.com/g/gcp-life-sciences-discuss/c/DIhQdGhVZT4

but perhaps the specific planned rollout just hasn’t happened yet:

we are planning to deprecate the VirtualMachine.nvidia_driver_version field of the Cloud Life Sciences RunPipeline method shortly after the release of COS 85 to the stable track.

I’ll post a question back on that thread to get an update.
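
To repeat this check on another VM, the driver version can be pulled out of the serial console output with a few lines of Python. This is only a sketch: it assumes the log keeps the `NVIDIA_DRIVER_VERSION=` format shown above, and the helper names `find_driver_version` and `version_tuple` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches the variable assignment echoed by exec_start.sh in the log above.
_DRIVER_RE = re.compile(r"NVIDIA_DRIVER_VERSION=(\d+(?:\.\d+)+)")

# Two lines copied from the serial console output quoted in this comment.
SAMPLE_LOG = """\
[   48.011769] exec_start.sh[594]: + NVIDIA_DRIVER_VERSION=440.64.00
[   48.011797] exec_start.sh[594]: + NVIDIA_DRIVER_MD5SUM=
"""


def find_driver_version(console_log: str) -> Optional[str]:
    """Return the first NVIDIA driver version found in the log, or None."""
    match = _DRIVER_RE.search(console_log)
    return match.group(1) if match else None


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '440.64.00' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    installed = find_driver_version(SAMPLE_LOG)
    print("Installed driver:", installed)
    if installed is not None:
        # 450.51.06 is the version mentioned in the gcp-life-sciences-discuss thread.
        print("Older than 450.51.06:",
              version_tuple(installed) < version_tuple("450.51.06"))
```

Comparing tuples of integers rather than raw strings avoids surprises when version components have different numbers of digits.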

