The NVIDIA driver on your system is too old (found version 10020).

Not actually 100% sure that this is a dsub issue, but I’m trying to run a Docker image which is based on gcr.io/deeplearning-platform-release/pytorch-gpu.1-6:latest. When I execute python, I get the following error in dsub:

Failure message: Stopped running "user-command": exit status 1: /site-packages/torch/nn/modules/module.py", line 225, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 247, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 463, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
  File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 150, in _lazy_init
    _check_driver()
  File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 63, in _check_driver
    of the CUDA driver.""".format(str(torch._C._cuda_getDriverVersion())))
AssertionError: 
The NVIDIA driver on your system is too old (found version 10020).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.

I believe this is mapped via dsub and so this isn’t something I can fix on my end. Is that accurate?
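
For readers puzzling over the number in the message: as far as I know, "found version 10020" is the integer reported by the CUDA driver API, which encodes a CUDA version as 1000 * major + 10 * minor, so 10020 means the installed driver supports up to CUDA 10.2. A minimal sketch of that decoding follows. The function names are hypothetical, and the 11000 (CUDA 11.0) value is only an illustrative assumption about what a newer PyTorch build might need, not something confirmed in this thread.

```python
from typing import Tuple


def decode_cuda_version(raw: int) -> Tuple[int, int]:
    """Split a CUDA-style integer version (1000 * major + 10 * minor)
    into a (major, minor) tuple."""
    major, rest = divmod(raw, 1000)
    return major, rest // 10


def driver_is_new_enough(driver_raw: int, required_raw: int) -> bool:
    """True if the driver's supported CUDA version is at least the required one."""
    return decode_cuda_version(driver_raw) >= decode_cuda_version(required_raw)


# 10020 is the value from the error message above.
print(decode_cuda_version(10020))
# 11000 is a hypothetical requirement used purely for illustration.
print(driver_is_new_enough(10020, 11000))
```

If the comparison is false for the CUDA version your PyTorch build was compiled against, the options are the two the error message lists: a newer driver on the host, or a PyTorch build compiled for the older CUDA version.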

Issue Analytics

Created 3 years ago
Comments: 11 (5 by maintainers)

Top GitHub Comments

mbookman commented, Dec 4, 2020

Great!

And Tim noted in the gcp-life-sciences-discuss thread:

No, this isn’t a misunderstanding on your part. We encountered some stability issues when using cos-extensions and so have been waiting to migrate until they are resolved. In the meantime, the version used by the Life Sciences API sometimes lags behind the default COS version. Our release next week will bring it back in sync (450.51.06).

mbookman commented, Dec 2, 2020

Thanks @carbocation.

Doing a bit of further digging, it appears from the serial console when booting a Pipelines API VM that the API controller is explicitly installing version 440.64.00:

[   48.011769] exec_start.sh[594]: + NVIDIA_DRIVER_VERSION=440.64.00
[   48.011797] exec_start.sh[594]: + NVIDIA_DRIVER_MD5SUM=
[   48.011826] exec_start.sh[594]: + NVIDIA_INSTALL_DIR_HOST=/var/lib/nvidia
[   48.011858] exec_start.sh[594]: + NVIDIA_INSTALL_DIR_CONTAINER=/usr/local/nvidia
[   48.011887] exec_start.sh[594]: + ROOT_MOUNT_DIR=/root
[   48.011915] exec_start.sh[594]: + CACHE_FILE=/usr/local/nvidia/.cache
[   48.013025] exec_start.sh[594]: + LOCK_FILE=/root/tmp/cos_gpu_installer_lock
[   48.013059] exec_start.sh[594]: + LOCK_FILE_FD=20
[   48.013088] exec_start.sh[594]: + set +x
[   48.018760] exec_start.sh[594]: [INFO    2020-12-02 22:18:26 UTC] PRELOAD: false
[   48.020060] exec_start.sh[594]: [INFO    2020-12-02 22:18:26 UTC] Running on COS build id 13310.1041.24
[   48.020772] exec_start.sh[594]: [INFO    2020-12-02 22:18:26 UTC] Data dependencies (e.g. kernel source) will be fetched from https://storage.googleapis.com/cos-tools/13310.1041.24

That seems inconsistent with what had been posted to GCP Life Sciences Discuss here:

https://groups.google.com/g/gcp-life-sciences-discuss/c/DIhQdGhVZT4

but perhaps the specific planned rollout just hasn’t happened yet:

we are planning to deprecate the VirtualMachine.nvidia_driver_version field of the Cloud Life Sciences RunPipeline method shortly after the release of COS 85 to the stable track.

I’ll post a question back on that thread to get an update.
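
If you want to check which driver version a VM is installing without reading the serial console by eye, the `NVIDIA_DRIVER_VERSION=` line in the excerpt above can be extracted with a small script. This is only a sketch: the helper names are hypothetical, it assumes the console output is already available as text, and it compares against 450.51.06 only because that is the version mentioned elsewhere in this thread as the upcoming release.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "NVIDIA_DRIVER_VERSION=440.64.00" anywhere in the console text.
_DRIVER_RE = re.compile(r"NVIDIA_DRIVER_VERSION=(\d+(?:\.\d+)+)")


def find_driver_version(console_text: str) -> Optional[str]:
    """Return the first driver version string found in the text, or None."""
    match = _DRIVER_RE.search(console_text)
    return match.group(1) if match else None


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '440.64.00' into (440, 64, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Two lines copied from the serial console excerpt in this comment.
sample = (
    "[   48.011769] exec_start.sh[594]: + NVIDIA_DRIVER_VERSION=440.64.00\n"
    "[   48.011797] exec_start.sh[594]: + NVIDIA_DRIVER_MD5SUM=\n"
)

installed = find_driver_version(sample)
print(installed)
# Is the installed driver older than the 450.51.06 release mentioned in the thread?
print(version_tuple(installed) < version_tuple("450.51.06"))
```

Comparing tuples of integers rather than raw strings avoids surprises when components have different widths or leading zeros.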
