Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

The GPU runtime I was assigned with card "A100-SXM4-40GB" can't be used with PyTorch

See original GitHub issue

Describe the current behavior

I'm paying for Google Colab Pro+ and the GPU runtime I'm on can't run CUDA. According to nvidia-smi:

# nvidia-smi
Sun Sep 26 04:12:30 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      Off  | 00000000:00:04.0 Off |                    0 |
| N/A   46C    P0    49W / 400W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

When I try to use pytorch/CUDA I get this error message:

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
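The error text suggests passing `CUDA_LAUNCH_BLOCKING=1`. As a minimal sketch of how that is typically done from a notebook cell: because CUDA kernel launches are asynchronous, the variable must be set before `torch` is imported (and before any CUDA context exists) for the stack trace to point at the real failing call.

```python
import os

# CUDA kernels launch asynchronously, so by default the Python stack trace
# may blame a later API call than the one that actually failed. Setting this
# forces synchronous launches so the trace lands on the real offender, at a
# significant speed cost -- use it only while debugging.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import *after* setting the variable
```

(In this particular issue the setting would only sharpen the trace; the underlying failure is a build/hardware mismatch, as the accepted fix below shows.)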

Describe the expected behavior

Normally I get some kind of Tesla GPU and everything is fine. I guess this machine is misconfigured, or shouldn't be considered a GPU runtime given current library requirements. It's frustrating, because I can't do anything about it except disconnect and hope to get a different machine in several hours, or whenever the Colab system decides to reassign me one.

What web browser you are using

Chrome. I expect the browser is irrelevant here.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments:12 (3 by maintainers)

Top GitHub Comments

9 reactions

usergenic commented, Sep 26, 2021

Aha. This fixed my problem:

pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
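For context on why this install command works (my reading; the thread itself doesn't spell it out): the A100 has compute capability 8.0 (`sm_80`), which older CUDA 10.x PyTorch wheels were not compiled for, while the `+cu111` wheels include `sm_80` kernels. Hence "no kernel image is available". A rough, self-contained sketch of the underlying compatibility rule, using illustrative (not exact) architecture lists in the format returned by `torch.cuda.get_arch_list()`:

```python
def kernels_available(device_cc: tuple, build_archs: list) -> bool:
    """Model of the 'no kernel image' failure: a CUDA binary can run on a
    GPU only if it ships SASS (sm_XY) for that exact compute capability,
    or PTX (compute_XY) for an equal-or-lower capability that the driver
    can JIT-compile forward to the device."""
    sass = {a for a in build_archs if a.startswith("sm_")}
    ptx = {a for a in build_archs if a.startswith("compute_")}
    target = device_cc[0] * 10 + device_cc[1]
    if f"sm_{target}" in sass:
        return True
    return any(int(p.split("_")[1]) <= target for p in ptx)

# A100 is compute capability 8.0. These arch lists are hypothetical
# stand-ins for the cu10x and cu111 wheel contents, for illustration only.
old_wheel = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75"]            # no sm_80
cu111_wheel = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75", "sm_80"]

print(kernels_available((8, 0), old_wheel))    # False -> RuntimeError on A100
print(kernels_available((8, 0), cu111_wheel))  # True
```

On a live runtime, `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()` give the real values for the same check.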
1 reaction

robinsloan commented, Oct 8, 2021

Ah, fabulous! Just a thought… I was surprised to discover there is (apparently?) no Colab blog; I think there ought to be! Both to provide an obvious/“discoverable” platform for technical announcements like this, AND for the broader opportunity to talk about your work, the stuff the Colab team is excited about, and even highlight interesting uses, interesting notebooks…!

(I have often said that I think Colab is the most legitimately futuristic thing going, so obviously I think this should be a whole wonderful magazine-like website, not just a blog… but I would settle for a blog… 😝)

Read more comments on GitHub >

Top Results From Across the Web

Frequently Asked Questions — PyTorch 1.13 documentation
Frequently Asked Questions. My model reports “cuda runtime error(2): out of memory”. As the error message suggests, you have run out of memory...
Read more >
It seems Pytorch doesn't use GPU
It's replying true for torch.cuda.is_available() , but overall training speed and task manager's graph seems torch can't utilize GPU well.
Read more >
PyTorch cannot find GPU, 2021 version
Environment: Remote Linux with core version 5.8.0. I am not a super user. Python 3.8.6; CUDA Version: 11.1; GPU is RTX 3090 with...
Read more >
PyTorch doesn't free GPU's memory of it gets aborted due to ...
I noticed that 99% of the GPU RAM is still being used after and no process is listed by nvidia-smi after PyTorch aborts...
Read more >
How to specify GPU usage? - PyTorch Forums
I have 4 GPUs indexed as 0,1,2,3 I try this way: model = torch.nn.DataParallel(model, device_ids=[0,1]).cuda() But actual process use GPU ...
Read more >
