Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

The GPU runtime I was assigned with card "A100-SXM4-40GB" can't be used with PyTorch

See original GitHub issue

Describe the current behavior

I'm paying for Google Colab Pro+ and the GPU runtime I'm on can't run CUDA. According to nvidia-smi:

# nvidia-smi
Sun Sep 26 04:12:30 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      Off  | 00000000:00:04.0 Off |                    0 |
| N/A   46C    P0    49W / 400W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

When I try to use pytorch/CUDA I get this error message:

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
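The error text suggests passing `CUDA_LAUNCH_BLOCKING=1`. As a minimal sketch of how that is typically done from a notebook cell: because CUDA kernel launches are asynchronous, the variable must be set before `torch` is imported (and before any CUDA context exists) for the stack trace to point at the real failing call.

```python
import os

# CUDA kernels launch asynchronously, so by default the Python stack trace
# may blame a later API call than the one that actually failed. Setting this
# forces synchronous launches so the trace lands on the real offender, at a
# significant speed cost -- use it only while debugging.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import *after* setting the variable
```

(In this particular issue the setting would only sharpen the trace; the underlying failure is a build/hardware mismatch, as the accepted fix below shows.)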

Describe the expected behavior

Normally I get some kind of Tesla GPU and everything is fine. I guess this machine is misconfigured, or shouldn't be considered a GPU runtime given current library requirements. It's frustrating, because I can't do anything about it except disconnect and hope to get a different machine in several hours, or whenever the Colab system decides to reassign me one.

What web browser you are using

Chrome. I expect the browser is irrelevant here.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments:12 (3 by maintainers)

Top GitHub Comments

9 reactions

usergenic commented, Sep 26, 2021

Aha. This fixed my problem:

pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
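For context on why this install command works (my reading; the thread itself doesn't spell it out): the A100 has compute capability 8.0 (`sm_80`), which older CUDA 10.x PyTorch wheels were not compiled for, while the `+cu111` wheels include `sm_80` kernels. Hence "no kernel image is available". A rough, self-contained sketch of the underlying compatibility rule, using illustrative (not exact) architecture lists in the format returned by `torch.cuda.get_arch_list()`:

```python
def kernels_available(device_cc: tuple, build_archs: list) -> bool:
    """Model of the 'no kernel image' failure: a CUDA binary can run on a
    GPU only if it ships SASS (sm_XY) for that exact compute capability,
    or PTX (compute_XY) for an equal-or-lower capability that the driver
    can JIT-compile forward to the device."""
    sass = {a for a in build_archs if a.startswith("sm_")}
    ptx = {a for a in build_archs if a.startswith("compute_")}
    target = device_cc[0] * 10 + device_cc[1]
    if f"sm_{target}" in sass:
        return True
    return any(int(p.split("_")[1]) <= target for p in ptx)

# A100 is compute capability 8.0. These arch lists are hypothetical
# stand-ins for the cu10x and cu111 wheel contents, for illustration only.
old_wheel = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75"]            # no sm_80
cu111_wheel = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75", "sm_80"]

print(kernels_available((8, 0), old_wheel))    # False -> RuntimeError on A100
print(kernels_available((8, 0), cu111_wheel))  # True
```

On a live runtime, `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()` give the real values for the same check.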
1 reaction

robinsloan commented, Oct 8, 2021

Ah, fabulous! Just a thought… I was surprised to discover there is (apparently?) no Colab blog; I think there ought to be! Both to provide an obvious/“discoverable” platform for technical announcements like this, AND for the broader opportunity to talk about your work, the stuff the Colab team is excited about, and even highlight interesting uses, interesting notebooks…!

(I have often said that I think Colab is the most legitimately futuristic thing going, so obviously I think this should be a whole wonderful magazine-like website, not just a blog… but I would settle for a blog… 😝)

Read more comments on GitHub >

Top Results From Across the Web

Frequently Asked Questions — PyTorch 1.13 documentation
Frequently Asked Questions. My model reports “cuda runtime error(2): out of memory”. As the error message suggests, you have run out of memory...
Read more >
It seems Pytorch doesn't use GPU
It's replying true for torch.cuda.is_available() , but overall training speed and task manager's graph seems torch can't utilize GPU well.
Read more >
PyTorch cannot find GPU, 2021 version
Environment: Remote Linux with core version 5.8.0. I am not a super user. Python 3.8.6; CUDA Version: 11.1; GPU is RTX 3090 with...
Read more >
PyTorch doesn't free GPU's memory of it gets aborted due to ...
I noticed that 99% of the GPU RAM is still being used after and no process is listed by nvidia-smi after PyTorch aborts...
Read more >
How to specify GPU usage? - PyTorch Forums
I have 4 GPUs indexed as 0,1,2,3 I try this way: model = torch.nn.DataParallel(model, device_ids=[0,1]).cuda() But actual process use GPU ...
Read more >
