Ray is not finding GPU but TF, PyTorch and nvcc do

I have two NVIDIA Titan X GPUs, but Ray isn’t seeing any:

import ray

ray.init(num_gpus=2)
print(ray.get_gpu_ids())
# prints []

Ray also prints the line below, indicating no GPUs:

2019-10-16 18:20:17,954 INFO multi_gpu_optimizer.py:93 -- LocalMultiGPUOptimizer devices ['/cpu:0']

But TensorFlow sees all devices:

import tensorflow
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

That prints:

[name: "/device:CPU:0"
device_type: "CPU"
...
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
...
, name: "/device:GPU:0"
device_type: "GPU"
...
, name: "/device:GPU:1"
device_type: "GPU"
...
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
...
, name: "/device:XLA_GPU:1"
device_type: "XLA_GPU"
...
]

Similarly,

/usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

Why doesn’t Ray see my GPUs?

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 8
  • Comments: 14 (7 by maintainers)

Top GitHub Comments

2 reactions
SamShowalter commented, Jul 24, 2021

I am having the same issue as @Wormh0-le. This is preventing me from training a torch policy without ray.tune, which I do not wish to use. I just want to call .train() on my agent.

1 reaction
Wormh0-le commented, Jun 29, 2021

Thanks, that was helpful, although it’s confusing. This is what happens:

Even if I explicitly init ray with num_gpus=1, ray.get_gpu_ids() is [].

However, if I start PPOTrainer with an explicit num_gpus=1, then Ray gets the GPU. If I don’t set this in the config, it doesn’t.

I believe the confusing part is ray.get_gpu_ids(), which I thought returned the GPUs detected in the system. Instead, it returns the GPUs that have actually been allocated. I think there should be a method, maybe detected_gpus(), so one can test that Ray indeed sees the GPUs and things are good to go. It would also be great if Ray just allocated GPUs to itself automatically (which should be fine perhaps 99% of the time) so we don’t have to worry about this additional config.

I also set num_gpus=1 explicitly, but Ray still can’t get the GPU, even though torch.cuda.is_available() is True. Why?
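The distinction described in this comment (GPUs detected by Ray versus GPUs allocated to the calling process) can be checked with a short script. The following is a minimal sketch, not an official recipe: `detected_gpus` and `main` are hypothetical names chosen here to mirror the method the comment wishes for, while `ray.cluster_resources()` and `ray.get_gpu_ids()` are existing Ray calls. It assumes a single-machine Ray session where Ray is left to auto-detect GPUs.

```python
def detected_gpus(resources):
    """Return the GPU count from a dict shaped like ray.cluster_resources().

    Hypothetical helper (not part of Ray).
    """
    # The "GPU" key may be absent when Ray has registered no GPUs.
    return int(resources.get("GPU", 0))


def main():
    # Imported lazily so the helper above can be used without Ray installed.
    import ray

    ray.init()  # let Ray auto-detect; pass num_gpus=... to override

    n = detected_gpus(ray.cluster_resources())
    print("GPUs registered with Ray:", n)

    # The driver has not been allocated a GPU, so an empty list is expected here.
    print("GPU IDs in driver:", ray.get_gpu_ids())

    if n >= 1:
        @ray.remote(num_gpus=1)
        def gpu_ids_in_task():
            # Inside a task that requested a GPU, Ray reports the assigned IDs.
            return ray.get_gpu_ids()

        print("GPU IDs inside a num_gpus=1 task:",
              ray.get(gpu_ids_in_task.remote()))

    ray.shutdown()
```

Calling `main()` on the affected machine should show whether Ray has registered the GPUs at all. If the registered count is non-zero while the driver still prints an empty list, that matches the behavior described above: `ray.get_gpu_ids()` reports allocation, not detection.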
