no CUDA-capable device is detected
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): docker Ubuntu 16.04 image
- Ray installed from (source or binary): pip
- Ray version: 0.5.3
- Python version: Python 3.5.6 :: Anaconda, Inc.
- Exact command to reproduce:
Describe the problem
I am trying to set up an RLlib PPO agent with the husky_env from Gibson Env. The script I ran can be found here. I get the following error when calling agent.train():
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=74 error=38 : no CUDA-capable device is detected
Gibson performs the environment rendering upon environment creation, and the RLlib agent seems to invoke env_creator every time train() is called. I originally thought that was the issue, but I don't think it is the case. I also tried setting gpu_fraction, which didn't help. I'm not sure what is causing the problem.
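For reference, here is a minimal sketch of what the script does, reconstructed from the traceback below; getGibsonEnv, the env class, the config path, and the PPO settings are assumptions based on this issue, not the exact script:

import ray
from ray.tune.registry import register_env
from ray.rllib.agents.ppo import PPOAgent  # Ray 0.5.x API

def getGibsonEnv():
    # Gibson renders with CUDA inside __init__ (pcrender.py), so this
    # constructor must run in a process that can actually see the GPU.
    from gibson.envs.husky_env import HuskyNavigateEnv  # assumed env class
    return HuskyNavigateEnv(config="../configs/husky_navigate_rgb_train.yaml")

register_env("husky_env", lambda _: getGibsonEnv())

ray.init(num_gpus=1)
agent = PPOAgent(env="husky_env", config={"num_workers": 1})
agent.train()  # remote workers call env_creator here; the error fires inside it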
nvidia-smi
root@e6b154065e88:~/mount/gibson/examples/train# nvidia-smi
Wed Nov 7 09:59:00 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.73 Driver Version: 410.73 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX TIT... Off | 00000000:04:00.0 On | N/A |
| 22% 42C P8 20W / 250W | 2385MiB / 12198MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
torch.cuda.device_count()
root@e6b154065e88:~# python -c "import torch
print(torch.cuda.device_count())
print(torch.cuda.current_device())"
1
0
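Both checks above run in the driver process, not in a Ray worker. Here is a quick sketch (mine, not from the original report) to see what CUDA state a worker actually inherits; if CUDA_VISIBLE_DEVICES comes back empty, the worker has been stripped of the GPU even though the driver sees one:

import os
import ray

ray.init(num_gpus=1)

@ray.remote(num_gpus=1)
def check_gpu():
    import torch
    # Report what the worker process sees, not what the driver sees.
    return os.environ.get("CUDA_VISIBLE_DEVICES"), torch.cuda.device_count()

print(ray.get(check_gpu.remote()))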
nvcc --version
root@e6b154065e88:~# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
To Reproduce
Get Nvidia-Docker2
https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
# Ubuntu installation
sudo apt-get install nvidia-docker2
sudo pkill -SIGHUP dockerd
Download Gibson’s dataset
wget https://storage.googleapis.com/gibsonassets/dataset.tar.gz
tar -zxf dataset.tar.gz
Pull Gibson’s image
docker pull xf1280/gibson:0.3.1
Run it in Docker
Replace <dataset-absolute-path> with the absolute path to the Gibson dataset you've unzipped on your local machine:
docker run --runtime=nvidia -ti --name gibson -v <dataset-absolute-path>:/root/mount/gibson/gibson/assets/dataset -p 5001:5001 xf1280/gibson:0.3.1
Add in the ray_husky.py script
Copy the ray_husky.py script found here into the ~/mount/gibson/examples/train/ directory in the Docker container.
Run: python ray_husky.py
Full Log
root@e6b154065e88:~/mount/gibson/examples/train# python test.py
Unexpected end of /proc/mounts line `overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/4IFU7EUC3V2BOPDL2NFLW6T7BY:/var/lib/docker/overlay2/l/3GWVT6ULAU6NJP6MLTBNN56WBQ:/var/lib/docker/overlay2/l/CLLJDJFTZ2FMCKCN6B3WMCSXKG:/var/lib/docker/overlay2/l/QCO5RAE5DXB7MGGYLTK3YULY2O:/var/lib/docker/overlay2/l/NFJ7MEC3G7XLHLZMZWKKHLIM5Y:/var/lib/docker/overlay2/l/3LGFVLYHAWSN7GNAOYGCWVQK3Y:/var/lib/docker/overlay2/l/Q2BQDGXUX3SFP3RQYQDXOPWPSD:/var/lib/docker/overlay2/l/O5I6APSGOJZV4RFU7EOXVT5BWD:/var/lib/docker/overlay2/l/E4DOAELV7FPI6'
Unexpected end of /proc/mounts line `7XTB5ASEF7ESL:/var/lib/docker/overlay2/l/4BPII7VWNXTHZDYHMZQQ47WVGK:/var/lib/docker/overlay2/l/5RZ3I4FBOEGIAACNUMNPNJIIMM:/var/lib/docker/overlay2/l/JUDMTQV6ZO3CYJ64OCHUEOIDS4:/var/lib/docker/overlay2/l/WXFZP4STEX7JZ5S5VQCQR2MTDB:/var/lib/docker/overlay2/l/MUODDE6AS2PD6QOD6BXFE5JWN4:/var/lib/docker/overlay2/l/NV2EHBVA5EICRKTEGR3F4NADEC:/var/lib/docker/overlay2/l/MZVP7SBXRC7X7IKJKYHYQK6YOK:/var/lib/docker/overlay2/l/SVE4WWKXOSQOO2O3QQDMHW5TVB:/var/lib/docker/overlay2/l/NDRFI4BJ3ZGXEYSVAABQB6Z2OQ:/var/lib/do'
Unexpected end of /proc/mounts line `cker/overlay2/l/YTU432I3FDCY7GE4NT5VVR47GN:/var/lib/docker/overlay2/l/VCTBKUJHFQQQTCZRSPPZQKDIDZ:/var/lib/docker/overlay2/l/TR4DD4VR545GC7WIKUS5UDNRSM:/var/lib/docker/overlay2/l/BFRVMK6XAWSUK4JFRBYEOWQA4B:/var/lib/docker/overlay2/l/DLRGX3CDMNWDK66CSZZNXMTRTP:/var/lib/docker/overlay2/l/IPOZCPD7GVR3P3ECGOTQWPJ737:/var/lib/docker/overlay2/l/X6WEEMZQY3LGKMQELCNCCWVVHH:/var/lib/docker/overlay2/l/7APKFGZZGMNJ7BXSRL7A3WFVI6:/var/lib/docker/overlay2/l/PE6OSOUQSWBVJMTELFCNCFEG7X:/var/lib/docker/overlay2/l/FHHGDNFDT'
Unexpected end of /proc/mounts line `A32ESWYKQJTKH77LR:/var/lib/docker/overlay2/l/VEP2IVXB7LSMARPAJOF2SGEWTA:/var/lib/docker/overlay2/l/EAPK6KKCRU7YHHL6QVKDLQKSAH:/var/lib/docker/overlay2/l/5SZECZZ64ECDDARDWCQ2QOH2PY:/var/lib/docker/overlay2/l/XAL23ADNRDHSDATFJJSD3HA5T2:/var/lib/docker/overlay2/l/V7MN4H5N26LKKYRY4JGORHE4PI:/var/lib/docker/overlay2/l/3E3ILIVYCBQ52OYJLKCSZXAYPD:/var/lib/docker/overlay2/l/B4GW3N34A6DMEUWEO24TKYCJIW:/var/lib/docker/overlay2/l/XM3K5GW7VB5HRODVU7CTK5HUGD:/var/lib/docker/overlay2/l/7QHY2DH3GUNNMTOYULZIOK6F6O:/var/li'
pybullet build time: Sep 27 2018 00:17:23
pygame 1.9.4
Hello from the pygame community. https://www.pygame.org/contribute.html
Process STDOUT and STDERR is being redirected to /tmp/raylogs/.
Waiting for redis server at 127.0.0.1:46828 to respond...
Waiting for redis server at 127.0.0.1:15517 to respond...
Warning: Reducing object store memory because /dev/shm has only 67104768 bytes available. You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you may need to pass an argument with the flag '--shm-size' to 'docker run'.
Starting the Plasma object store with 0.00 GB memory.
Starting local scheduler with the following resources: {'CPU': 32, 'GPU': 1}.
Failed to start the UI, you may need to run 'pip install jupyter'.
Created LogSyncer for /root/ray_results/PPO_test_2018-11-07_09-49-37kxrhxuku -> None
/root/mount/gibson/examples/train/../configs/husky_navigate_rgb_train.yaml
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
Processing the data:
Total 1 scenes 0 train 1 test
Indexing
0%| | 0/1 [00:00<?, ?it/s]number of devices found 1
Loaded EGL 1.5 after reload.
GL_VENDOR=NVIDIA Corporation
GL_RENDERER=GeForce GTX TITAN X/PCIe/SSE2
GL_VERSION=4.6.0 NVIDIA 410.73
GL_SHADING_LANGUAGE_VERSION=4.60 NVIDIA
finish loading shaders
100%|#########################################################################################################################################################################| 1/1 [00:00<00:00, 1.99it/s]
9%|###############7 | 18/190 [00:01<02:14, 1.28it/s]terminate called after throwing an instance of 'zmq::error_t'
what(): Address already in use
100%|#####################################################################################################################################################################| 190/190 [00:12<00:00, 16.75it/s]
/root/mount/gibson/gibson/core/render/pcrender.py:204: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
self.imgv = Variable(torch.zeros(1, 3 , self.showsz, self.showsz), volatile = True).cuda()
/root/mount/gibson/gibson/core/render/pcrender.py:205: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
self.maskv = Variable(torch.zeros(1,2, self.showsz, self.showsz), volatile = True).cuda()
Episode: steps:0 score:0
Episode count: 0
/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/functional.py:995: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
Episode: steps:0 score:0
Episode count: 1
LocalMultiGPUOptimizer devices ['/gpu:0']
Unexpected end of /proc/mounts line `overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/4IFU7EUC3V2BOPDL2NFLW6T7BY:/var/lib/docker/overlay2/l/3GWVT6ULAU6NJP6MLTBNN56WBQ:/var/lib/docker/overlay2/l/CLLJDJFTZ2FMCKCN6B3WMCSXKG:/var/lib/docker/overlay2/l/QCO5RAE5DXB7MGGYLTK3YULY2O:/var/lib/docker/overlay2/l/NFJ7MEC3G7XLHLZMZWKKHLIM5Y:/var/lib/docker/overlay2/l/3LGFVLYHAWSN7GNAOYGCWVQK3Y:/var/lib/docker/overlay2/l/Q2BQDGXUX3SFP3RQYQDXOPWPSD:/var/lib/docker/overlay2/l/O5I6APSGOJZV4RFU7EOXVT5BWD:/var/lib/docker/overlay2/l/E4DOAELV7FPI6'
Unexpected end of /proc/mounts line `7XTB5ASEF7ESL:/var/lib/docker/overlay2/l/4BPII7VWNXTHZDYHMZQQ47WVGK:/var/lib/docker/overlay2/l/5RZ3I4FBOEGIAACNUMNPNJIIMM:/var/lib/docker/overlay2/l/JUDMTQV6ZO3CYJ64OCHUEOIDS4:/var/lib/docker/overlay2/l/WXFZP4STEX7JZ5S5VQCQR2MTDB:/var/lib/docker/overlay2/l/MUODDE6AS2PD6QOD6BXFE5JWN4:/var/lib/docker/overlay2/l/NV2EHBVA5EICRKTEGR3F4NADEC:/var/lib/docker/overlay2/l/MZVP7SBXRC7X7IKJKYHYQK6YOK:/var/lib/docker/overlay2/l/SVE4WWKXOSQOO2O3QQDMHW5TVB:/var/lib/docker/overlay2/l/NDRFI4BJ3ZGXEYSVAABQB6Z2OQ:/var/lib/do'
Unexpected end of /proc/mounts line `cker/overlay2/l/YTU432I3FDCY7GE4NT5VVR47GN:/var/lib/docker/overlay2/l/VCTBKUJHFQQQTCZRSPPZQKDIDZ:/var/lib/docker/overlay2/l/TR4DD4VR545GC7WIKUS5UDNRSM:/var/lib/docker/overlay2/l/BFRVMK6XAWSUK4JFRBYEOWQA4B:/var/lib/docker/overlay2/l/DLRGX3CDMNWDK66CSZZNXMTRTP:/var/lib/docker/overlay2/l/IPOZCPD7GVR3P3ECGOTQWPJ737:/var/lib/docker/overlay2/l/X6WEEMZQY3LGKMQELCNCCWVVHH:/var/lib/docker/overlay2/l/7APKFGZZGMNJ7BXSRL7A3WFVI6:/var/lib/docker/overlay2/l/PE6OSOUQSWBVJMTELFCNCFEG7X:/var/lib/docker/overlay2/l/FHHGDNFDT'
Unexpected end of /proc/mounts line `A32ESWYKQJTKH77LR:/var/lib/docker/overlay2/l/VEP2IVXB7LSMARPAJOF2SGEWTA:/var/lib/docker/overlay2/l/EAPK6KKCRU7YHHL6QVKDLQKSAH:/var/lib/docker/overlay2/l/5SZECZZ64ECDDARDWCQ2QOH2PY:/var/lib/docker/overlay2/l/XAL23ADNRDHSDATFJJSD3HA5T2:/var/lib/docker/overlay2/l/V7MN4H5N26LKKYRY4JGORHE4PI:/var/lib/docker/overlay2/l/3E3ILIVYCBQ52OYJLKCSZXAYPD:/var/lib/docker/overlay2/l/B4GW3N34A6DMEUWEO24TKYCJIW:/var/lib/docker/overlay2/l/XM3K5GW7VB5HRODVU7CTK5HUGD:/var/lib/docker/overlay2/l/7QHY2DH3GUNNMTOYULZIOK6F6O:/var/li'
pybullet build time: Sep 27 2018 00:17:23
pygame 1.9.4
Hello from the pygame community. https://www.pygame.org/contribute.html
/root/mount/gibson/examples/train/../configs/husky_navigate_rgb_train.yaml
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
Processing the data:
Total 1 scenes 0 train 1 test
Indexing
0%| | 0/1 [00:00<?, ?it/s]number of devices found 1
Loaded EGL 1.5 after reload.
GL_VENDOR=NVIDIA Corporation
GL_RENDERER=GeForce GTX TITAN X/PCIe/SSE2
GL_VERSION=4.6.0 NVIDIA 410.73
GL_SHADING_LANGUAGE_VERSION=4.60 NVIDIA
finish loading shaders
100%|#########################################################################################################################################################################| 1/1 [00:00<00:00, 1.74it/s]
11%|#################4 | 20/190 [00:02<00:47, 3.56it/s]terminate called after throwing an instance of 'zmq::error_t'
what(): Address already in use
100%|#####################################################################################################################################################################| 190/190 [00:12<00:00, 16.88it/s]
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=74 error=38 : no CUDA-capable device is detected
Remote function __init__ failed with:
Traceback (most recent call last):
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/worker.py", line 945, in _process_task
*arguments)
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/actor.py", line 261, in actor_method_executor
method_returns = method(actor, *args)
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/rllib/evaluation/policy_evaluator.py", line 178, in __init__
self.env = env_creator(env_context)
File "w.py", line 36, in <lambda>
register_env(env_name, lambda _ : getGibsonEnv())
File "w.py", line 29, in getGibsonEnv
config=config_file)
File "/root/mount/gibson/gibson/envs/husky_env.py", line 40, in __init__
self.robot_introduce(Husky(self.config, env=self))
File "/root/mount/gibson/gibson/envs/env_modalities.py", line 349, in robot_introduce
self.setup_rendering_camera()
File "/root/mount/gibson/gibson/envs/env_modalities.py", line 376, in setup_rendering_camera
self.setup_camera_pc()
File "/root/mount/gibson/gibson/envs/env_modalities.py", line 636, in setup_camera_pc
env = self)
File "/root/mount/gibson/gibson/core/render/pcrender.py", line 172, in __init__
comp = torch.nn.DataParallel(comp).cuda()
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 258, in cuda
return self._apply(lambda t: t.cuda(device))
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 185, in _apply
module._apply(fn)
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 185, in _apply
module._apply(fn)
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 185, in _apply
module._apply(fn)
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 191, in _apply
param.data = fn(param.data)
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 258, in <lambda>
return self._apply(lambda t: t.cuda(device))
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:74
Remote function set_global_vars failed with:
Traceback (most recent call last):
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/worker.py", line 923, in _process_task
self.reraise_actor_init_error()
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/worker.py", line 267, in reraise_actor_init_error
raise self.actor_init_error
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/worker.py", line 945, in _process_task
*arguments)
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/actor.py", line 261, in actor_method_executor
method_returns = method(actor, *args)
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/rllib/evaluation/policy_evaluator.py", line 178, in __init__
self.env = env_creator(env_context)
File "w.py", line 36, in <lambda>
register_env(env_name, lambda _ : getGibsonEnv())
File "w.py", line 29, in getGibsonEnv
config=config_file)
File "/root/mount/gibson/gibson/envs/husky_env.py", line 40, in __init__
self.robot_introduce(Husky(self.config, env=self))
File "/root/mount/gibson/gibson/envs/env_modalities.py", line 349, in robot_introduce
self.setup_rendering_camera()
File "/root/mount/gibson/gibson/envs/env_modalities.py", line 376, in setup_rendering_camera
self.setup_camera_pc()
File "/root/mount/gibson/gibson/envs/env_modalities.py", line 636, in setup_camera_pc
env = self)
File "/root/mount/gibson/gibson/core/render/pcrender.py", line 172, in __init__
comp = torch.nn.DataParallel(comp).cuda()
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 258, in cuda
return self._apply(lambda t: t.cuda(device))
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 185, in _apply
module._apply(fn)
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 185, in _apply
module._apply(fn)
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 185, in _apply
module._apply(fn)
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 191, in _apply
param.data = fn(param.data)
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 258, in <lambda>
return self._apply(lambda t: t.cuda(device))
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:74
killing <subprocess.Popen object at 0x7f97880d22b0>
File "w.py", line 68, in <module>
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/rllib/agents/agent.py", line 233, in train
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/rllib/utils/filter_manager.py", line 25, in synchronize
Remote function get_filters failed with:
Traceback (most recent call last):
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/worker.py", line 923, in _process_task
self.reraise_actor_init_error()
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/worker.py", line 267, in reraise_actor_init_error
raise self.actor_init_error
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/worker.py", line 923, in _process_task
self.reraise_actor_init_error()
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/worker.py", line 267, in reraise_actor_init_error
raise self.actor_init_error
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/worker.py", line 945, in _process_task
*arguments)
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/actor.py", line 261, in actor_method_executor
method_returns = method(actor, *args)
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/rllib/evaluation/policy_evaluator.py", line 178, in __init__
self.env = env_creator(env_context)
File "w.py", line 36, in <lambda>
register_env(env_name, lambda _ : getGibsonEnv())
File "w.py", line 29, in getGibsonEnv
config=config_file)
File "/root/mount/gibson/gibson/envs/husky_env.py", line 40, in __init__
self.robot_introduce(Husky(self.config, env=self))
File "/root/mount/gibson/gibson/envs/env_modalities.py", line 349, in robot_introduce
self.setup_rendering_camera()
File "/root/mount/gibson/gibson/envs/env_modalities.py", line 376, in setup_rendering_camera
self.setup_camera_pc()
File "/root/mount/gibson/gibson/envs/env_modalities.py", line 636, in setup_camera_pc
env = self)
File "/root/mount/gibson/gibson/core/render/pcrender.py", line 172, in __init__
comp = torch.nn.DataParallel(comp).cuda()
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 258, in cuda
return self._apply(lambda t: t.cuda(device))
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 185, in _apply
module._apply(fn)
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 185, in _apply
module._apply(fn)
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 185, in _apply
module._apply(fn)
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 191, in _apply
param.data = fn(param.data)
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 258, in <lambda>
return self._apply(lambda t: t.cuda(device))
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:74
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/worker.py", line 2514, in get
RayGetError: Could not get objectid ObjectID(4a7d420ef7de86cb813dcb59e2ebc4ece375f9d7). It was created by remote function get_filters which failed with:
Remote function get_filters failed with:
Traceback (most recent call last):
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/worker.py", line 923, in _process_task
self.reraise_actor_init_error()
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/worker.py", line 267, in reraise_actor_init_error
raise self.actor_init_error
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/worker.py", line 923, in _process_task
self.reraise_actor_init_error()
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/worker.py", line 267, in reraise_actor_init_error
raise self.actor_init_error
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/worker.py", line 945, in _process_task
*arguments)
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/actor.py", line 261, in actor_method_executor
method_returns = method(actor, *args)
File "/miniconda/envs/py35/lib/python3.5/site-packages/ray/rllib/evaluation/policy_evaluator.py", line 178, in __init__
self.env = env_creator(env_context)
File "w.py", line 36, in <lambda>
register_env(env_name, lambda _ : getGibsonEnv())
File "w.py", line 29, in getGibsonEnv
config=config_file)
File "/root/mount/gibson/gibson/envs/husky_env.py", line 40, in __init__
self.robot_introduce(Husky(self.config, env=self))
File "/root/mount/gibson/gibson/envs/env_modalities.py", line 349, in robot_introduce
self.setup_rendering_camera()
File "/root/mount/gibson/gibson/envs/env_modalities.py", line 376, in setup_rendering_camera
self.setup_camera_pc()
File "/root/mount/gibson/gibson/envs/env_modalities.py", line 636, in setup_camera_pc
env = self)
File "/root/mount/gibson/gibson/core/render/pcrender.py", line 172, in __init__
comp = torch.nn.DataParallel(comp).cuda()
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 258, in cuda
return self._apply(lambda t: t.cuda(device))
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 185, in _apply
module._apply(fn)
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 185, in _apply
module._apply(fn)
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 185, in _apply
module._apply(fn)
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 191, in _apply
param.data = fn(param.data)
File "/miniconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 258, in <lambda>
return self._apply(lambda t: t.cuda(device))
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:74
I1107 09:50:22.214844 9899 local_scheduler.cc:178] Killed worker pid 13341 which hadn't started yet.
Top GitHub Comments
Yes, so the issue was that CUDA_VISIBLE_DEVICES was being unset from the environment (somehow). Setting os.environ['CUDA_VISIBLE_DEVICES'] = '0' fixed the issue. Thanks everyone!

Closing this issue because it seems like this is working. Please reopen if not.
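For anyone hitting the same error, a minimal sketch of the workaround described above. The assignment must happen before anything initializes CUDA; putting it inside the env creator means Ray's freshly spawned worker processes get it too. The exact placement is my suggestion, not confirmed from the original script:

import os

def getGibsonEnv():
    # Ray workers can have CUDA_VISIBLE_DEVICES cleared when no GPU is
    # assigned to them, so re-pin the GPU before Gibson touches CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"
    from gibson.envs.husky_env import HuskyNavigateEnv  # assumed env class
    return HuskyNavigateEnv(config="../configs/husky_navigate_rgb_train.yaml")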