Pynvml issues when running LocalCUDACluster
See original GitHub issue
$ conda create -n testenv -c rapidsai -c conda-forge python=3.7 ipython distributed==2.3.0 dask-cuda==0.9.0 pynvml
from dask_cuda import LocalCUDACluster
cluster = LocalCUDACluster()
distributed.nanny - ERROR - Failed to start worker
Traceback (most recent call last):
File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/nanny.py", line 674, in run
await worker
File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/worker.py", line 1016, in start
await self._register_with_scheduler()
File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/worker.py", line 811, in _register_with_scheduler
metrics=await self.get_metrics(),
File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/worker.py", line 740, in get_metrics
result = await result
File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
yielded = self.gen.throw(*exc_info) # type: ignore
File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/worker.py", line 3406, in gpu_metric
result = yield offload(nvml.real_time)
File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
value = future.result()
File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
yielded = self.gen.throw(*exc_info) # type: ignore
File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/utils.py", line 1489, in offload
return (yield _offload_executor.submit(fn, *args, **kwargs))
File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
value = future.result()
File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/concurrent/futures/_base.py", line 425, in result
return self.__get_result()
File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/diagnostics/nvml.py", line 11, in real_time
"utilization": [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in handles],
File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/diagnostics/nvml.py", line 11, in <listcomp>
"utilization": [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in handles],
File "/home/nfs/bzaitlen/GitRepos/pynvml/pynvml/nvml.py", line 1347, in nvmlDeviceGetUtilizationRates
check_return(ret)
File "/home/nfs/bzaitlen/GitRepos/pynvml/pynvml/nvml.py", line 366, in check_return
raise NVMLError(ret)
pynvml.nvml.NVMLError_Uninitialized: Uninitialized
This is the same error that was raised in gpuopenanalytics/pynvml#16.
Note that if pynvml is not installed in the environment, the error does not occur.
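For reference, NVMLError_Uninitialized is what pynvml raises whenever a device query runs in a process where nvmlInit() has not been called. A minimal sketch, independent of Dask and assuming a machine with at least one NVIDIA GPU, shows the same failure and the call that clears it:

import pynvml

# Any NVML query made before nvmlInit() fails with "Uninitialized",
# the same error shown in the traceback above.
try:
    pynvml.nvmlDeviceGetHandleByIndex(0)
except pynvml.NVMLError as e:
    print("before init:", e)

# After initializing NVML in this process, the same query succeeds.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
print("gpu utilization:", pynvml.nvmlDeviceGetUtilizationRates(handle).gpu)
pynvml.nvmlShutdown()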
Issue Analytics
- Created 4 years ago
- Comments: 14 (13 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I can continue to reproduce this issue with the conda environment above and distributed master by running:
@CallShaul I’m afraid RAPIDS does not support Windows.
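As a side note, one possible mitigation is to treat NVML as optional: initialize it lazily once per worker process and skip GPU metrics whenever pynvml is missing or initialization fails. The sketch below is hypothetical and purely illustrative (the names _nvml_available and gpu_utilization are not distributed's actual diagnostics code); it only uses documented pynvml calls.

try:
    import pynvml
except ImportError:
    pynvml = None

_nvml_initialized = False

def _nvml_available():
    """Initialize NVML once in this process; return False if pynvml is absent or init fails."""
    global _nvml_initialized
    if pynvml is None:
        return False
    if not _nvml_initialized:
        try:
            pynvml.nvmlInit()
            _nvml_initialized = True
        except pynvml.NVMLError:
            return False
    return True

def gpu_utilization():
    """Per-GPU utilization percentages, or an empty list when NVML is unavailable."""
    if not _nvml_available():
        return []
    handles = [
        pynvml.nvmlDeviceGetHandleByIndex(i)
        for i in range(pynvml.nvmlDeviceGetCount())
    ]
    return [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in handles]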