
Pynvml issues when running LocalCudaCluster

See original GitHub issue
$ conda create -n testenv -c rapidsai -c conda-forge python=3.7 ipython distributed==2.3.0 dask-cuda==0.9.0 pynvml

from dask_cuda import LocalCUDACluster
cluster = LocalCUDACluster()
distributed.nanny - ERROR - Failed to start worker
Traceback (most recent call last):
  File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/nanny.py", line 674, in run
    await worker
  File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/worker.py", line 1016, in start
    await self._register_with_scheduler()
  File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/worker.py", line 811, in _register_with_scheduler
    metrics=await self.get_metrics(),
  File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/worker.py", line 740, in get_metrics
    result = await result
  File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/worker.py", line 3406, in gpu_metric
    result = yield offload(nvml.real_time)
  File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/utils.py", line 1489, in offload
    return (yield _offload_executor.submit(fn, *args, **kwargs))
  File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/diagnostics/nvml.py", line 11, in real_time
    "utilization": [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in handles],
  File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/diagnostics/nvml.py", line 11, in <listcomp>
    "utilization": [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in handles],
  File "/home/nfs/bzaitlen/GitRepos/pynvml/pynvml/nvml.py", line 1347, in nvmlDeviceGetUtilizationRates
    check_return(ret)
  File "/home/nfs/bzaitlen/GitRepos/pynvml/pynvml/nvml.py", line 366, in check_return
    raise NVMLError(ret)
pynvml.nvml.NVMLError_Uninitialized: Uninitialized



This is the same error that was raised in gpuopenanalytics/pynvml#16.

Note that if you do not install pynvml into the environment, this error does not occur.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments:14 (13 by maintainers)

Top GitHub Comments

1 reaction
jacobtomlinson commented, Aug 23, 2019

I can continue to reproduce this issue with the conda environment above and distributed master by running:

from dask_cuda import LocalCUDACluster
cluster = LocalCUDACluster()
assert 'gpu' in cluster.scheduler.workers[cluster.scheduler.workers.keys()[0]].metrics
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-11-2b9d4ef220e7> in <module>
----> 1 assert 'gpu' in cluster.scheduler.workers[cluster.scheduler.workers.keys()[0]].metrics

AssertionError: 
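One way a metrics hook can avoid taking the whole worker down when NVML fails is to treat GPU metrics as best-effort: catch the NVML error and omit the key. The sketch below is hypothetical (the NVMLError class and _gpu_query stub stand in for pynvml.nvml.NVMLError and the real query, so it runs without a GPU), not the actual distributed implementation:

```python
class NVMLError(Exception):
    """Stand-in for pynvml.nvml.NVMLError."""


def _gpu_query():
    # Stub that fails the way an uninitialized NVML session does.
    raise NVMLError("Uninitialized")


def collect_metrics():
    metrics = {"memory": 123}  # whatever the worker already reports
    try:
        metrics["gpu"] = _gpu_query()
    except NVMLError:
        # GPU metrics are best-effort: drop the key rather than crash the nanny.
        pass
    return metrics


print(collect_metrics())  # the 'gpu' key is absent when NVML fails
```

With a guard like this, the assertion above would still fail (no 'gpu' key), but the workers themselves would start instead of erroring out in the nanny.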
0 reactions
jacobtomlinson commented, Dec 14, 2020

@CallShaul I’m afraid RAPIDS does not support Windows.


Top Results From Across the Web

NVML cannot load methods "NVMLError_FunctionNotFound"
I have a 1080Ti GPU with CUDA 10.2, NVIDIA driver 440.59 and pynvml version 11.4.1 running on Ubuntu 16.04.

dask-cuda - Python Package Health Analysis - Snyk
As a healthy sign for on-going project maintenance, we found that the GitHub repository had at least 1 pull request or issue interacted...

API — dask-cuda 22.12.00a0+g2c99f5a documentation
This assigns a different CUDA_VISIBLE_DEVICES environment variable to each Dask worker process. For machines with a complex architecture mapping CPUs, GPUs, and ...

Working with GPU - fastai
To see what other options you can query run: nvidia-smi --help-query-gpu. It relies on pynvml to talk to the nvml layer...

Using RAPIDS and DASK - Kaggle
Explore and run machine learning code with Kaggle Notebooks | Using ... h62408e4_0 796 KB libtiff-4.1.0 | h2733197_1 449 KB pynvml-8.0.4 ...
