
Pynvml issues when running LocalCudaCluster

See original GitHub issue
$ conda create -n testenv -c rapidsai -c conda-forge python=3.7 ipython distributed==2.3.0 dask-cuda==0.9.0 pynvml

from dask_cuda import LocalCUDACluster
cluster = LocalCUDACluster()
distributed.nanny - ERROR - Failed to start worker
Traceback (most recent call last):
  File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/nanny.py", line 674, in run
    await worker
  File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/worker.py", line 1016, in start
    await self._register_with_scheduler()
  File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/worker.py", line 811, in _register_with_scheduler
    metrics=await self.get_metrics(),
  File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/worker.py", line 740, in get_metrics
    result = await result
  File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/worker.py", line 3406, in gpu_metric
    result = yield offload(nvml.real_time)
  File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/utils.py", line 1489, in offload
    return (yield _offload_executor.submit(fn, *args, **kwargs))
  File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/nfs/bzaitlen/miniconda3/envs/cudf-dev/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/diagnostics/nvml.py", line 11, in real_time
    "utilization": [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in handles],
  File "/home/nfs/bzaitlen/GitRepos/distributed/distributed/diagnostics/nvml.py", line 11, in <listcomp>
    "utilization": [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in handles],
  File "/home/nfs/bzaitlen/GitRepos/pynvml/pynvml/nvml.py", line 1347, in nvmlDeviceGetUtilizationRates
    check_return(ret)
  File "/home/nfs/bzaitlen/GitRepos/pynvml/pynvml/nvml.py", line 366, in check_return
    raise NVMLError(ret)
pynvml.nvml.NVMLError_Uninitialized: Uninitialized



This is the same error that was raised in gpuopenanalytics/pynvml#16.

Note that if you do not install pynvml into the environment, this error does not occur.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments:14 (13 by maintainers)

Top GitHub Comments

1 reaction
jacobtomlinson commented, Aug 23, 2019

I can continue to reproduce this issue with the conda environment above and distributed master by running:

from dask_cuda import LocalCUDACluster
cluster = LocalCUDACluster()
assert 'gpu' in cluster.scheduler.workers[cluster.scheduler.workers.keys()[0]].metrics
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-11-2b9d4ef220e7> in <module>
----> 1 assert 'gpu' in cluster.scheduler.workers[cluster.scheduler.workers.keys()[0]].metrics

AssertionError: 
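One way a metrics hook can avoid taking the whole worker down when NVML fails is to treat GPU metrics as best-effort: catch the NVML error and omit the key. The sketch below is hypothetical (the NVMLError class and _gpu_query stub stand in for pynvml.nvml.NVMLError and the real query, so it runs without a GPU), not the actual distributed implementation:

```python
class NVMLError(Exception):
    """Stand-in for pynvml.nvml.NVMLError."""


def _gpu_query():
    # Stub that fails the way an uninitialized NVML session does.
    raise NVMLError("Uninitialized")


def collect_metrics():
    metrics = {"memory": 123}  # whatever the worker already reports
    try:
        metrics["gpu"] = _gpu_query()
    except NVMLError:
        # GPU metrics are best-effort: drop the key rather than crash the nanny.
        pass
    return metrics


print(collect_metrics())  # the 'gpu' key is absent when NVML fails
```

With a guard like this, the assertion above would still fail (no 'gpu' key), but the workers themselves would start instead of erroring out in the nanny.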
0 reactions
jacobtomlinson commented, Dec 14, 2020

@CallShaul I’m afraid RAPIDS does not support Windows.


Top Results From Across the Web

NVML cannot load methods "NVMLError_FunctionNotFound"
I have a 1080Ti GPU with CUDA 10.2, NVIDIA driver 440.59 and pynvml version 11.4.1 running on Ubuntu 16.04.

dask-cuda - Python Package Health Analysis - Snyk
As a healthy sign for on-going project maintenance, we found that the GitHub repository had at least 1 pull request or issue interacted...

API — dask-cuda 22.12.00a0+g2c99f5a documentation
This assigns a different CUDA_VISIBLE_DEVICES environment variable to each Dask worker process. For machines with a complex architecture mapping CPUs, GPUs, and ...

Working with GPU - fastai
To see what other options you can query run: nvidia-smi --help-query-gpu. It relies on pynvml to talk to the nvml layer...

Using RAPIDS and DASK - Kaggle
Explore and run machine learning code with Kaggle Notebooks | Using ... h62408e4_0 796 KB libtiff-4.1.0 | h2733197_1 449 KB pynvml-8.0.4 ...
