Query on LocalCUDACluster usage
Hi,
I want to create a local CUDA Dask cluster using LocalCUDACluster.
The Python script is shown below:
$ cat test_cluster.py
import os
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
cluster = LocalCUDACluster(scheduler_port=12347, n_workers=2, threads_per_worker=1)
print("cluster status ", cluster.status)
print("cluster information ", cluster)
client = Client(cluster)
print("client information ", client)
$
When I use the Python interactive prompt, it works for me:
$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:34:02)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>>
>>> from dask.distributed import Client
>>> from dask_cuda import LocalCUDACluster
>>>
>>> cluster = LocalCUDACluster(scheduler_port=12347,n_workers=2, threads_per_worker=1)
>>> print("cluster status ",cluster.status)
cluster status running
>>> print("cluster infomarion ", cluster)
cluster infomarion LocalCUDACluster('tcp://127.0.0.1:12347', workers=2, ncores=2)
>>> client = Client(cluster)
>>> print("client information ",client)
client information <Client: scheduler='tcp://127.0.0.1:12347' processes=2 cores=2>
>>>
However, when I run it as a script with python test_cluster.py, I get the following:
$ python test_cluster.py
cluster status running
cluster information LocalCUDACluster('tcp://127.0.0.1:12347', workers=0, ncores=0)
client information <Client: scheduler='tcp://127.0.0.1:12347' processes=0 cores=0>
Traceback (most recent call last):
File "/home/pradghos/anaconda3/lib/python3.6/multiprocessing/forkserver.py", line 196, in main
_serve_one(s, listener, alive_r, old_handlers)
File "/home/pradghos/anaconda3/lib/python3.6/multiprocessing/forkserver.py", line 231, in _serve_one
code = spawn._main(child_r)
File "/home/pradghos/anaconda3/lib/python3.6/multiprocessing/spawn.py", line 114, in _main
prepare(preparation_data)
File "/home/pradghos/anaconda3/lib/python3.6/multiprocessing/spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "/home/pradghos/anaconda3/lib/python3.6/multiprocessing/spawn.py", line 277, in _fixup_main_from_path
run_name="__mp_main__")
File "/home/pradghos/anaconda3/lib/python3.6/runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "/home/pradghos/anaconda3/lib/python3.6/runpy.py", line 96, in _run_module_code
Traceback (most recent call last):
File "/home/pradghos/anaconda3/lib/python3.6/multiprocessing/forkserver.py", line 196, in main
_serve_one(s, listener, alive_r, old_handlers)
File "/home/pradghos/anaconda3/lib/python3.6/multiprocessing/forkserver.py", line 231, in _serve_one
mod_name, mod_spec, pkg_name, script_name)
....
....
File "/home/pradghos/anaconda3/lib/python3.6/site-packages/distributed/utils.py", line 316, in f
self.listener.start()
File "/home/pradghos/anaconda3/lib/python3.6/site-packages/distributed/comm/tcp.py", line 421, in start
result[0] = yield future
self.port, address=self.ip, backlog=backlog
File "/home/pradghos/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
File "/home/pradghos/anaconda3/lib/python3.6/site-packages/tornado/netutil.py", line 163, in bind_sockets
value = future.result()
File "/home/pradghos/anaconda3/lib/python3.6/site-packages/distributed/deploy/spec.py", line 158, in _start
self.scheduler = await self.scheduler
File "/home/pradghos/anaconda3/lib/python3.6/site-packages/distributed/scheduler.py", line 1239, in __await__
sock.bind(sockaddr)
OSError: [Errno 98] Address already in use
self.start()
File "/home/pradghos/anaconda3/lib/python3.6/site-packages/distributed/scheduler.py", line 1200, in start
self.listen(addr_or_port, listen_args=self.listen_args)
File "home/pradghos/anaconda3/lib/python3.6/site-packages/distributed/core.py", line 322, in listen
self.listener.start()
File "/home/pradghos/anaconda3/lib/python3.6/site-packages/distributed/comm/tcp.py", line 421, in start
self.port, address=self.ip, backlog=backlog
File "/home/pradghos/anaconda3/lib/python3.6/site-packages/tornado/netutil.py", line 163, in bind_sockets
sock.bind(sockaddr)
OSError: [Errno 98] Address already in use
distributed.nanny - WARNING - Worker process 24873 was killed by unknown signal
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker process 24874 was killed by unknown signal
distributed.nanny - WARNING - Restarting worker
Any pointers on what I might be missing? Thanks in advance!
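For readers hitting the same error: the traceback shows multiprocessing's forkserver re-importing the script (forkserver -> spawn._main -> runpy.run_path), which re-executes the top-level LocalCUDACluster(...) call in each worker process and tries to bind scheduler port 12347 a second time, hence the "Address already in use". A minimal sketch of the usual remedy, assuming only standard Python multiprocessing behaviour, is to guard the cluster creation behind a __main__ check:

from dask.distributed import Client
from dask_cuda import LocalCUDACluster

if __name__ == "__main__":
    # Only the parent process creates the scheduler and workers; child
    # processes that re-import this module skip this block entirely.
    cluster = LocalCUDACluster(scheduler_port=12347, n_workers=2, threads_per_worker=1)
    client = Client(cluster)
    print("cluster status ", cluster.status)
    print("cluster information ", cluster)
    print("client information ", client)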
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
The issue with the delay for workers to start and for the cluster to report them was also fixed in https://github.com/rapidsai/dask-cuda/pull/78.
My apologies for the delay in responding here.
After analyzing this issue a little further, indeed nvprof doesn't report anything on GPUs other than 0, but watching nvidia-smi I can see that there is GPU utilization for all GPUs on the machine.
Would you mind doing another test? What I suggest is that you run your code again and watch nvidia-smi during its execution. The differences I noticed were that if import cudf / import dask_cudf happen at the top, I see all GPUs consuming 11MB and 0% utilization for the entire execution time, whereas GPU 0 reaches over 10GB of consumption and its utilization goes up to 100% at times. If I move the imports to after printing the cluster information, immediately after that happens I see GPUs consuming 429MB, with the exception of GPU 0 consuming > 4GB (as it's populating df); after some time I see memory and GPU utilization increasing on the other GPUs, but the changes are really subtle since there's not much computation going on.
I believe the reason for nvprof not reporting utilization on all GPUs is that dask-cuda uses the CUDA_VISIBLE_DEVICES environment variable to select which GPU is used by each process. Within each process the GPU being utilized is seen as GPU 0 at all times, and I think nvprof is using that index during reporting. I can't confirm yet whether my assumption is correct, but I'll try to make a simple example and perhaps file a bug report to nvprof if necessary.
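As a quick way to observe the per-worker GPU assignment described in the comment above, one can ask every worker for its CUDA_VISIBLE_DEVICES value. This is only an illustrative sketch, not code from the original thread; it uses Client.run from distributed, and the example orderings in the comments are hypothetical:

import os

from dask.distributed import Client
from dask_cuda import LocalCUDACluster


def visible_devices():
    # Executed inside each worker process; dask-cuda sets this variable
    # per worker so that the worker's own GPU appears first in the list.
    return os.environ.get("CUDA_VISIBLE_DEVICES")


if __name__ == "__main__":
    cluster = LocalCUDACluster(n_workers=2, threads_per_worker=1)
    client = Client(cluster)
    # Returns a dict keyed by worker address; with two GPUs one would
    # expect different orderings per worker, e.g. "0,1" and "1,0"
    # (hypothetical values for illustration).
    print(client.run(visible_devices))

Since each worker only sees the devices listed in its own CUDA_VISIBLE_DEVICES, the GPU it uses is always reported as device 0 inside that process, which is consistent with the nvprof behaviour discussed above.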