Query on LocalCUDACluster usage
Hi,
I want to create a local CUDA Dask cluster using LocalCUDACluster.
The Python script is shown below:
$ cat test_cluster.py
import os
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
cluster = LocalCUDACluster(scheduler_port=12347, n_workers=2, threads_per_worker=1)
print("cluster status ", cluster.status)
print("cluster information ", cluster)
client = Client(cluster)
print("client information ", client)
$
When I use the Python interactive prompt, it works for me:
$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:34:02)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>>
>>> from dask.distributed import Client
>>> from dask_cuda import LocalCUDACluster
>>>
>>> cluster = LocalCUDACluster(scheduler_port=12347,n_workers=2, threads_per_worker=1)
>>> print("cluster status ",cluster.status)
cluster status running
>>> print("cluster infomarion ", cluster)
cluster infomarion LocalCUDACluster('tcp://127.0.0.1:12347', workers=2, ncores=2)
>>> client = Client(cluster)
>>> print("client information ",client)
client information <Client: scheduler='tcp://127.0.0.1:12347' processes=2 cores=2>
>>>
However, when I run it as a script with python test_cluster.py, I get the following:
$ python test_cluster.py
cluster status running
cluster information LocalCUDACluster('tcp://127.0.0.1:12347', workers=0, ncores=0)
client information <Client: scheduler='tcp://127.0.0.1:12347' processes=0 cores=0>
Traceback (most recent call last):
File "/home/pradghos/anaconda3/lib/python3.6/multiprocessing/forkserver.py", line 196, in main
_serve_one(s, listener, alive_r, old_handlers)
File "/home/pradghos/anaconda3/lib/python3.6/multiprocessing/forkserver.py", line 231, in _serve_one
code = spawn._main(child_r)
File "/home/pradghos/anaconda3/lib/python3.6/multiprocessing/spawn.py", line 114, in _main
prepare(preparation_data)
File "/home/pradghos/anaconda3/lib/python3.6/multiprocessing/spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "/home/pradghos/anaconda3/lib/python3.6/multiprocessing/spawn.py", line 277, in _fixup_main_from_path
run_name="__mp_main__")
File "/home/pradghos/anaconda3/lib/python3.6/runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "/home/pradghos/anaconda3/lib/python3.6/runpy.py", line 96, in _run_module_code
Traceback (most recent call last):
File "/home/pradghos/anaconda3/lib/python3.6/multiprocessing/forkserver.py", line 196, in main
_serve_one(s, listener, alive_r, old_handlers)
File "/home/pradghos/anaconda3/lib/python3.6/multiprocessing/forkserver.py", line 231, in _serve_one
mod_name, mod_spec, pkg_name, script_name)
....
....
File "/home/pradghos/anaconda3/lib/python3.6/site-packages/distributed/utils.py", line 316, in f
self.listener.start()
File "/home/pradghos/anaconda3/lib/python3.6/site-packages/distributed/comm/tcp.py", line 421, in start
result[0] = yield future
self.port, address=self.ip, backlog=backlog
File "/home/pradghos/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
File "/home/pradghos/anaconda3/lib/python3.6/site-packages/tornado/netutil.py", line 163, in bind_sockets
value = future.result()
File "/home/pradghos/anaconda3/lib/python3.6/site-packages/distributed/deploy/spec.py", line 158, in _start
self.scheduler = await self.scheduler
File "/home/pradghos/anaconda3/lib/python3.6/site-packages/distributed/scheduler.py", line 1239, in __await__
sock.bind(sockaddr)
OSError: [Errno 98] Address already in use
self.start()
File "/home/pradghos/anaconda3/lib/python3.6/site-packages/distributed/scheduler.py", line 1200, in start
self.listen(addr_or_port, listen_args=self.listen_args)
File "home/pradghos/anaconda3/lib/python3.6/site-packages/distributed/core.py", line 322, in listen
self.listener.start()
File "/home/pradghos/anaconda3/lib/python3.6/site-packages/distributed/comm/tcp.py", line 421, in start
self.port, address=self.ip, backlog=backlog
File "/home/pradghos/anaconda3/lib/python3.6/site-packages/tornado/netutil.py", line 163, in bind_sockets
sock.bind(sockaddr)
OSError: [Errno 98] Address already in use
distributed.nanny - WARNING - Worker process 24873 was killed by unknown signal
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker process 24874 was killed by unknown signal
distributed.nanny - WARNING - Restarting worker
Any pointers on what I might be missing? Thanks in advance!
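For readers hitting the same error: the traceback shows multiprocessing's forkserver re-importing the script (forkserver -> spawn._main -> runpy.run_path), which re-executes the top-level LocalCUDACluster(...) call in each worker process and tries to bind scheduler port 12347 a second time, hence the "Address already in use". A minimal sketch of the usual remedy, assuming only standard Python multiprocessing behaviour, is to guard the cluster creation behind a __main__ check:

from dask.distributed import Client
from dask_cuda import LocalCUDACluster

if __name__ == "__main__":
    # Only the parent process creates the scheduler and workers; child
    # processes that re-import this module skip this block entirely.
    cluster = LocalCUDACluster(scheduler_port=12347, n_workers=2, threads_per_worker=1)
    client = Client(cluster)
    print("cluster status ", cluster.status)
    print("cluster information ", cluster)
    print("client information ", client)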
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
The issue with the delay for workers to start and for the cluster to report them was also fixed in https://github.com/rapidsai/dask-cuda/pull/78.
My apologies for the delay in responding here.
After analyzing this issue a little further, indeed nvprof doesn't report anything on GPUs other than 0, but watching nvidia-smi I can see that there is GPU utilization for all GPUs on the machine.
Would you mind doing another test? What I suggest is that you run your code again and watch nvidia-smi during its execution. The differences I noticed were that if import cudf / import dask_cudf happen at the top, I see all GPUs consuming 11MB and 0% utilization for the entire execution time, whereas GPU 0 reaches over 10GB of consumption and its utilization goes up to 100% at times. If I move the imports to after printing the cluster information, immediately after that happens I see GPUs consuming 429MB, with the exception of GPU 0 consuming > 4GB (as it's populating df); after some time I see memory and GPU utilization increasing on the other GPUs, but the changes are really subtle since there's not much computation going on.
I believe the reason for nvprof not reporting utilization on all GPUs is that dask-cuda uses the CUDA_VISIBLE_DEVICES environment variable to select which GPU is used by each process. Within each process the GPU being utilized is seen as GPU 0 at all times, and I think nvprof is using that index during reporting. I can't confirm yet whether my assumption is correct, but I'll try to make a simple example and perhaps file a bug report to nvprof if necessary.
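As a quick way to observe the per-worker GPU assignment described in the comment above, one can ask every worker for its CUDA_VISIBLE_DEVICES value. This is only an illustrative sketch, not code from the original thread; it uses Client.run from distributed, and the example orderings in the comments are hypothetical:

import os

from dask.distributed import Client
from dask_cuda import LocalCUDACluster


def visible_devices():
    # Executed inside each worker process; dask-cuda sets this variable
    # per worker so that the worker's own GPU appears first in the list.
    return os.environ.get("CUDA_VISIBLE_DEVICES")


if __name__ == "__main__":
    cluster = LocalCUDACluster(n_workers=2, threads_per_worker=1)
    client = Client(cluster)
    # Returns a dict keyed by worker address; with two GPUs one would
    # expect different orderings per worker, e.g. "0,1" and "1,0"
    # (hypothetical values for illustration).
    print(client.run(visible_devices))

Since each worker only sees the devices listed in its own CUDA_VISIBLE_DEVICES, the GPU it uses is always reported as device 0 inside that process, which is consistent with the nvprof behaviour discussed above.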