How to use the UCX protocol for communication between workers and the scheduler
It seems that dask.distributed now supports the UCX protocol for communication between workers and the scheduler, which appears to have large advantages over TCP on machines equipped with InfiniBand. How can I use it with jobqueue? It does not seem hard, because jobqueue is built on dask.distributed. If I add the --protocol ucx option to the scheduler and worker commands, would that be enough?
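For reference, here is a minimal sketch of what the two command lines might look like when launched by hand, outside of jobqueue. It only builds and prints the commands; it does not start anything. The host name `node001` and the interface `ib0` are hypothetical placeholders, the helper `ucx_commands` is made up for illustration, and the sketch assumes UCX support (UCX-Py) is installed on every node. The worker is pointed at a `ucx://` scheduler address rather than given its own protocol flag.

```python
import shlex


def ucx_commands(scheduler_host, port=8786, interface=None):
    """Build scheduler and worker command lines that use UCX.

    Hypothetical helper for illustration only: it assembles argument
    lists and does not launch any process.
    """
    # The scheduler is told explicitly to listen with the UCX protocol.
    scheduler_cmd = ["dask-scheduler", "--protocol", "ucx", "--port", str(port)]
    if interface:
        # e.g. an InfiniBand interface name (assumption: "ib0")
        scheduler_cmd += ["--interface", interface]

    # The worker connects to the scheduler through a ucx:// address.
    address = f"ucx://{scheduler_host}:{port}"
    worker_cmd = ["dask-worker", address]
    if interface:
        worker_cmd += ["--interface", interface]

    return scheduler_cmd, worker_cmd


# "node001" and "ib0" are placeholders, not values from this issue.
scheduler_cmd, worker_cmd = ucx_commands("node001", interface="ib0")
print(shlex.join(scheduler_cmd))
print(shlex.join(worker_cmd))
```

Whether jobqueue passes equivalent options through to the jobs it submits is exactly what this issue asks, so treat the above as a manual baseline to compare against, not as the jobqueue answer.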
Issue Analytics
- State:
- Created 4 years ago
- Comments:14 (13 by maintainers)
Top GitHub Comments
@mrocklin, could you point me to the setup you used on Cheyenne/Casper? I’ve been trying to launch a dask cluster that uses the UCX protocol for communication, and all my attempts have failed.
Running the following
Results in a timeout error.
I tried launching the scheduler from the command line, and I ran into a different error:
Am I making a trivial error, or do I need to do some extra setup for things to work properly?
Ccing @quasiben in case he has some suggestions, too.
I should add here that we also tested this a few months ago and found that it gave no performance benefit (at least in our use case). We also found that it kills resilience, though this may have changed since then.