Configure Dask workers to contact scheduler on a specific address
At CERN we have a Jupyter notebook service that we are now integrating with HTCondor resources, and we would like to use those resources via Dask.
The setup is the following: users log in to the notebook service and get a user session, which runs in a Docker container. Inside their session, users should be able to create a Dask HTCondorCluster to deploy Dask workers on our HTCondor pool. The problem is that the address the scheduler binds to can’t be the same as the address the workers use to contact the scheduler. The scheduler runs inside the container and should listen on an address:port of the container’s private network. However, the workers (which run in another network, in the HTCondor pool) should contact the scheduler on an address:port of the node that hosts the user container, for which we would set up port forwarding to reach the container.
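To make the two addresses concrete, here is a minimal sketch in plain Python. The IP address, hostname, and ports are hypothetical placeholders, and `SchedulerEndpoints` and `make_endpoints` are illustrative names that are not part of Dask.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulerEndpoints:
    """The two addresses involved in the setup (illustrative only)."""

    bind_address: str     # where the scheduler listens, inside the container
    contact_address: str  # where workers connect, on the node hosting the container


def make_endpoints(container_ip, container_port, host_name, forwarded_port,
                   protocol="tcp"):
    """Build both addresses; port forwarding maps host port -> container port."""
    return SchedulerEndpoints(
        bind_address=f"{protocol}://{container_ip}:{container_port}",
        contact_address=f"{protocol}://{host_name}:{forwarded_port}",
    )


# Hypothetical values: a container-private IP and a forwarded port on the host.
endpoints = make_endpoints("172.17.0.2", 8786, "host.example.org", 30786)
print(endpoints.bind_address)     # address the scheduler binds to
print(endpoints.contact_address)  # address the workers should be given
```

The point is only that the two strings differ: workers in the HTCondor pool cannot route to the first one, so they need to be handed the second.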
It looks like there is currently no way for the workers to receive a scheduler address different from the one the scheduler binds to. We found https://github.com/dask/distributed/pull/2963, but that only allows specifying a different address for the client to contact the scheduler (i.e. the scheduler must still bind to the same address that the workers receive).
Would it be interesting to support a use case like the one I just described? How could it be implemented? Perhaps via a new scheduler option that specifies which address workers should use to connect to it. The naming should be clear to avoid confusion with the already existing external_address (added in https://github.com/dask/distributed/pull/2963).
Pinging @oshadura as she had a proposal for such a patch.
(Previously discussed in: https://dask.discourse.group/t/dask-scheduler-in-a-docker-container-workers-as-htcondor-jobs)
Issue Analytics
- Created 2 years ago
- Comments: 11 (7 by maintainers)
Top GitHub Comments
That is an excellent question. If the future object on the client side is lost, I don’t know which references the Scheduler keeps… Maybe this question could be asked on Discourse.
I’m looking at your PR right now.
Closing this one as an option has been added to distributed directly. @etejedor feel free to reopen if I missed something.
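The comment above does not name the option. As a hedged sketch only, assuming it is the `contact_address` argument of the distributed `Scheduler` and that it can be forwarded through the `scheduler_options` argument of `HTCondorCluster`, the configuration might look like the following. The hostname, ports, resource values, and the helper names `scheduler_options_for` and `make_cluster` are hypothetical.

```python
def scheduler_options_for(container_port, host_name, forwarded_port):
    """Build scheduler options for the split bind/contact setup.

    Assumption: the scheduler accepts `contact_address` as the address
    that workers are told to connect to.
    """
    return {
        # Port the scheduler listens on inside the container.
        "port": container_port,
        # Address on the hosting node, forwarded to the container port.
        "contact_address": f"tcp://{host_name}:{forwarded_port}",
    }


def make_cluster(options):
    """Create the cluster; needs dask-jobqueue and access to an HTCondor pool."""
    from dask_jobqueue import HTCondorCluster  # imported lazily on purpose

    # Resource values are placeholders, not recommendations.
    return HTCondorCluster(
        cores=1,
        memory="2GB",
        disk="1GB",
        scheduler_options=options,
    )


# Hypothetical host name and forwarded port.
options = scheduler_options_for(8786, "host.example.org", 30786)
print(options)
```

Calling `make_cluster(options)` is left out of the sketch because it requires a working HTCondor submit environment; check the documentation of your installed versions before relying on the option name.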