
Configure Dask workers to contact scheduler on a specific address

See original GitHub issue

At CERN we have a Jupyter notebook service that we are now integrating with HTCondor resources, and we would like to use those resources via Dask.

The setup is the following: users log in to the notebook service and get a user session, which runs in a Docker container. Inside their session, users should be able to create a Dask HTCondorCluster to deploy Dask workers on our HTCondor pool. The problem we have is that the address that the scheduler binds to can’t be the same as the address workers use to contact the scheduler. The scheduler runs inside the container, and should listen on an address:port of the private network of the container. However, the workers (which are running in another network in the HTCondor pool) should contact the scheduler on an address:port of the node that hosts the user container, for which we would setup port forwarding to reach the container.
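The bind-address vs. contact-address split described above can be sketched with plain sockets (illustrative only, not the Dask API): the "scheduler" listens on one address, while the "worker" dials a different address that happens to route to it. Here the wildcard 0.0.0.0 stands in for the container's private interface, and 127.0.0.1 stands in for the forwarded port on the host node.

```python
import socket
import threading

# The "scheduler" binds to one address...
bind_addr = ("0.0.0.0", 0)          # wildcard, OS-assigned port
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(bind_addr)
srv.listen(1)
port = srv.getsockname()[1]

# ...but "workers" are told to dial a different one (the contact address).
contact_addr = ("127.0.0.1", port)

def accept_one():
    conn, _ = srv.accept()
    conn.sendall(b"hello from scheduler")
    conn.close()

t = threading.Thread(target=accept_one)
t.start()

# A "worker" connects using the contact address, not the bind address.
with socket.create_connection(contact_addr) as c:
    print(c.recv(64).decode())      # hello from scheduler

t.join()
srv.close()
```

The point of the issue is that Dask (at the time) offered no way to express this split: the scheduler advertised exactly the address it bound to.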

It looks like there is currently no way for the workers to receive a scheduler address different from the address the scheduler binds to. We found https://github.com/dask/distributed/pull/2963, but that only allows specifying a different address for the client to contact the scheduler (i.e. the scheduler must still bind to the same address that the workers receive).

Would it be interesting to support a use case like the one just described? How could it be implemented? Perhaps via a new scheduler option that specifies the address workers should use to connect to it. The name should be chosen carefully to avoid confusion with the already existing external_address (added in https://github.com/dask/distributed/pull/2963).
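For illustration, such an option could look like the following in Dask's YAML configuration. The key name contact-address and all addresses below are hypothetical placeholders (no such key existed when this issue was opened; see the Dask configuration reference for the name eventually adopted):

```yaml
distributed:
  scheduler:
    # Hypothetical key: the address the scheduler advertises to workers,
    # when it differs from the address the scheduler binds to.
    # Here: the host node and the port forwarded into the container.
    contact-address: tcp://host-node.example.ch:8786
```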

Pinging @oshadura as she had a proposal for such a patch.

(Previously discussed in: https://dask.discourse.group/t/dask-scheduler-in-a-docker-container-workers-as-htcondor-jobs)

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 11 (7 by maintainers)

Top GitHub Comments

1 reaction
guillaumeeb commented, Mar 30, 2022

But even if I stored the scheduler's address somewhere and used it to reconnect with a fresh client, I guess what is not supported is reconnecting to an ongoing computation? E.g. I run client.submit, then the client dies but the job keeps running on the cluster side; there's no way to recreate the future result when the client comes back.

That is an excellent question. If the future object on the client side is lost, I don't know which references the Scheduler keeps… Maybe this question could be asked on Discourse.
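As a conceptual aside (plain Python, not the Dask API), the pattern being asked about — a result outliving the client that submitted it — amounts to a long-lived scheduler retaining results by key, so that a fresh client can fetch them later. A minimal sketch, with all names illustrative:

```python
import concurrent.futures

class TinyScheduler:
    """Stands in for a scheduler process that retains results by key."""
    def __init__(self):
        self._pool = concurrent.futures.ThreadPoolExecutor()
        self._results = {}          # key -> Future; survives client loss

    def submit(self, key, fn, *args):
        self._results[key] = self._pool.submit(fn, *args)
        return key                  # a client only needs to remember the key

    def result(self, key):
        return self._results[key].result()

sched = TinyScheduler()

# "Client 1" submits work, then disappears, keeping only the key.
key = sched.submit("square-3", lambda x: x * x, 3)
del key  # the client-side handle is gone, but the scheduler still has it

# "Client 2" reconnects later and fetches the result by key.
print(sched.result("square-3"))   # 9
```

Whether the real Dask scheduler keeps such references once all client-side futures are gone is exactly the open question above.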

I’m looking at your PR right now.

0 reactions
guillaumeeb commented, Jul 10, 2022

Closing this one as an option has been added to distributed directly. @etejedor feel free to reopen if I missed something.

Read more comments on GitHub >

Top Results From Across the Web

Configuration - Dask documentation
The address that the scheduler advertises to workers for communication with it. To be specified when the address to which the scheduler binds...
Worker — Dask.distributed 2022.12.1 documentation
Compute tasks as directed by the scheduler. Store and serve computed results to other workers or clients. Each worker contains a ThreadPool that...
Command Line - Dask documentation
The workers connect to the scheduler, which then sets up a long-running network connection back to the worker. The workers will learn the...
Quickstart — Dask.distributed 2022.12.1 documentation
Setup Dask.distributed the Easy Way¶. If you create a client without providing an address it will start up a local scheduler and worker...
Scheduling - Dask documentation
Dask is composed of three parts: "Collections" create "Task Graphs" which… For different computations you may find better performance with a particular scheduler…
