Allow to start a Scheduler in a batch job

One of the goals of the ClusterManager object is to be able to launch a remote scheduler. In the scope of dask-jobqueue, this probably means submitting a job that starts a Scheduler, and then connecting to it.

We probably still lack some remote interface between the ClusterManager and the scheduler object for this to work, so it will probably mean extending APIs upstream.

Scheduler methods identified so far that would need to be available remotely:

  • retire_workers(n, minimum, key)
  • scheduler_info(), which already exists; check whether it is sufficient
  • add_diagnostic_plugin(), and above all a way to retrieve plugin information remotely
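
To make the list above concrete, here is a minimal sketch of what such a remote interface could look like. It is purely illustrative: `RemoteSchedulerInterface` and `InMemoryScheduler` are hypothetical names, not part of dask or dask-jobqueue, and the meaning given to `n` and `minimum` (retire up to `n` workers, never going below `minimum`) is an assumption. `key` is accepted but ignored in this toy version.

```python
from typing import Any, Callable, Dict, List, Optional, Protocol


class RemoteSchedulerInterface(Protocol):
    """Hypothetical surface a ClusterManager would call on a remote scheduler."""

    def retire_workers(
        self, n: int, minimum: int = 0, key: Optional[Callable[[str], Any]] = None
    ) -> List[str]: ...

    def scheduler_info(self) -> Dict[str, Any]: ...

    def add_diagnostic_plugin(self, plugin: Any) -> None: ...


class InMemoryScheduler:
    """Toy stand-in so the interface can be exercised without a cluster."""

    def __init__(self, workers: List[str]) -> None:
        self.workers = list(workers)
        self.plugins: List[Any] = []

    def retire_workers(self, n, minimum=0, key=None):
        # Assumption: retire at most n workers and keep at least `minimum`.
        count = max(0, min(n, len(self.workers) - minimum))
        retired = self.workers[:count]
        self.workers = self.workers[count:]
        return retired

    def scheduler_info(self):
        return {"workers": list(self.workers)}

    def add_diagnostic_plugin(self, plugin):
        self.plugins.append(plugin)


scheduler: RemoteSchedulerInterface = InMemoryScheduler(["w1", "w2", "w3"])
retired = scheduler.retire_workers(n=5, minimum=1)
print(retired, scheduler.scheduler_info())
```

A real implementation would forward these calls over the network to the scheduler process instead of mutating a local list.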

I suspect that adaptive will need to change significantly too. This may lead to a transitional adaptive logic living in dask-jobqueue, and to other remote functions being added to the scheduler.

This is in the scope of #170.

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Comments: 13 (10 by maintainers)

Top GitHub Comments

1 reaction
manuel-rhdt commented, Dec 21, 2019

In practice I doubt that the scheduler will be expensive enough that system administrators will care. They all ask about this, but I don’t think that it will be important in reality.

Another reason to support this is for networking rules. In some systems (10-20%?) compute nodes are unable to connect back to login nodes. So here placing the scheduler on a compute node, and then connecting to that node from the client/login node is nice.

It may be though that this is a frequently requested but not actually useful feature.

I am currently using dask-joblib on a PBS cluster and running the scheduler on the login node. This is indeed a bit problematic, because the login node has only 2 GB of memory and quickly runs out if I am not careful with the size of the computation graphs.

So I think I would definitely benefit from this feature.

0 reactions
lesteve commented, Mar 25, 2020

@muammar I see that you have commented in https://github.com/dask/dask-jobqueue/pull/390#issuecomment-603558844. Could you please explain the admin rules that are in place on your cluster, just to get an idea of what you are allowed to do there?

You may be interested in my answer above: https://github.com/dask/dask-jobqueue/issues/186#issuecomment-568265386. Let me try to sum up:

  1. Launch the scheduler (i.e. create the Cluster object) in an interactive job: probably the easier option. If you like to work in a Jupyter environment, it is doable this way. There are a few hoops to jump through (mostly SSH port forwarding to open your Jupyter notebook in your local browser at localhost:<some-port>).
  2. Launch the scheduler (i.e. create the Cluster object) in a batch job. This only works for Python scripts, not for a Jupyter environment.
  3. Start the Dask workers by launching the Dask commands yourself: https://docs.dask.org/en/latest/setup/hpc.html#using-a-shared-network-file-system-and-a-job-scheduler

In both 1. and 2. you need to bear in mind that as soon as your scheduler job finishes, you will lose all your workers after ~60s. That may mean losing the results of lengthy computations.
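
As a rough illustration of running the scheduler inside a batch job, the sketch below builds a PBS job script that starts `dask-scheduler` with its `--scheduler-file` option, in the spirit of the shared-file-system approach from the docs linked in option 3. This is not how dask-jobqueue does it. The function name, the resource values, and the path `/shared/scratch/scheduler.json` are assumptions made for the example, and the `#PBS` resource syntax may need adapting to your site.

```python
import shlex


def build_scheduler_job_script(
    scheduler_file: str,
    job_name: str = "dask-scheduler",
    walltime: str = "01:00:00",
    memory: str = "4GB",
) -> str:
    """Return the text of a PBS job script that starts a Dask scheduler.

    Hypothetical helper: all defaults here are illustrative assumptions.
    """
    lines = [
        "#!/usr/bin/env bash",
        f"#PBS -N {job_name}",
        f"#PBS -l walltime={walltime}",
        f"#PBS -l select=1:ncpus=1:mem={memory}",
        "",
        "# The scheduler writes its address to a file on the shared file system.",
        f"dask-scheduler --scheduler-file {shlex.quote(scheduler_file)}",
    ]
    return "\n".join(lines) + "\n"


# Assumed path on a file system visible from both login and compute nodes.
script = build_scheduler_job_script("/shared/scratch/scheduler.json")
print(script)
```

After submitting the script (for example with `qsub`), a client on the login node could connect with `distributed.Client(scheduler_file="/shared/scratch/scheduler.json")`, and workers could join with `dask-worker --scheduler-file /shared/scratch/scheduler.json`. The walltime matters here: once the scheduler job ends, the workers are lost as described above.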
