Why is the walltime set by default?

See original GitHub issue

I was wondering why the default walltime for workers is set to 30 minutes instead of None?

For example, see https://github.com/dask/dask-jobqueue/blob/1f9ae1ecc79a5b76930e56c8801f3b8ace659877/dask_jobqueue/jobqueue.yaml#L71

I kept having failed jobs for 2 days until I realized that this library was setting a hard limit on my jobs. Why? This was very hard to debug among all the other errors that I was trying to fix.

If you want your users to be aware of walltime, isn’t there a better solution than setting it to a random number? Can’t you make it a required variable? Can’t the default value be something like "not-set-by-user" and complain if it’s not set?
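The sentinel idea in the question could be sketched roughly as follows. This is only an illustration of the proposal, not dask-jobqueue code: `NOT_SET`, `resolve_walltime`, and the plain `config` dict are hypothetical names made up for this example.

```python
# Hypothetical sketch: none of these names come from dask-jobqueue.
NOT_SET = "not-set-by-user"


def resolve_walltime(config):
    """Return the configured walltime, or raise if the user never set one."""
    walltime = config.get("walltime", NOT_SET)
    if walltime == NOT_SET:
        raise ValueError(
            "walltime is not set; pass it explicitly "
            "(for example '02:00:00', or None)"
        )
    return walltime
```

With this approach, `resolve_walltime({"walltime": "02:00:00"})` returns the string unchanged, an explicit `None` is passed through, and an empty config raises a `ValueError` instead of silently falling back to 30 minutes.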

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 15 (11 by maintainers)

Top GitHub Comments

3 reactions
jacobtomlinson commented, Jun 28, 2021

I’m definitely in favour of documenting it.

I am also in favour of logging output to the user when we create cluster managers for them. In dask-cloudprovider we log some information about the cluster object when it gets instantiated and that information is also available at cluster.get_logs(). We log things like VM type, docker image, region, etc. I think other cluster managers should start implementing this too.

1 reaction
guillaumeeb commented, Jul 3, 2021

OK, what you say makes complete sense!

So to fix this issue, we should both:

  • update the documentation to mention the default setting (and explain it a bit),
  • log some output with the jobqueue cluster's main settings by implementing the _log method of the Cluster object.
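The logging proposal above might look something like the sketch below. It is a standalone stand-in built on the standard `logging` module and does not use the `Cluster` object's `_log` method. The names `log_cluster_settings`, `summarize_settings`, and `DEFAULT_WALLTIME` are hypothetical, and the `"00:30:00"` format for the 30-minute default is an assumption.

```python
import logging

logger = logging.getLogger("jobqueue_sketch")

# Assumed representation of the 30-minute default discussed in the issue.
DEFAULT_WALLTIME = "00:30:00"


def summarize_settings(settings):
    """Build one log line listing the main settings in a stable order."""
    return ", ".join(f"{key}={settings[key]!r}" for key in sorted(settings))


def log_cluster_settings(settings):
    """Log the main settings; warn and return True if walltime is the default."""
    logger.info("Cluster settings: %s", summarize_settings(settings))
    if settings.get("walltime") == DEFAULT_WALLTIME:
        logger.warning(
            "walltime is the default (%s); longer jobs may be killed "
            "by the scheduler",
            DEFAULT_WALLTIME,
        )
        return True
    return False
```

Emitting the warning at cluster creation time would put the 30-minute limit in front of the user before any job fails, which is the point raised in the comments.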
