Support for nprocs?


Is there any interest in building in support for nprocs? I know that in #84 the consensus was that having a 1-1 relationship between processes and pods makes the most sense.

We use nprocs because:

  • Some workloads work better with processes vs threads.
  • We prefer to think in terms of machines rather than pods.

I’ve considered thinking in pods rather than machines, but for the clusters we manage, machines are the fundamental unit people pay for, and it’s easy to end up in a situation where machines are under-utilized at the k8s level. Yes, k8s can move pods around, but that potentially disrupts longer-running workloads.

For the most part, using dask-kubernetes with nprocs>1 has worked pretty well. It can get a little goofy, because if nprocs=4 and I call scale(4), I end up with 16 workers. I think the most value would come from making Adaptive understand nprocs.
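The arithmetic behind that surprise can be sketched as follows. This is only an illustration: `workers_from_pods` and `pods_for_workers` are hypothetical helper names, not part of dask-kubernetes, and the sketch assumes every pod starts exactly nprocs worker processes.

```python
import math


def workers_from_pods(pods: int, nprocs: int) -> int:
    """Worker processes that result from starting `pods` pods (hypothetical helper)."""
    return pods * nprocs


def pods_for_workers(target_workers: int, nprocs: int) -> int:
    """Pods needed to host at least `target_workers` worker processes (hypothetical helper)."""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    if target_workers < 0:
        raise ValueError("target_workers must be non-negative")
    # Round up: a partially used pod still has to be started as a whole.
    return math.ceil(target_workers / nprocs)


# scale(4) with nprocs=4 is treated as 4 pods, hence 16 workers.
print(workers_from_pods(4, 4))
# Asking for 4 workers would only need a single pod.
print(pods_for_workers(4, 4))
```

An nprocs-aware Adaptive would presumably apply a conversion like `pods_for_workers` to the scheduler's worker target before asking Kubernetes for pods.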

So the question is just whether anyone else cares about this. If it’s just me, I’ll subclass Adaptive and call it a day. Otherwise, I can add this functionality to dask-kubernetes.

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

1 reaction
hhuuggoo commented, Jul 29, 2021

@jacobtomlinson I’m not sure #2 needs to be addressed.

The scheduler already understands hosts:

https://github.com/dask/distributed/blob/1be9265ac11876df766bb8bd6d6eb519d04d3bac/distributed/scheduler.py#L6398

and Adaptive supports configuring that parameter:

https://github.com/dask/distributed/blob/1be9265ac11876df766bb8bd6d6eb519d04d3bac/distributed/deploy/adaptive.py#L93

I think we would only need to modify dask-kubernetes to configure Adaptive with the proper key?
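A rough sketch of what such a key could look like, under stated assumptions: as far as I understand, the Adaptive parameter linked above is called `worker_key` and is handed to the scheduler's `workers_to_close` as a grouping function, so that workers sharing a key are retired together. The code below does not use distributed at all. `FakeWorker`, `host_key` and `group_by_key` are hypothetical stand-ins that only simulate the grouping, on the assumption that each pod has its own IP and therefore its own host.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeWorker:
    """Stand-in for a scheduler worker record; only the fields used here."""
    address: str
    host: str


def host_key(worker):
    """Group workers by the host (here: the pod IP) they run on."""
    return worker.host


def group_by_key(workers, key):
    """Map each key value to the addresses of the workers that share it."""
    groups = defaultdict(list)
    for worker in workers:
        groups[key(worker)].append(worker.address)
    return dict(groups)


# Two pods with nprocs=4 each: eight worker processes, two hosts.
workers = [
    FakeWorker(address=f"tcp://10.0.0.{pod}:{40000 + proc}", host=f"10.0.0.{pod}")
    for pod in (1, 2)
    for proc in range(4)
]
groups = group_by_key(workers, host_key)
for host, addresses in groups.items():
    print(host, len(addresses))
```

With a key like this, scale-down decisions would be made per pod rather than per process, which is the behavior the nprocs use case needs. Whether dask-kubernetes can simply forward such a key to Adaptive is the open question in this comment.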

0 reactions
hhuuggoo commented, Aug 26, 2021

Just a note that I have started digging around, and I’m not sure this is an issue (I was looking at 2021.07 earlier last week). I believe the recommendations I’m getting back from the scheduler are for whole pods, but I can confirm that on this issue later, when I can dig deeper.

I do think there is an issue where, while pods are starting, dask_kubernetes does not know that they are starting. I had a situation where the scheduler wanted to scale down to 1, and it resulted in all pods being shut down except for one that was still in the process of starting up. When I confirm that, I will write it up as a separate issue, and possibly close this one.


