Scale by number of cores or amount of memory
When creating a cluster object we currently scale by the number of workers:
```python
cluster = KubeCluster()
cluster.scale(10)
```
Here `10` is the number of workers we want to have. However, it is common for users to think about clusters in terms of number of cores or amount of memory, rather than in terms of number of Dask workers:
```python
cluster.scale(cores=100)
cluster.scale(memory='1 TB')
```
What is the best way to achieve this uniformly across the dask deployment projects? I currently see two approaches, though there are probably more that others might see.
- Establish a convention where clusters define information about the workers they will produce, something like the following:

  ```python
  >>> cluster.worker_info
  {'cores': 4, 'memory': '16 GB'}
  ```

  The core `Cluster.scale` method would then translate cores or memory into a number of workers and call the subclass's `scale` method appropriately.
- Let the downstream classes handle this themselves, but ask them all to handle it uniformly. This places more burden onto downstream implementations, but also gives them more freedom to select worker types as they see fit based on their capabilities.
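The first approach could be sketched roughly as follows. This is only an illustration of the convention, not the actual `distributed` implementation; the `worker_info` attribute and `_scale_workers` helper are assumed names, and memory is taken as a plain byte count to keep the example short:

```python
import math

class Cluster:
    """Sketch of a base class that translates cores/memory into workers."""

    # Subclasses would override this with their real per-worker spec.
    # Memory is expressed in bytes here for simplicity.
    worker_info = {'cores': 4, 'memory': 16e9}  # 4 cores, 16 GB

    def scale(self, n=None, cores=None, memory=None):
        if cores is not None:
            # Round up so we never provide fewer cores than requested.
            n = math.ceil(cores / self.worker_info['cores'])
        elif memory is not None:
            n = math.ceil(memory / self.worker_info['memory'])
        self._scale_workers(n)  # subclass-specific scaling logic

    def _scale_workers(self, n):
        raise NotImplementedError

class DemoCluster(Cluster):
    """Minimal subclass that just records the requested worker count."""
    def _scale_workers(self, n):
        self.n_workers = n

cluster = DemoCluster()
cluster.scale(cores=100)   # ceil(100 / 4) -> 25 workers
cluster.scale(memory=1e12) # ceil(1 TB / 16 GB) -> 63 workers
```

The rounding choice matters: rounding up guarantees the user gets at least the resources they asked for, at the cost of possibly over-provisioning by a fraction of one worker.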
cc
- @guillaumeeb (who has shown interest in doing this work), @lesteve, @jhamman from dask-jobqueue
- @jcrist from dask-yarn
- @jacobtomlinson from dask-kubernetes
Issue Analytics
- Created: 5 years ago
- Reactions: 1
- Comments: 9 (9 by maintainers)
Top GitHub Comments
Just to mention that a first (simple) PR for this issue is available in #2209 in case people here missed it!
@dhirschfeld the concept of different pools seems interesting for scaling with specific worker profiles (GPU, big-memory nodes, …).
We could have something like this for dask-jobqueue
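The comment's code example did not survive extraction; purely for illustration, pool-based scaling could look something like this. All names here (`PooledCluster`, the `pools` keyword, the `pool` argument) are hypothetical and not real dask-jobqueue API:

```python
# Hypothetical sketch: declare named worker pools with different
# per-worker specs, then scale each pool independently.
class PooledCluster:
    def __init__(self, pools):
        # pools maps a pool name to its per-worker spec
        self.pools = pools
        self.workers = {name: 0 for name in pools}

    def scale(self, n, pool='default'):
        # Scale one named pool without touching the others
        self.workers[pool] = n

cluster = PooledCluster(pools={
    'default': {'cores': 4, 'memory': '16 GB'},
    'gpu':     {'cores': 8, 'memory': '64 GB', 'gpus': 1},
    'highmem': {'cores': 4, 'memory': '256 GB'},
})
cluster.scale(10)             # ten default workers
cluster.scale(2, pool='gpu')  # plus two GPU workers alongside them
```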
Maybe this belongs more in #2118? But both issues are linked.
@jacobtomlinson sorry about my remark, I did not understand what you meant! Totally agree with you.