
Scale by number of cores or amount of memory


When creating a cluster object we currently scale by the number of workers:

cluster = KubeCluster()
cluster.scale(10)

Here 10 is the number of workers we want. However, it is common for users to think about clusters in terms of the number of cores or the amount of memory, rather than in terms of the number of Dask workers:

cluster.scale(cores=100)
cluster.scale(memory='1 TB')

What is the best way to achieve this uniformly across the dask deployment projects? I currently see two approaches, though there are probably more that others might see.

  1. Establish a convention where clusters define information about the workers they will produce, something like the following:

    >>> cluster.worker_info
    {'cores': 4, 'memory': '16 GB'}
    

    The core Cluster.scale method would then translate this into a number of workers and call the subclass’s scale method appropriately (see the sketch after this list).

  2. Let the downstream classes handle this themselves, but ask them all to handle it uniformly. This places more burden onto downstream implementations, but also gives them more freedom to select worker types as they see fit based on their capabilities.
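For option 1, a minimal sketch of what the shared translation could look like, assuming a hypothetical worker_info attribute and _scale_workers hook (neither is an existing Dask API); the base class does the conversion once so every subclass only has to scale by worker count:

import math

from dask.utils import parse_bytes  # parses strings like '1 TB' or '16 GB' into bytes


class Cluster:
    # Hypothetical convention: each subclass describes a single worker it produces
    worker_info = {'cores': 4, 'memory': '16 GB'}

    def scale(self, n=None, cores=None, memory=None):
        # Translate cores/memory into a worker count, rounding up
        if cores is not None:
            n = math.ceil(cores / self.worker_info['cores'])
        elif memory is not None:
            n = math.ceil(parse_bytes(memory) / parse_bytes(self.worker_info['memory']))
        self._scale_workers(n)  # subclass-specific scaling by worker count

    def _scale_workers(self, n):
        raise NotImplementedError

With rounding up, scale(cores=100) on 4-core workers would request 25 workers, and scale(memory='1 TB') on 16 GB workers would request 63. Option 2 would push this same translation down into each subclass’s own scale method instead.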


Issue Analytics

  • State: open
  • Created: 5 years ago
  • Reactions: 1
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

2 reactions
guillaumeeb commented, Sep 6, 2018

Just to mention that a first (simple) PR for this issue is available in #2209, in case people here missed it!

@dhirschfeld the concept of different pools seems interesting for scaling with specific worker profiles (GPUs, big-memory nodes, …).

We could have something like this for dask-jobqueue:

cluster.add_pool(processes=1, cores=1, memory='16GB', queue='qgpu', pool_name='GPU', walltime='02:00:00')
cluster.scale(10, pool='GPU')
cluster.scale(100) # default pool

Maybe this belongs more in #2118? But both issues are linked.

1 reaction
guillaumeeb commented, Aug 30, 2018

@jacobtomlinson sorry about my remark, I did not understand what you meant! Totally agree with you.
