
Scale by number of cores or amount of memory


When creating a cluster object we currently scale by the number of workers:

cluster = KubeCluster()
cluster.scale(10)

Here 10 is the number of workers we want. However, it is common for users to think about clusters in terms of the number of cores or the amount of memory, rather than in terms of the number of Dask workers:

cluster.scale(cores=100)
cluster.scale(memory='1 TB')

What is the best way to achieve this uniformly across the dask deployment projects? I currently see two approaches, though there are probably more that others might see.

  1. Establish a convention where clusters define information about the workers they will produce, something like the following:

    >>> cluster.worker_info
    {'cores': 4, 'memory': '16 GB'}
    

    The core Cluster.scale method would then translate this into a number of workers and call the subclass’s scale method appropriately (see the sketch after this list).

  2. Let the downstream classes handle this themselves, but ask them all to handle it uniformly. This places more burden onto downstream implementations, but also gives them more freedom to select worker types as they see fit based on their capabilities.
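For option 1, a minimal sketch of what the shared translation could look like, assuming a hypothetical worker_info attribute and _scale_workers hook (neither is an existing Dask API); the base class does the conversion once so every subclass only has to scale by worker count:

import math

from dask.utils import parse_bytes  # parses strings like '1 TB' or '16 GB' into bytes


class Cluster:
    # Hypothetical convention: each subclass describes a single worker it produces
    worker_info = {'cores': 4, 'memory': '16 GB'}

    def scale(self, n=None, cores=None, memory=None):
        # Translate cores/memory into a worker count, rounding up
        if cores is not None:
            n = math.ceil(cores / self.worker_info['cores'])
        elif memory is not None:
            n = math.ceil(parse_bytes(memory) / parse_bytes(self.worker_info['memory']))
        self._scale_workers(n)  # subclass-specific scaling by worker count

    def _scale_workers(self, n):
        raise NotImplementedError

With rounding up, scale(cores=100) on 4-core workers would request 25 workers, and scale(memory='1 TB') on 16 GB workers would request 63. Option 2 would push this same translation down into each subclass’s own scale method instead.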


Issue Analytics

  • State: open
  • Created: 5 years ago
  • Reactions: 1
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

2 reactions
guillaumeeb commented, Sep 6, 2018

Just to mention that a first (simple) PR for this issue is available in #2209, in case people here missed it!

@dhirschfeld the concept of different pools seems interesting for scaling with specific worker profiles (GPUs, big-memory nodes, …).

We could have something like this for dask-jobqueue:

cluster.add_pool(processes=1, cores=1, memory='16GB', queue='qgpu', pool_name='GPU', walltime='02:00:00')
cluster.scale(10, pool='GPU')
cluster.scale(100) # default pool

Maybe this belongs more in #2118? But both issues are linked.

1 reaction
guillaumeeb commented, Aug 30, 2018

@jacobtomlinson sorry about my remark, I did not understand what you meant! Totally agree with you.
