
Support Multiple Worker Types

See original GitHub issue

As a machine learning engineer, I would like to specify resources for specific operations, just like this example from the Dask tutorial.

I am pretty new to Dask, Kubernetes, and Helm, but I have had success deploying with Helm on GKE. I think I have seen enough examples of using nodeSelector for GPU node pools, GKE's DaemonSet for driver installs, etc. that I could get those up and running easily. However, machines with accelerators are often in limited supply, or we need to scale CPU and GPU workers independently.

Specifically, I basically want to run the example from that link:

with dask.annotate(resources={'GPU': 1}):
    processed = [client.submit(process, d) for d in data]
with dask.annotate(resources={'MEMORY': 70e9}):
    final = client.submit(aggregate, processed)
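
For context (this is my own sketch, not part of the original issue), annotations like the ones above only take effect if workers advertise matching abstract resources when they start. A minimal sketch, assuming a placeholder scheduler address and workers launched with distributed's documented --resources flag:

# Workers must declare the abstract resources that dask.annotate() refers to,
# e.g. on the command line:
#   dask-worker tcp://scheduler:8786 --resources "GPU=1"
#   dask-worker tcp://scheduler:8786 --resources "MEMORY=70e9"
# The labels ("GPU", "MEMORY") are arbitrary strings; they only need to match
# the keys used in dask.annotate(resources={...}).

from dask.distributed import Client

client = Client("tcp://scheduler:8786")  # placeholder address

# Inspect what each connected worker advertised; the "resources" entry in the
# scheduler's worker info is empty for workers that declared none.
for address, info in client.scheduler_info()["workers"].items():
    print(address, info.get("resources", {}))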

From what I gather, this is not possible as-is with a Kubernetes-based deployment using HelmCluster or KubeCluster. I am also noticing that .scale is not designed for multiple worker types, which makes me think this might be a heavier lift than I expected, and that this issue might actually belong in dask/distributed.

https://github.com/dask/dask-kubernetes/blob/ce9a2d28d598a5ea213584fa18e7e728b9dfbc9e/dask_kubernetes/helm.py#L258
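
For context, a minimal sketch (mine, not from the issue) of what .scale covers today, assuming the classic HelmCluster manager from dask-kubernetes and a Helm release hypothetically named "dask": it resizes the chart's single worker Deployment, with no way to address a second, differently shaped worker group.

from dask.distributed import Client
from dask_kubernetes import HelmCluster

# Attach to an existing Helm-deployed Dask cluster (the release name is an assumption).
cluster = HelmCluster(release_name="dask")
client = Client(cluster)

# Resizes the chart's single worker Deployment; there is no parameter here for
# choosing between, say, a CPU group and a GPU group.
cluster.scale(10)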

I am seeing the blocker from 2018 here: https://github.com/dask/distributed/issues/2118

I have been able to manually add a pod (with kubectl) using the worker-spec.yaml example here, and the scheduler shows it as available, although cluster.scale() has no effect on it. But I honestly have no clue how to build a Deployment to control that pod spec, even if this hack works.

Note: it looks like Ray is doing this here.

Anyways, a few questions:

  1. Is it currently possible to support multiple worker types? (In case I missed something.)
  2. Is there a deployment hack to take a worker-spec.yaml and deploy an auxiliary set of these workers? That is, an intermediate step (without support from dask/distributed) to easily supplement an existing worker set with high-memory or GPU workers, etc.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
jacobtomlinson commented, Feb 15, 2022

Yeah, I’m going to close this out now. We have a blog post in the pipeline to demo this functionality: dask/dask-blog#130

1 reaction
jacobtomlinson commented, Jan 18, 2022

Thanks for raising this @ljstrnadiii. Excited to see users asking for this.

Scaling multiple worker groups is not well supported by any of the helper tools in Dask today, including dask-kubernetes. Annotation support is still relatively new.

We intend to address this as part of #256.

The only workaround I can suggest today is that you install the Dask Helm Chart and then manually create a second deployment of workers with GPUs and handle the scaling yourself.
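
To make that workaround concrete, here is a minimal sketch of "handle the scaling yourself" (mine, not from the thread). It assumes you have already created a second Deployment, hypothetically named dask-gpu-worker in namespace dask, whose pods run dask-worker --resources "GPU=1" pointed at the chart's scheduler service; the official kubernetes Python client then drives its replica count in place of cluster.scale():

from kubernetes import client, config


def scale_gpu_workers(replicas: int,
                      deployment: str = "dask-gpu-worker",  # hypothetical name
                      namespace: str = "dask") -> None:
    """Resize the manually managed GPU worker Deployment."""
    config.load_kube_config()  # use config.load_incluster_config() inside the cluster
    apps = client.AppsV1Api()
    apps.patch_namespaced_deployment_scale(
        name=deployment,
        namespace=namespace,
        body={"spec": {"replicas": replicas}},
    )


# e.g. bring up three GPU workers alongside the chart's default CPU workers
scale_gpu_workers(3)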
