Support Multiple Worker Types
As a machine learning engineer, I would like to specify the resources for specific operations, just like this example in the Dask tutorial.
I am pretty new to Dask, Kubernetes, and Helm, but have had success deploying with Helm on GKE. I think I have seen enough examples of using nodeSelector for GPU node pools, GKE's DaemonSet for driver installs, etc., that I could get them up and running easily. However, machines with accelerators are often limited, and we need to scale CPU and GPU workers independently.
Specifically, I want to run the example from the link:
```python
with dask.annotate(resources={'GPU': 1}):
    processed = [client.submit(process, d) for d in data]

with dask.annotate(resources={'MEMORY': 70e9}):
    final = client.submit(aggregate, processed)
```
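For the annotations to have any effect, the workers must advertise matching resources, e.g. be started with `dask-worker <scheduler> --resources "GPU=1"`. Here is a minimal local sketch of that pairing, assuming a single test machine rather than a Kubernetes cluster (the `process` function is a stand-in):

```python
# Minimal local sketch: annotations only match workers that advertise the
# named resource. LocalCluster forwards the `resources=` kwarg to each worker;
# on the CLI the equivalent is `dask-worker <scheduler> --resources "GPU=1"`.
import dask
from dask.distributed import Client, LocalCluster

def process(x):
    return x + 1  # stand-in for the real GPU task

if __name__ == "__main__":
    cluster = LocalCluster(n_workers=2, resources={"GPU": 1})
    client = Client(cluster)

    with dask.annotate(resources={"GPU": 1}):
        futures = [client.submit(process, d) for d in range(4)]
    print(client.gather(futures))
```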
From what I gather, this is not possible as-is with a Kubernetes-based deployment using `HelmCluster` or `KubeCluster`. I am also noticing that `.scale` is not designed for multiple worker types, which makes me think this might be a heavier lift than I expected, and that this issue might actually belong in dask/distributed.
I am seeing the blocker from 2018 here: https://github.com/dask/distributed/issues/2118
I have been able to manually (with `kubectl`) add a pod using the `worker-spec.yaml` example here, and the scheduler shows the worker as available, whereas `cluster.scale()` has no effect on it. But I honestly have no clue how to build a Deployment to control that pod spec, even if this hack worked.
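One way to turn such a pod spec into something Kubernetes manages for you is to wrap it in a Deployment. Below is a minimal sketch using the official `kubernetes` Python client; the image, scheduler service name, labels, and GPU limit are all assumptions for illustration, not values from the issue:

```python
# Hypothetical sketch: wrap the worker pod spec in a Deployment so Kubernetes
# manages the replica count. Every name below (image, scheduler address,
# labels) is an assumption.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

worker_container = client.V1Container(
    name="dask-worker",
    image="daskdev/dask:latest",        # assumed worker image
    args=[
        "dask-worker",
        "tcp://dask-scheduler:8786",    # assumed scheduler service name
        "--resources", "GPU=1",         # advertise the Dask resource
    ],
    resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="dask-gpu-workers"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "dask-gpu-worker"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "dask-gpu-worker"}),
            spec=client.V1PodSpec(containers=[worker_container]),
        ),
    ),
)
apps.create_namespaced_deployment(namespace="default", body=deployment)
```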
Note: it looks like Ray is doing this here.
Anyway, a few questions:
- Is it currently possible to support multiple worker types? (In case I missed something.)
- Is there a deployment hack to take a `worker-spec.yaml` and deploy an auxiliary set of these workers? That is, an intermediate step (without support from dask/distributed) to easily supplement an existing worker set with high-memory or GPU workers, etc.?
Top GitHub Comments
Yeah, I'm going to close this out now. We have a blog post in the pipeline to demo this functionality: dask/dask-blog#130
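For reference, the functionality referred to here is multiple worker groups, presumably via the dask-kubernetes operator. A minimal sketch, assuming the operator's `KubeCluster.add_worker_group` API (the exact keyword arguments, such as `resources=`, may differ between versions):

```python
# Sketch of worker groups via the dask-kubernetes operator; the kwargs shown
# are assumptions and may not match every operator version.
from dask_kubernetes.operator import KubeCluster

cluster = KubeCluster(name="mixed")  # creates a default worker group
cluster.add_worker_group(
    name="gpu",
    n_workers=2,
    resources={"limits": {"nvidia.com/gpu": "1"}},
)
cluster.scale(10)  # scales the default group; groups scale independently
```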
Thanks for raising this @ljstrnadiii. Excited to see users asking for this.
Scaling multiple worker groups is not well supported by any of the helper tools in Dask today, including `dask-kubernetes`. The annotations support is still relatively new. We intend to address this as part of #256.
The only workaround I can suggest today is that you install the Dask Helm Chart and then manually create a second deployment of workers with GPUs and handle the scaling yourself.
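"Handle the scaling yourself" can be as simple as resizing that second Deployment. A sketch with the `kubernetes` Python client, where the deployment name and namespace are assumptions (`kubectl scale deployment dask-gpu-workers --replicas=4` would be the CLI equivalent):

```python
# Resize the auxiliary GPU worker Deployment; name and namespace are assumed.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()
apps.patch_namespaced_deployment_scale(
    name="dask-gpu-workers",
    namespace="default",
    body={"spec": {"replicas": 4}},
)
```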