Best practices for remotely connecting to a dask worker cluster
Our objective: have a spot worker cluster with a default of 0 nodes running on AWS EKS. We hope to be able to connect to and use this cluster from a variety of places, including:
- containers that are part of aws batch jobs
- kubernetes pods from other kubernetes clusters
- laptops
- etc
After looking through the documentation, I imagine that, from the perspective of the environment that wishes to use the dask cluster, the workflow would be something like:
```
export DASK_KUBERNETES__HOST="https://__aws_identifier__.us-west-2.eks.amazonaws.com"
export DASK_KUBERNETES__NAME="dask-worker-cluster"
```
(my understanding is that these environment variables override the values normally set here)
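For what it's worth, setting the same values programmatically should also work; as I understand the dask config conventions, `DASK_KUBERNETES__HOST` maps to the `kubernetes.host` key (so treat this as a sketch of the equivalence, not gospel):

```python
import dask

# Equivalent (as I understand the config conventions) to the exports above:
# DASK_KUBERNETES__HOST -> "kubernetes.host",
# DASK_KUBERNETES__NAME -> "kubernetes.name".
dask.config.set({
    "kubernetes.host": "https://__aws_identifier__.us-west-2.eks.amazonaws.com",
    "kubernetes.name": "dask-worker-cluster",
})
```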
Then, within the script/notebook:
```python
cluster = KubeCluster.from_yaml('worker-values.yaml')  # spawns a local dask scheduler
cluster.scale_up(50)  # calls the EKS (in this example) dask cluster control plane and asks for 50 spot instances to be spun up
```
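For reference, the full snippet I run looks roughly like this (assuming `worker-values.yaml` is a valid worker pod spec):

```python
from dask.distributed import Client
from dask_kubernetes import KubeCluster

# KubeCluster starts the scheduler inside this local process; only the
# workers are meant to run as pods on the remote EKS cluster.
cluster = KubeCluster.from_yaml('worker-values.yaml')
cluster.scale_up(50)      # request 50 workers from the remote cluster
client = Client(cluster)  # attach a client to the in-process scheduler
```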
When I run the above commands from a jupyter notebook on my laptop, no errors are thrown, but no additional EC2 machines are spun up.
I should note that using `eksctl` I have successfully scaled nodes on this cluster up and down. Also, I confirmed that `kubectl cluster-info` on my laptop returns the correct API endpoint listed for this EKS cluster in the AWS console.
The good folks at Pangeo explained to me that in their default deployments, a dask scheduler is attached to the same (on-demand) pod that runs jupyter. In such a configuration, multiple dask schedulers can request resources from spot/pre-emptible dask workers.
My thinking is that what I described above roughly follows the same design pattern as Pangeo; however, if I’m correctly interpreting the discussion here, and more specifically @mrocklin’s comment here, maybe one or more dedicated schedulers would need to be running within the k8s cluster?
However, based on this statement:

> Since the Dask scheduler is launched locally, for it to work, we need to be able to open network connections between this local node and all the worker nodes on the Kubernetes cluster. If the current process is not already on a Kubernetes node, some network configuration will likely be required to make this work.
I’m hoping that my use case of a local scheduler + remote k8s workers (plus said network configuration) should be possible.
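If it helps, here is how I would pin the local scheduler to a fixed address/port so that firewall or security-group rules have something concrete to allow (the `host`/`port` kwargs are my reading of the `KubeCluster` signature, so treat this as a sketch):

```python
from dask_kubernetes import KubeCluster

# Sketch: bind the local scheduler to a known interface and port. The
# address must be routable from inside the EKS VPC (e.g. via VPN or
# VPC peering) for the workers to connect back.
cluster = KubeCluster.from_yaml(
    'worker-values.yaml',
    host='0.0.0.0',  # listen on all local interfaces
    port=8786,       # fixed port so network rules can reference it
)
```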
My questions:
- What network configuration is required in order for a local dask scheduler to connect to a remote worker cluster?
- If this is possible, is there a page I missed that might include some best practices about how to configure such a setup? If not, I’d be happy to submit something to the docs.
- In terms of how Dask works, is the design pattern I’ve described the best way to support my use cases?
Thanks in advance for suggestions or advice you might be able to offer. 😃
Additional info:
- dask = 1.1.1
- dask-kubernetes = 0.9
- kubectl client version = v1.15.0
- OS = Pop!_OS (Ubuntu) 18.04 LTS
Top GitHub Comments
Correct, although there are likely other ways to solve this. Cloud providers like AWS provide ways to restrict inbound/outbound traffic to certain IPs, so I think there should be a way to expose the workers without opening them to the whole internet. With #84 (scheduler on the cluster) this would be easier, as you’d only need to expose the scheduler. I’m not familiar enough with AWS or kubernetes to know exactly what network configuration would be needed for either option (@jacobtomlinson may know).
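For example, something along these lines (all IDs, ports, and IPs below are hypothetical placeholders) would allow traffic from a single trusted address rather than the whole internet:

```python
import boto3

# Hypothetical sketch: open the dask ports on the EKS node security group
# to one trusted IP only. The group ID, port range, and CIDR are placeholders
# you'd replace with your own values.
ec2 = boto3.client("ec2", region_name="us-west-2")
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # the EKS worker-node security group
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 8786,  # whatever port(s) your scheduler/workers use
        "ToPort": 8786,
        "IpRanges": [{"CidrIp": "203.0.113.7/32"}],  # your machine's public IP
    }],
)
```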
In the long term, dask-gateway intends to address this use case: an admin sets up a gateway that users can connect to and create clusters through. In this case you’d have a small node running the gateway instance, and spot instances for running user clusters as they are requested. If you’re interested, we hope to have a helm chart for trying this out in the near-ish future. Docs live here: https://jcrist.github.io/dask-gateway/

Now that we have a workable implementation of #84 and dask-gateway is getting more mature, I’m going to close this with a couple of different routes you could explore.
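For reference, the user-facing side of dask-gateway looks roughly like this (the gateway address is a placeholder for wherever the admin deploys it):

```python
from dask_gateway import Gateway

# Sketch of the gateway workflow: only the gateway endpoint is exposed;
# the scheduler and workers both run inside the Kubernetes cluster.
gateway = Gateway("http://gateway.example.com")
cluster = gateway.new_cluster()
cluster.scale(50)
client = cluster.get_client()
```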