(GKE) Dask Cluster Always Returning 0 Memory, 0 Workers, 0 Threads
Background on the Problem
I’m currently following this guide: https://godatadriven.com/blog/develop-locally-scale-globally-dask-on-kubernetes-with-google-cloud/, but I have changed a few things, namely using a LoadBalancer rather than a ClusterIP so that I don’t have to do any proxying or port forwarding.
- I am running a completely private cluster on GKE (internet access via a NAT, with external IPs only assigned to load balancers). The setup is exactly the same as in the article linked above.
- I am using a locally hosted JupyterLab notebook (run on localhost, NOT inside my cluster) to test the creation of a Cluster and a Client.
- I haven’t tried creating the Dask cluster entirely from inside my Kubernetes cluster (for instance, from my Django webserver running in the Kubernetes cluster). I mention this because the problem could be that some connection to the Kubernetes cluster isn’t working properly; more on that later.
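Since the scheduler is exposed through a LoadBalancer rather than a ClusterIP, one quick sanity check is whether the scheduler Service actually received an external IP (the `dask` namespace matches the reproduction further down), e.g.:

kubectl get svc -n dask -o wide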
What happened:
Cluster and Client Details are found here:
<Client: 'tcp://10.1.2.2:8786' processes=0 threads=0, memory=0 B>
KubeCluster(dask-omarsumadi-ba1e72ae-1, 'tcp://104.196.146.82:8786', workers=0, threads=0, memory=0 B)
What you expected to happen:
I expected the cluster to report 1 process, 1 thread, and 1 GB of memory.
Any Errors?:
No Errors anywhere - not on Kubernetes, not on Dask/Python
Minimal Reproducible Example (run in JupyterLab on WSL2)
import dask
from dask_kubernetes import KubeCluster, make_pod_spec
from dask.distributed import Client

spec = {
    "metadata": {},
    "spec": {
        "restartPolicy": "Never",
        "serviceAccountName": "dask-service-account",
        "containers": [
            {
                "image": "daskdev/dask:2021.3.0",
                "imagePullPolicy": "Always",
                "args": ["dask-worker", "--no-bokeh", "--death-timeout", "60",
                         "--nthreads", "1", "--memory-limit", "1GB"],
                "name": "dask-worker",
                "resources": {
                    "requests": {"cpu": "500m", "memory": "1000Mi"},
                    "limits": {"cpu": "500m", "memory": "1000Mi"},
                },
            }
        ],
    },
}

dask.config.set({'distributed.comm.timeouts.connect': '500s'})
dask.config.set({'kubernetes.scheduler-service-type': 'LoadBalancer'})
dask.config.get("kubernetes.scheduler-service-type")

try:
    cluster = KubeCluster(spec, namespace='dask', deploy_mode="remote",
                          scheduler_service_wait_timeout=120)
    client = Client(cluster)
    print(client)
    print(cluster)
except Exception as error:
    print("Error hit")
    print(error)
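One note on the example above: it never calls cluster.scale() or cluster.adapt(), and as far as I understand KubeCluster starts with zero workers unless it is told how many to run, which would match the workers=0 output. A minimal sketch of explicitly requesting a worker (same cluster object as above):

cluster.scale(1)   # explicitly ask Kubernetes for one worker pod
print(cluster)     # should report workers=1 once the pod has started and registered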
Pod Creation Details and Logs Generated by Dask (-o yaml):
Pod logs with kubectl output: https://pastebin.com/FwvAqgd5
Pod spec: https://pastebin.com/FzLDsU4x
Scheduler Details and Logs Generated by Dask (-o yaml):
Scheduler logs with kubectl output: https://pastebin.com/n4shGuDP
Scheduler spec: https://pastebin.com/iGGGKBJZ
RBAC and Namespace:
RBAC/Namespace: https://pastebin.com/FA2iE0HT
Environment:
- Dask version: 2021.3.0
- Dask Kubernetes Version: 2021.3.0
- Python version: 3.8.8
- Operating System: WSL2 (Ubuntu)
- Install method (conda, pip, source): JupyterHub on conda, with dask_kubernetes installed via pip
Top GitHub Comments
I’ve been seeing those warnings recently too. I think there’s a shutdown issue somewhere, but it shouldn’t affect your work.
I’m going to close this out now. Welcome to the Dask community!
We already have a process for waiting for workers using the client.
I’m curious why you want to specifically wait for all workers though. With your cluster scaling in the background you can begin using it and submitting work. The scheduler will just queue things until workers appear and start processing stuff.
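For reference, the client-side wait mentioned above is Client.wait_for_workers; a minimal sketch using the client from the reproduction (the worker count of 1 is just an example):

client.wait_for_workers(1)  # blocks until at least one worker has registered with the scheduler
print(client)               # the repr should then show non-zero workers, threads, and memory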