
(GKE) Dask Cluster Always Returning 0 Memory, 0 Workers, 0 Threads

See original GitHub issue

Background on the Problem

I’m currently following this guide: https://godatadriven.com/blog/develop-locally-scale-globally-dask-on-kubernetes-with-google-cloud/, but I have changed a few things, most notably using a LoadBalancer rather than a ClusterIP so that I don’t have to do any proxying or port forwarding.

  • I am running a completely private cluster on GKE (internet access goes through a NAT; external IPs are assigned only to load balancers). The setup is otherwise exactly the same as in the article linked above.
  • I am using a locally hosted JupyterLab notebook (running on localhost, NOT inside my cluster) to test creating a Cluster and a Client.
  • I haven’t created the Dask cluster solely from inside my Kubernetes cluster (for instance, from my Django webserver running inside the Kubernetes cluster). I mention this because the problem could be that some connection to the Kubernetes cluster isn’t working well; that is elaborated on later, and a quick reachability check is sketched right below.
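
Since the notebook runs outside the cluster, one simple thing to rule out is plain TCP reachability from the local machine to the scheduler's LoadBalancer endpoint. A minimal sketch; the address below is only a placeholder for whatever external IP and port the scheduler service actually reports:

import socket

def scheduler_reachable(host, port=8786, timeout=5.0):
    # Return True if a plain TCP connection to the scheduler endpoint can be opened.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder address: substitute the external IP reported for the scheduler service.
print(scheduler_reachable("104.196.146.82", 8786))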

What happened:

Cluster and Client Details are found here:

<Client: 'tcp://10.1.2.2:8786' processes=0 threads=0, memory=0 B>
KubeCluster(dask-omarsumadi-ba1e72ae-1, 'tcp://104.196.146.82:8786', workers=0, threads=0, memory=0 B)
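
For reference, the worker, thread, and memory figures in these reprs reflect whatever workers the scheduler has registered; the same view can be read straight from the client (a quick sanity check, assuming the client from the reproduction below):

# Workers currently registered with the scheduler (empty here, hence 0 / 0 / 0 B)
print(client.scheduler_info()["workers"])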

What you expected to happen:

I expected the output of the Cluster to have 1 process, 1 thread, and 1GB of memory.

Any Errors?:

No Errors anywhere - not on Kubernetes, not on Dask/Python

Minimal Reproducible Example, Run in JupyterLab on WSL2

import dask
from dask_kubernetes import KubeCluster, make_pod_spec
from dask.distributed import Client

spec = {
    "metadata": {},
    "spec": {
        "restartPolicy": "Never",
        "serviceAccountName": "dask-service-account",
        "containers": [
            {
                "image": "daskdev/dask:2021.3.0",
                "imagePullPolicy": "Always",
                "args": ["dask-worker","--no-bokeh","--death-timeout","60","--nthreads",'1',"--memory-limit",'1GB'],
                "name": "dask-worer",
                "resources": {
                    "requests": {
                        "cpu": "500m",
                        "memory": "1000Mi"
                    },
                    "limits": {
                        "cpu": "500m",
                        "memory": "1000Mi"
                    }
                }
            }
        ]
    }
}
dask.config.set({'distributed.comm.timeouts.connect': '500s'})
dask.config.set({'kubernetes.scheduler-service-type': 'LoadBalancer'})
dask.config.get("kubernetes.scheduler-service-type")

try:
    cluster = KubeCluster(spec, namespace='dask', deploy_mode="remote", scheduler_service_wait_timeout = 120)
    client = Client(cluster)
    print(client)
    print(cluster)
except Exception as Error:
    print("Error Hit")
    print(Error, str(Exception))
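
As an aside, make_pod_spec is imported above but never used; roughly the same worker pod can be described with that helper instead of the hand-written dict. This is a sketch only, assuming the 2021.3.0 helper accepts these arguments, and is not the spec that was actually run:

# Rough equivalent of the dict-based spec above, built with the helper (sketch)
pod_spec = make_pod_spec(
    image="daskdev/dask:2021.3.0",
    threads_per_worker=1,
    memory_limit="1G",
    memory_request="1G",
    cpu_limit="500m",
    cpu_request="500m",
)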

Pod Creation Details and Logs Generated by Dask (-o yaml):

Pod Logs with Kubectl Output: https://pastebin.com/FwvAqgd5
Pod Spec: https://pastebin.com/FzLDsU4x

Scheduler Details and Logs Generated by Dask (-o yaml):

Scheduler Logs with Kubectl Output: https://pastebin.com/n4shGuDP
Scheduler Spec: https://pastebin.com/iGGGKBJZ

RBAC and Namespace:

RBAC/Namespace: https://pastebin.com/FA2iE0HT

Environment:

  • Dask version: 2021.3.0
  • Dask Kubernetes Version: 2021.3.0
  • Python version: 3.8.8
  • Operating System: WSL2 (Ubuntu)
  • Install method (conda, pip, source): JupyterHub on conda, with dask_kubernetes installed via pip

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

1 reaction
jacobtomlinson commented, Mar 19, 2021

I’ve been seeing those warnings recently too. I think there’s a shutdown issue somewhere, but it shouldn’t affect your work.

I’m going to close this out now. Welcome to the Dask community!

1 reaction
jacobtomlinson commented, Mar 19, 2021

We already have a process for waiting for workers using the client.

cluster = KubeCluster(spec, namespace='dask', deploy_mode="remote", scheduler_service_wait_timeout = 120)
cluster.scale(2)
client = Client(cluster)
client.wait_for_workers(2)

I’m curious why you want to specifically wait for all workers though. With your cluster scaling in the background you can begin using it and submitting work. The scheduler will just queue things until workers appear and start processing stuff.
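
For instance (a minimal sketch, assuming the client created above and no wait_for_workers call): work submitted while the worker count is still zero just sits in the scheduler's queue and completes as soon as a worker registers.

# Submitted before any worker exists; the scheduler queues it.
future = client.submit(sum, range(10))
print(future.result())  # blocks until a worker comes up, then prints 45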

Read more comments on GitHub >

Top Results From Across the Web

  • Why does my Dask client show zero workers, cores, and ...
    I'm using Dask deployed using Helm on a Kubernetes cluster in Kubernetes Engine on GCP. My current cluster set up has 5 nodes...
  • k8s controller: DaskCluster's replicas lowered, worker pods ...
    my_cluster.get_client() return a dask.distributed.Client instance. ... I created a new cluster and choose to adapt 0-5 workers in it.
  • Configuring a Distributed Dask Cluster a Beginner's Guide
    The fundamental principle is that multiple threads are best to share data between tasks, but worse if running code that doesn't release Python's ...
  • Dask Kubernetes Documentation
    When it does return a 0 it will go into a Completed state and the Dask cluster will be cleaned up automatically freeing...
  • Install on a Kubernetes Cluster - Dask Gateway
    The worker pods communicate with the scheduler on port 8786. ... We recommend following the guide provided by zero-to-jupyterhub-k8s.
