(GKE) Dask Cluster Always Returning 0 Memory, 0 Workers, 0 Threads
Background on the Problem
I’m currently following this guide: https://godatadriven.com/blog/develop-locally-scale-globally-dask-on-kubernetes-with-google-cloud/, but I have changed a few things, namely using a LoadBalancer rather than a ClusterIP so that I don’t have to do any proxying or port forwarding.
- I am running a completely private cluster on GKE (internet access via a NAT, with external IPs only assigned to load balancers). The setup is exactly the same as in the article linked above.
- I am using a locally hosted JupyterLab notebook (run on localhost, NOT inside my cluster) to test the creation of a Cluster and a Client.
- I haven’t tried creating the Dask cluster entirely from inside my Kubernetes cluster (for instance, from my Django webserver running in the Kubernetes cluster). I mention this because the problem could be that some connection to the Kubernetes cluster isn’t working properly; more on that later.
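Since the scheduler is exposed through a LoadBalancer rather than a ClusterIP, one quick sanity check is whether the scheduler Service actually received an external IP (the `dask` namespace matches the reproduction further down), e.g.:

kubectl get svc -n dask -o wide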
What happened:
Cluster and Client Details are found here:
<Client: 'tcp://10.1.2.2:8786' processes=0 threads=0, memory=0 B>
KubeCluster(dask-omarsumadi-ba1e72ae-1, 'tcp://104.196.146.82:8786', workers=0, threads=0, memory=0 B)
What you expected to happen:
I expected the cluster to report 1 process, 1 thread, and 1 GB of memory.
Any Errors?:
No Errors anywhere - not on Kubernetes, not on Dask/Python
Minimal Reproducible Example (run in JupyterLab on WSL2)
import dask
from dask_kubernetes import KubeCluster, make_pod_spec
from dask.distributed import Client

spec = {
    "metadata": {},
    "spec": {
        "restartPolicy": "Never",
        "serviceAccountName": "dask-service-account",
        "containers": [
            {
                "image": "daskdev/dask:2021.3.0",
                "imagePullPolicy": "Always",
                "args": ["dask-worker", "--no-bokeh", "--death-timeout", "60",
                         "--nthreads", "1", "--memory-limit", "1GB"],
                "name": "dask-worker",
                "resources": {
                    "requests": {"cpu": "500m", "memory": "1000Mi"},
                    "limits": {"cpu": "500m", "memory": "1000Mi"},
                },
            }
        ],
    },
}

dask.config.set({'distributed.comm.timeouts.connect': '500s'})
dask.config.set({'kubernetes.scheduler-service-type': 'LoadBalancer'})
dask.config.get("kubernetes.scheduler-service-type")

try:
    cluster = KubeCluster(spec, namespace='dask', deploy_mode="remote",
                          scheduler_service_wait_timeout=120)
    client = Client(cluster)
    print(client)
    print(cluster)
except Exception as error:
    print("Error hit")
    print(error)
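One note on the example above: it never calls cluster.scale() or cluster.adapt(), and as far as I understand KubeCluster starts with zero workers unless it is told how many to run, which would match the workers=0 output. A minimal sketch of explicitly requesting a worker (same cluster object as above):

cluster.scale(1)   # explicitly ask Kubernetes for one worker pod
print(cluster)     # should report workers=1 once the pod has started and registered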
Pod Creation Details and Logs Generated by Dask (-o yaml):
Pod logs with kubectl output: https://pastebin.com/FwvAqgd5
Pod spec: https://pastebin.com/FzLDsU4x
Scheduler Details and Logs Generated by Dask (-o yaml):
Scheduler logs with kubectl output: https://pastebin.com/n4shGuDP
Scheduler spec: https://pastebin.com/iGGGKBJZ
RBAC and Namespace:
RBAC/Namespace: https://pastebin.com/FA2iE0HT
Environment:
- Dask version: 2021.3.0
- Dask Kubernetes Version: 2021.3.0
- Python version: 3.8.8
- Operating System: WSL2 (Ubuntu)
- Install method (conda, pip, source): JupyterHub on conda, with dask_kubernetes installed via pip
Top GitHub Comments
I’ve been seeing those warnings recently too. I think there’s a shutdown issue somewhere, but it shouldn’t affect your work.
I’m going to close this out now. Welcome to the Dask community!
We already have a process for waiting for workers using the client.
I’m curious why you want to specifically wait for all workers though. With your cluster scaling in the background you can begin using it and submitting work. The scheduler will just queue things until workers appear and start processing stuff.
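For reference, the client-side wait mentioned above is Client.wait_for_workers; a minimal sketch using the client from the reproduction (the worker count of 1 is just an example):

client.wait_for_workers(1)  # blocks until at least one worker has registered with the scheduler
print(client)               # the repr should then show non-zero workers, threads, and memory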