k8s controller: DaskCluster's replicas lowered, worker pods not deleted
Issue summary
I ended up with active but unused worker pods, unusable by the scheduler, blocking creation of new pods if a DaskCluster needs to scale up, but unable to scale down if a DaskCluster reduced its replica count.
My understanding
- `my_cluster.get_client()` returns a `dask.distributed.Client` instance.
- The client instance is connected to the associated cluster’s scheduler and gets its information about workers from the scheduler.
- The scheduler is quick to adapt and scale down, and the DaskCluster resource often quickly follows.
- The DaskCluster resource can adapt upwards again, but if a worker pod had previously been scaled away, it may end up running unused and never registering with the scheduler.
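To make the suspected mismatch concrete, here is a minimal toy model (pure Python, not dask-gateway’s actual code) of the state described above: the scheduler has retired all of its workers and the DaskCluster replica count has followed, yet the pods are still running and unknown to the scheduler.

```python
# Toy model (an assumption for illustration, not dask-gateway code) of
# the suspected mismatch between scheduler state, DaskCluster replicas,
# and actually-running pods.

def reconcile_view(scheduler_workers, daskcluster_replicas, running_pods):
    """Return (stale_pods, surplus_count).

    stale_pods: pods running but unknown to the scheduler.
    surplus_count: pods beyond what the DaskCluster resource asks for.
    """
    stale = [p for p in running_pods if p not in scheduler_workers]
    surplus = len(running_pods) - daskcluster_replicas
    return stale, surplus

# Scheduler scaled down to 0 workers, DaskCluster followed to 0 replicas,
# but 5 pods are still running:
stale, surplus = reconcile_view(
    scheduler_workers=set(),
    daskcluster_replicas=0,
    running_pods=["worker-1", "worker-2", "worker-3", "worker-4", "worker-5"],
)
print(len(stale))   # 5: every pod is unregistered with the scheduler
print(surplus)      # 5: five pods more than the resource wants
```

In this view the bug report amounts to: `surplus` is positive, the controller should be deleting pods, but nothing happens.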
Original report extracted from #246
GatewayCluster widget inconsistent with actual pods
After my images had been pulled and pods could start quickly, I reset my state by deleting clusters etc. Then:
- I created a new cluster and chose to adapt 0-5 workers in it.
- I ran a job through a client connected to the cluster; it finished in 20 seconds. According to the client I had 5 workers for a while, and then 0 again.
- I observed that I still had 5 pods, and my DaskCluster had 5 replicas.
- The controller is, in my mind, thereby doing its job: it ensures 5 worker replicas.
- There is something wrong though, because the scheduler knew to delete a worker, but that didn’t lead to a change in the DaskCluster resource, so the controller didn’t remove any pods.
- I decided to try running my job again: would it add five new pods? It turns out no; instead it errored with a Timeout, as if my workers had failed to start fast enough.
- I tried adapting to 6/6; that added one worker, and the client observed 1 worker while the DaskCluster resource reported 6 replicas and I saw 6 pods.
- I ran my workload, and ended up doing work using a single worker.
- I tried adapting to 0-3 workers, but kept seeing 6 pods while the DaskCluster resource was updated to 1 replica. These were the controller’s logs:

```
[D 2020-04-14 17:33:11.857 KubeController] Event - MODIFIED cluster dask-gateway.c047a173d45247dd81f232ab60d692ca
[I 2020-04-14 17:33:11.857 KubeController] Reconciling cluster dask-gateway.c047a173d45247dd81f232ab60d692ca
[I 2020-04-14 17:33:11.862 KubeController] Finished reconciling cluster dask-gateway.c047a173d45247dd81f232ab60d692ca
```
- I tried running my workload, and it seems that my client never registered more than one worker at best.
This may be two separate issues. Hmm… or not?
- The controller fails to align with the DaskCluster resource: it never successfully deletes the surplus pods that the DaskCluster resource indicates it doesn’t need.
- Worker pods that the scheduler has used once aren’t reused later if they are adapted away, except perhaps a single one.
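For the first of those two issues, a reconcile step could plausibly look like the sketch below (a hypothetical illustration, not dask-gateway’s actual controller logic): when the DaskCluster asks for fewer replicas than there are pods, delete the surplus, preferring pods the scheduler never registered since they do no work anyway.

```python
# Hypothetical reconcile helper (a sketch under assumptions, not the
# real dask-gateway controller): choose which pods to delete when the
# DaskCluster resource wants fewer replicas than are running.

def pods_to_delete(running_pods, registered, desired_replicas):
    """Pick pods to delete so len(running_pods) drops to desired_replicas.

    running_pods: list of pod names currently running.
    registered: set of pod names the scheduler knows about.
    """
    surplus = len(running_pods) - desired_replicas
    if surplus <= 0:
        return []
    # Stable sort: unregistered (stale) pods sort first and are deleted first.
    ordered = sorted(running_pods, key=lambda p: p in registered)
    return ordered[:surplus]

# The reported state: 6 pods, only 1 registered, DaskCluster wants 1.
doomed = pods_to_delete(
    running_pods=[f"worker-{i}" for i in range(6)],
    registered={"worker-3"},
    desired_replicas=1,
)
print(len(doomed))           # 5
print("worker-3" in doomed)  # False: the one working pod survives
```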
Cluster adapt 0-5 can trigger as if it were 1-5
If I have a fresh cluster and press adapt 0-5, it doesn’t create a worker for me, but I have ended up in a state where just going from 0-0 to 0-5 would add back a replica in the DaskCluster resource. I think the scheduler ended up thinking it still needed one. This is a state I failed to reproduce quickly with a new cluster.
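One way this could happen (an assumption about the mechanism, not verified against the scheduler’s code) is if a stale desired-worker count survives the range change and is merely clamped into the new bounds:

```python
# Toy illustration of a clamping hypothesis: a leftover adaptive target
# of 1 is forced to 0 while the range is 0-0, then reappears as soon as
# the range widens to 0-5. This is a guess at the mechanism, not the
# actual dask.distributed adaptive code.

def clamped_target(stale_target, minimum, maximum):
    return max(minimum, min(stale_target, maximum))

print(clamped_target(1, 0, 0))  # 0: the 0-0 range forces it down
print(clamped_target(1, 0, 5))  # 1: widening to 0-5 brings one worker back
```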
Issue Analytics
- Created 3 years ago
- Comments: 18 (8 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
According to https://stackoverflow.com/a/59658670, the `restartCount` is computed according to the number of dead containers that have yet to be cleaned up. If the kubelet is cleaning up containers, it will cause that number to be reset.

No, sorry. `dask-gateway` creates and manages the scheduler and worker pods, but once a pod is created we only observe it; we don’t ever update a created pod (e.g. for a restart). The k8s pod controller is responsible for handling that. `dask.distributed` doesn’t know or do anything with k8s.