KubeCluster loses the ability to run commands against a deployed cluster after periods of inactivity.
Cluster is created as:
from dask_kubernetes import KubeCluster
cluster = KubeCluster(pod_template=worker_pod, scheduler_pod=sched_pod, deploy_mode='remote')
What happened: Initially, all API commands run against the cluster (‘scale’, ‘close’, and so on) function as expected. After some period of time, however, the cluster object fails to authenticate with the cluster, and every API command fails with an ‘Unauthorized’ (HTTP 401) status:
site-packages/dask_kubernetes/core.py", line 81, in close
await self.core_api.delete_namespaced_pod(name, namespace)
...
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
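The response body above is a Kubernetes `Status` object with `code` 401 and `reason` "Unauthorized", which separates a rejected credential from other API failures. A minimal sketch of recognising that body so calling code could react to it specifically; `is_unauthorized` is a hypothetical helper, not part of dask_kubernetes:

```python
import json


def is_unauthorized(body: str) -> bool:
    """Hypothetical helper: True if an API error body is a 401 Status object."""
    try:
        status = json.loads(body)
    except json.JSONDecodeError:
        # A body that is not JSON cannot be a Kubernetes Status object.
        return False
    # Only dict-shaped Status objects carrying code 401 count as Unauthorized.
    return (
        isinstance(status, dict)
        and status.get("kind") == "Status"
        and status.get("code") == 401
    )


# Body copied from the traceback in this report.
body = '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}'
print(is_unauthorized(body))  # True
```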
What you expected to happen: I would expect the cluster object to re-authenticate, or to provide some mechanism for re-attaching to the cluster it created.
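One way to get the re-authentication asked for above is to catch the 401, refresh credentials, and retry the call once. The sketch below is a generic asyncio pattern under stated assumptions, not how dask_kubernetes currently behaves: `UnauthorizedError`, `call_with_reauth`, and `reauthenticate` are hypothetical names, and the fake `delete_pod` only mimics the failing `delete_namespaced_pod` call from the traceback.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class UnauthorizedError(Exception):
    """Hypothetical stand-in for the 401 error raised by the API client."""


async def call_with_reauth(
    call: Callable[[], Awaitable[T]],
    reauthenticate: Callable[[], Awaitable[None]],
) -> T:
    """Run `call`; on a 401, refresh credentials once and retry."""
    try:
        return await call()
    except UnauthorizedError:
        # Credentials were rejected: refresh them, then retry exactly once
        # so a persistent auth failure still reaches the caller.
        await reauthenticate()
        return await call()


async def demo():
    # Simulated state: the cached token has expired.
    state = {"token_valid": False, "reauths": 0}

    async def delete_pod():
        # Fake API call that fails until credentials are refreshed.
        if not state["token_valid"]:
            raise UnauthorizedError("401 Unauthorized")
        return "deleted"

    async def reauthenticate():
        # Fake refresh that marks the token as valid again.
        state["reauths"] += 1
        state["token_valid"] = True

    result = await call_with_reauth(delete_pod, reauthenticate)
    return result, state["reauths"]


print(asyncio.run(demo()))  # ('deleted', 1)
```

Retrying only once is deliberate: if the refreshed credentials are still rejected, the error propagates instead of looping. In a real fix, `reauthenticate` would presumably reload the kubeconfig and rebuild the API client; that is an assumption to verify against the Kubernetes client library in use.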
Environment: The cluster runs on GKE. Until authentication is lost, everything else functions as expected.
- Dask version: 2.30.0
- Python version: 3.8.6
- Operating System: GKE + COS (Container-Optimized OS)
- Install method (conda, pip, source):
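Failures that start only "after some period of time" fit a credential with a fixed lifetime. On GKE, kubeconfig credentials are commonly short-lived access tokens; assuming that is the cause here (the issue does not confirm it), a client could also refresh proactively instead of waiting for a 401. The sketch below is a generic pattern, not dask_kubernetes or Kubernetes client code: `TokenCache`, `fetch`, and `margin` are hypothetical, and the 3600-second lifetime in the demo is an illustrative value rather than a documented GKE figure.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple


@dataclass
class TokenCache:
    """Hypothetical cache that refreshes a bearer token shortly before expiry.

    `fetch` returns (token, lifetime_seconds); `clock` returns the current time
    in seconds and is injectable so expiry can be tested without waiting.
    """

    fetch: Callable[[], Tuple[str, float]]
    margin: float = 60.0
    clock: Callable[[], float] = time.monotonic
    _token: Optional[str] = field(default=None, init=False, repr=False)
    _expires_at: float = field(default=0.0, init=False, repr=False)

    def get(self) -> str:
        now = self.clock()
        # Refresh when there is no token yet or it expires within `margin` seconds.
        if self._token is None or now >= self._expires_at - self.margin:
            token, lifetime = self.fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token


# Demo with a fake clock and a fake token source.
fake_now = [0.0]
calls = []


def fetch():
    calls.append(fake_now[0])
    return f"token-{len(calls)}", 3600.0


cache = TokenCache(fetch=fetch, margin=60.0, clock=lambda: fake_now[0])
first = cache.get()   # no token yet: fetches token-1 at t=0
fake_now[0] = 3000.0
second = cache.get()  # still outside the margin: reuses token-1
fake_now[0] = 3550.0
third = cache.get()   # within 60 s of expiry (t=3600): fetches token-2
print(first, second, third)  # token-1 token-1 token-2
```

The margin trades a few extra refreshes for never sending a token that is about to expire mid-request.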
Issue Analytics
- Created 3 years ago
- Comments: 6 (6 by maintainers)
Top GitHub Comments
@jacobtomlinson Any updates on this? If not, do you have any intuition on what's preventing the re-auth from occurring? If I get some extra time, I could take a look.
Sorry, I’ve been out for a while, and this is currently on the backlog.
If you have some time to investigate, it would be much appreciated.