Failed to create KubeCluster after upgrading to 2022.4.0 / 2022.4.1
What happened:
I’ve been creating a `KubeCluster` with:
```python
from dask_kubernetes import KubeCluster, make_pod_spec

# Worker pod template
worker_spec = make_pod_spec(
    extra_pod_config={
        "imagePullSecrets": ...,
        "restartPolicy": "Never",
    },
    extra_container_config={
        "command": None,
        "name": "dask-worker",
    },
    image=...,
    memory_limit=...,
    memory_request=...,
    cpu_limit=...,
    cpu_request=...,
    annotations={
        "cluster-autoscaler.kubernetes.io/safe-to-evict": "false",
    },
)

# Scheduler pod template; its container is renamed to "dask-scheduler"
scheduler_spec = make_pod_spec(
    extra_pod_config={
        "imagePullSecrets": ...,
    },
    extra_container_config={
        "command": None,
        "name": "dask-scheduler",
    },
    image=...,
    memory_limit=...,
    memory_request=...,
    cpu_limit=...,
    cpu_request=...,
    annotations={
        "cluster-autoscaler.kubernetes.io/safe-to-evict": "false",
    },
)

cluster = KubeCluster(
    pod_template=worker_spec,
    scheduler_pod_template=scheduler_spec,
    namespace=...,
    idle_timeout=600,
)
```
This worked without any problems with 2021.10.0 and 2022.1.0. After upgrading to 2022.4.1 (or 2022.4.0), creating the `KubeCluster` fails with the following error:
```
Traceback (most recent call last):
  (...)
  File "/usr/local/lib/python3.8/site-packages/dask_kubernetes/core.py", line 496, in __init__
    super().__init__(**self.kwargs)
  File "/usr/local/lib/python3.8/site-packages/distributed/deploy/spec.py", line 275, in __init__
    self.sync(self._start)
  File "/usr/local/lib/python3.8/site-packages/distributed/deploy/cluster.py", line 220, in sync
    return sync(self.loop, func, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/distributed/utils.py", line 327, in sync
    raise exc.with_traceback(tb)
  File "/usr/local/lib/python3.8/site-packages/distributed/utils.py", line 310, in f
    result[0] = yield future
  File "/usr/local/lib/python3.8/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/usr/local/lib/python3.8/site-packages/dask_kubernetes/core.py", line 626, in _start
    await super()._start()
  File "/usr/local/lib/python3.8/site-packages/distributed/deploy/spec.py", line 304, in _start
    self.scheduler = await self.scheduler
  File "/usr/local/lib/python3.8/site-packages/distributed/deploy/spec.py", line 59, in _
    await self.start()
  File "/usr/local/lib/python3.8/site-packages/dask_kubernetes/core.py", line 198, in start
    logs = await self.logs()
  File "/usr/local/lib/python3.8/site-packages/dask_kubernetes/core.py", line 118, in logs
    raise e
  File "/usr/local/lib/python3.8/site-packages/dask_kubernetes/core.py", line 109, in logs
    log = await self.core_api.read_namespaced_pod_log(
  File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/api_client.py", line 192, in __call_api
    raise e
  File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
    response_data = await self.request(
  File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/rest.py", line 193, in GET
    return (await self.request("GET", url,
  File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/rest.py", line 187, in request
    raise ApiException(http_resp=r)
kubernetes_asyncio.client.exceptions.ApiException: (400)
Reason: Bad Request
HTTP response headers: <CIMultiDictProxy('Audit-Id': '31a36776-5258-4a7f-99da-064bcef356ca', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Tue, 10 May 2022 10:47:00 GMT', 'Content-Length': '183')>
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"container dask-worker is not valid for pod dask-root-2b3b413a-efzl92","reason":"BadRequest","code":400}
```
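The 400 is returned by the Kubernetes API server, not by Dask: a pod log request names a container, and the server rejects it when that name is not one of the pod's containers. The traceback shows the failure happens while the scheduler is being started (`self.scheduler = await self.scheduler`, then `start` calling `self.logs()`), and the scheduler template above renames its container to "dask-scheduler". A plausible reading, assumed here and not confirmed by the issue, is that dask-kubernetes 2022.4.x requests logs for a container called "dask-worker" on every pod. The pure-Python sketch below mimics that server-side check on a plain-dict pod manifest; `BadRequest`, `container_names`, and `pod_log_request` are hypothetical stand-ins, not part of the kubernetes_asyncio API.

```python
from typing import Any, Dict, List


class BadRequest(Exception):
    """Stand-in for the HTTP 400 the API server returns (hypothetical)."""


def container_names(pod: Dict[str, Any]) -> List[str]:
    """Return the container names declared in a plain-dict pod manifest."""
    return [c["name"] for c in pod["spec"]["containers"]]


def pod_log_request(pod: Dict[str, Any], container: str) -> str:
    """Mimic the API server's validation of a log request for `container`.

    The real server streams log text; this sketch only checks the name and
    returns a placeholder string.
    """
    pod_name = pod["metadata"]["name"]
    if container not in container_names(pod):
        raise BadRequest(f"container {container} is not valid for pod {pod_name}")
    return f"logs of {container} in {pod_name}"


# Scheduler pod as configured in the issue: its only container is renamed.
scheduler_pod = {
    "metadata": {"name": "dask-root-2b3b413a-efzl92"},
    "spec": {"containers": [{"name": "dask-scheduler"}]},
}

try:
    # Assumption: dask-kubernetes 2022.4.x requests "dask-worker" on every pod.
    pod_log_request(scheduler_pod, "dask-worker")
except BadRequest as exc:
    print(exc)  # container dask-worker is not valid for pod dask-root-2b3b413a-efzl92
```

Under the same assumption, a workaround worth trying is to drop the `"name": "dask-scheduler"` override from `scheduler_spec` (or set it to "dask-worker"), so the requested container exists in the scheduler pod; the PR mentioned in the comments (#503) is intended as the actual fix.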
Do I need to change something with this new version?
Environment:
- Dask version: dask[complete]==2021.10.0 and dask[complete]==2022.04.2
- Python version: 3.8.10
- Operating System: ubuntu1604
- Install method (conda, pip, source): pip
GitHub comments:
@jacobtomlinson A PR which should fix this is out, let me know what you think (#503)
@jacobtomlinson I think I should be able to help out 👍 I’ll try to open a PR tomorrow