
Failed to create KubeCluster after upgrading to 2022.4.0 / 2022.4.1


What happened:

I’ve been creating KubeCluster with:

from dask_kubernetes import KubeCluster, make_pod_spec

# `...` marks values elided in the original report.
worker_spec = make_pod_spec(
    extra_pod_config={
        "imagePullSecrets": ...,
        "restartPolicy": "Never",
    },
    extra_container_config={
        "command": None,
        "name": "dask-worker",
    },
    image=...,
    memory_limit=...,
    memory_request=...,
    cpu_limit=...,
    cpu_request=...,
    annotations={
        "cluster-autoscaler.kubernetes.io/safe-to-evict": "false",
    },
)

scheduler_spec = make_pod_spec(
    extra_pod_config={
        "imagePullSecrets": ...,
    },
    extra_container_config={
        "command": None,
        # The scheduler container gets its own name, distinct from the workers'.
        "name": "dask-scheduler",
    },
    image=...,
    memory_limit=...,
    memory_request=...,
    cpu_limit=...,
    cpu_request=...,
    annotations={
        "cluster-autoscaler.kubernetes.io/safe-to-evict": "false",
    },
)

cluster = KubeCluster(
    pod_template=worker_spec,
    scheduler_pod_template=scheduler_spec,
    namespace=...,
    idle_timeout=600,
)

This has been working without any problems with 2021.10.0 and 2022.1.0. After upgrading to 2022.4.1 (or 2022.4.0), I’m no longer able to start a KubeCluster; the following error is raised:

Traceback (most recent call last):

(...)

  File "/usr/local/lib/python3.8/site-packages/dask_kubernetes/core.py", line 496, in __init__
    super().__init__(**self.kwargs)
  File "/usr/local/lib/python3.8/site-packages/distributed/deploy/spec.py", line 275, in __init__
    self.sync(self._start)
  File "/usr/local/lib/python3.8/site-packages/distributed/deploy/cluster.py", line 220, in sync
    return sync(self.loop, func, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/distributed/utils.py", line 327, in sync
    raise exc.with_traceback(tb)
  File "/usr/local/lib/python3.8/site-packages/distributed/utils.py", line 310, in f
    result[0] = yield future
  File "/usr/local/lib/python3.8/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/usr/local/lib/python3.8/site-packages/dask_kubernetes/core.py", line 626, in _start
    await super()._start()
  File "/usr/local/lib/python3.8/site-packages/distributed/deploy/spec.py", line 304, in _start
    self.scheduler = await self.scheduler
  File "/usr/local/lib/python3.8/site-packages/distributed/deploy/spec.py", line 59, in _
    await self.start()
  File "/usr/local/lib/python3.8/site-packages/dask_kubernetes/core.py", line 198, in start
    logs = await self.logs()
  File "/usr/local/lib/python3.8/site-packages/dask_kubernetes/core.py", line 118, in logs
    raise e
  File "/usr/local/lib/python3.8/site-packages/dask_kubernetes/core.py", line 109, in logs
    log = await self.core_api.read_namespaced_pod_log(
  File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/api_client.py", line 192, in __call_api
    raise e
  File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
    response_data = await self.request(
  File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/rest.py", line 193, in GET
    return (await self.request("GET", url,
  File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/rest.py", line 187, in request
    raise ApiException(http_resp=r)
kubernetes_asyncio.client.exceptions.ApiException: (400)
Reason: Bad Request
HTTP response headers: <CIMultiDictProxy('Audit-Id': '31a36776-5258-4a7f-99da-064bcef356ca', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Tue, 10 May 2022 10:47:00 GMT', 'Content-Length': '183')>
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"container dask-worker is not valid for pod dask-root-2b3b413a-efzl92","reason":"BadRequest","code":400}
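Reading the traceback, the failure appears to happen while the scheduler pod starts (`spec.py` awaits `self.scheduler`, whose `start()` calls `logs()` in `dask_kubernetes/core.py`). The log request names the container `dask-worker`, but the scheduler template above renames its container to `dask-scheduler`. The Kubernetes pod log endpoint rejects a `container` value that is not declared in the pod, which matches the 400 message. The sketch below models that check with plain dicts; `BadRequest`, `container_names` and `validate_log_request` are hypothetical names for illustration, not part of any client library, and the scheduler pod is assumed to have a single container.

```python
from typing import List


class BadRequest(Exception):
    """Hypothetical stand-in for the API server's 400 response."""


def container_names(pod: dict) -> List[str]:
    """Return the names of the containers declared in a pod manifest."""
    return [c["name"] for c in pod["spec"]["containers"]]


def validate_log_request(pod: dict, container: str) -> None:
    """Raise BadRequest if `container` is not one of the pod's containers."""
    if container not in container_names(pod):
        pod_name = pod["metadata"]["name"]
        raise BadRequest(f"container {container} is not valid for pod {pod_name}")


# Scheduler pod built from a template whose container was renamed "dask-scheduler".
scheduler_pod = {
    "metadata": {"name": "dask-root-2b3b413a-efzl92"},
    "spec": {"containers": [{"name": "dask-scheduler"}]},
}

validate_log_request(scheduler_pod, "dask-scheduler")  # accepted, no exception

try:
    validate_log_request(scheduler_pod, "dask-worker")
except BadRequest as exc:
    print(exc)  # container dask-worker is not valid for pod dask-root-2b3b413a-efzl92
```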

Do I need to change something for this new version?

Environment:

  • Dask version: dask[complete]==2021.10.0 and dask[complete]==2022.04.2
  • Python version: 3.8.10
  • Operating System: Ubuntu 16.04
  • Install method (conda, pip, source): pip
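One way to avoid requesting logs for a container name that a custom template may have changed is to read the container name from the pod itself before fetching logs. The helper below is a hypothetical sketch of that idea, not the change made in the linked PR; it assumes `core_api` behaves like `kubernetes_asyncio`'s `CoreV1Api` (whose `read_namespaced_pod` and `read_namespaced_pod_log` coroutines take the pod name and namespace), and that the first container in the pod is the one whose logs are wanted.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, Optional


async def read_pod_logs(core_api: Any, pod_name: str, namespace: str,
                        container: Optional[str] = None) -> str:
    """Fetch pod logs, taking the container name from the pod if none is given."""
    if container is None:
        pod = await core_api.read_namespaced_pod(pod_name, namespace)
        # Assumption: the first container is the Dask process, whatever the
        # template named it (e.g. "dask-scheduler" instead of "dask-worker").
        container = pod.spec.containers[0].name
    return await core_api.read_namespaced_pod_log(
        pod_name, namespace, container=container
    )


class FakeCoreApi:
    """Minimal illustrative stand-in for CoreV1Api; no cluster needed."""

    async def read_namespaced_pod(self, name, namespace):
        return SimpleNamespace(
            spec=SimpleNamespace(containers=[SimpleNamespace(name="dask-scheduler")])
        )

    async def read_namespaced_pod_log(self, name, namespace, container=None):
        if container != "dask-scheduler":
            raise ValueError(f"container {container} is not valid for pod {name}")
        return f"logs from {container}"


# "default" is an illustrative namespace; the report elides the real one.
print(asyncio.run(read_pod_logs(FakeCoreApi(), "dask-root-2b3b413a-efzl92", "default")))
# logs from dask-scheduler
```

Passing `container=` explicitly still works when the caller already knows the name; the lookup only costs one extra API call when it does not.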

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

bstadlbauer commented, May 20, 2022

@jacobtomlinson A PR that should fix this is out (#503); let me know what you think.

bstadlbauer commented, May 18, 2022

@jacobtomlinson I think I should be able to help out 👍. I’ll try to open a PR tomorrow.
