Cannot connect to kube-api service

Describe the issue: Dask gets a timeout error from the kube-api service when it tries to remove workers.

Error logs:

2022-11-21 00:34:34 | 2022-11-20 22:34:34,074 - distributed.deploy.adaptive - INFO - Retiring workers [11, 34, 35, 36, 37, 38, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67]

2022-11-21 00:35:54 | aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host 10.0.0.1:443 ssl:default [None]
  |   | 2022-11-21 00:35:54 | raise client_error(req.connection_key, exc) from exc
  |   | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 988, in _wrap_create_connection
  |   | 2022-11-21 00:35:54 | transp, proto = await self._wrap_create_connection(
  |   | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 1175, in _create_direct_connection
  |   | 2022-11-21 00:35:54 | raise last_exc
  |   | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 1206, in _create_direct_connection
  |   | 2022-11-21 00:35:54 | _, proto = await self._create_direct_connection(req, traces, timeout)
  |   | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 901, in _create_connection
  |   | 2022-11-21 00:35:54 | proto = await self._create_connection(req, traces, timeout)
  |   | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 540, in connect
  |   | 2022-11-21 00:35:54 | conn = await self._connector.connect(
  |   | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/aiohttp/client.py", line 536, in _request
  |   | 2022-11-21 00:35:54 | r = await self.pool_manager.request(**args)
  |   | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/rest.py", line 177, in request
  |   | 2022-11-21 00:35:54 | return (await self.request("DELETE", url,
  |   | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/rest.py", line 220, in DELETE
  |   | 2022-11-21 00:35:54 | response_data = await self.request(
  |   | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
  |   | 2022-11-21 00:35:54 | await self.core_api.delete_namespaced_pod(name, namespace)
  |   | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/dask_kubernetes/classic/kubecluster.py", line 96, in close
  |   | 2022-11-21 00:35:54 | await asyncio.gather(*tasks)
  |   | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/distributed/deploy/spec.py", line 343, in _correct_state_internal
  |   | 2022-11-21 00:35:54 | await self._correct_state()
  |   | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/distributed/deploy/spec.py", line 400, in _
  |   | 2022-11-21 00:35:54 | await self
  |   | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/distributed/deploy/spec.py", line 559, in scale_down
  |   | 2022-11-21 00:35:54 | await f
  |   | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/distributed/deploy/adaptive.py", line 204, in scale_down
  |   | 2022-11-21 00:35:54 | return await func(*args, **kwargs)
  |   | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/distributed/utils.py", line 742, in wrapper
  |   | 2022-11-21 00:35:54 | await self.scale_down(**recommendations)
  |   | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/distributed/deploy/adaptive_core.py", line 240, in adapt
  |   | 2022-11-21 00:35:54 | Traceback (most recent call last):
  |   | 2022-11-21 00:35:54 |  
  |   | 2022-11-21 00:35:54 | The above exception was the direct cause of the following exception:
  |   | 2022-11-21 00:35:54 |  
  |   | 2022-11-21 00:35:54 | ConnectionResetError
  |   | 2022-11-21 00:35:54 | await waiter
  |   | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/asyncio/base_events.py", line 1080, in _create_connection_transport
  |   | 2022-11-21 00:35:54 | transport, protocol = await self._create_connection_transport(
  |   | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/asyncio/base_events.py", line 1050, in create_connection
  |   | 2022-11-21 00:35:54 | return await self._loop.create_connection(*args, **kwargs)  # type: ignore[return-value]  # noqa
  |   | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 980, in _wrap_create_connection
  |   | 2022-11-21 00:35:54 | Traceback (most recent call last):
  |   | 2022-11-21 00:35:54 | 2022-11-20 22:35:53,142 - distributed.deploy.adaptive_core - ERROR - Error during adaptive downscaling. Ignoring.
  |   | 2022-11-21 00:35:52 | aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host 10.0.0.1:443 ssl:default [None]
  |   | 2022-11-21 00:35:52 | raise client_error(req.connection_key, exc) from exc
  |   | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 988, in _wrap_create_connection
  |   | 2022-11-21 00:35:52 | transp, proto = await self._wrap_create_connection(
  |   | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 1175, in _create_direct_connection
  |   | 2022-11-21 00:35:52 | raise last_exc
  |   | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 1206, in _create_direct_connection
  |   | 2022-11-21 00:35:52 | _, proto = await self._create_direct_connection(req, traces, timeout)
  |   | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 901, in _create_connection
  |   | 2022-11-21 00:35:52 | proto = await self._create_connection(req, traces, timeout)
  |   | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 540, in connect
  |   | 2022-11-21 00:35:52 | conn = await self._connector.connect(
  |   | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/aiohttp/client.py", line 536, in _request
  |   | 2022-11-21 00:35:52 | r = await self.pool_manager.request(**args)
  |   | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/rest.py", line 177, in request
  |   | 2022-11-21 00:35:52 | return (await self.request("DELETE", url,
  |   | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/rest.py", line 220, in DELETE
  |   | 2022-11-21 00:35:52 | response_data = await self.request(
  |   | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
  |   | 2022-11-21 00:35:52 | await self.core_api.delete_namespaced_pod(name, namespace)
  |   | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/dask_kubernetes/classic/kubecluster.py", line 96, in close
  |   | 2022-11-21 00:35:52 | await asyncio.gather(*tasks)
  |   | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/distributed/deploy/spec.py", line 343, in _correct_state_internal
  |   | 2022-11-21 00:35:52 | await self._correct_state()
  |   | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/distributed/deploy/spec.py", line 400, in _
  |   | 2022-11-21 00:35:52 | await self
  |   | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/distributed/deploy/spec.py", line 559, in scale_down
  |   | 2022-11-21 00:35:52 | await f
  |   | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/distributed/deploy/adaptive.py", line 204, in scale_down
  |   | 2022-11-21 00:35:52 | return await func(*args, **kwargs)
  |   | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/distributed/utils.py", line 742, in wrapper
  |   | 2022-11-21 00:35:52 | Traceback (most recent call last):
  |   | 2022-11-21 00:35:52 |  
  |   | 2022-11-21 00:35:52 | The above exception was the direct cause of the following exception:
  |   | 2022-11-21 00:35:52 |  
  |   | 2022-11-21 00:35:52 | ConnectionResetError
  |   | 2022-11-21 00:35:52 | await waiter
  |   | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/asyncio/base_events.py", line 1080, in _create_connection_transport
  |   | 2022-11-21 00:35:52 | transport, protocol = await self._create_connection_transport(
  |   | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/asyncio/base_events.py", line 1050, in create_connection
  |   | 2022-11-21 00:35:52 | return await self._loop.create_connection(*args, **kwargs)  # type: ignore[return-value]  # noqa
  |   | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 980, in _wrap_create_connection
  |   | 2022-11-21 00:35:52 | Traceback (most recent call last):
  |   | 2022-11-21 00:35:52 | Cannot connect to host 10.0.0.1:443 ssl:default [None]

Environment:

  • AKS version: 1.23.8
  • Dask version: 2022.9.1
  • Python version: 3.8.15
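
In the traceback above, the failing DELETE request is awaited through `asyncio.gather(*tasks)` in `distributed/deploy/spec.py`. By default, `gather` raises the first exception to its caller, which is consistent with one failed connection surfacing as "Error during adaptive downscaling". The sketch below only illustrates that `gather` behavior and the `return_exceptions=True` alternative. `close_worker` and `close_all` are hypothetical stand-ins, not Dask or dask-kubernetes code, and this is not a proposed patch.

```python
import asyncio


async def close_worker(name: str, fail: bool) -> str:
    """Hypothetical stand-in for closing one worker pod."""
    await asyncio.sleep(0)
    if fail:
        # Same exception type as the root cause shown in the traceback.
        raise ConnectionResetError(f"could not reach API server for {name}")
    return name


async def close_all(names, failing):
    """Close all workers and report which ones closed and which failed."""
    tasks = [close_worker(n, n in failing) for n in names]
    # return_exceptions=True collects failures as results instead of
    # raising the first one to the caller.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    closed = [r for r in results if not isinstance(r, BaseException)]
    failed = [n for n, r in zip(names, results) if isinstance(r, BaseException)]
    return closed, failed


closed, failed = asyncio.run(close_all(["w11", "w34", "w35"], {"w34"}))
print(closed, failed)
```

With this pattern a caller can see which workers failed to close and decide whether to retry them, rather than losing that detail when the first exception propagates.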

Issue Analytics

  • State: closed
  • Created 10 months ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

1 reaction
geertjan-garvis commented, Nov 22, 2022

Thanks @jacobtomlinson, we will try this. We have indeed kicked off work to move to the new implementation as well.

1 reaction
jacobtomlinson commented, Nov 22, 2022

I’ve opened #617 with a potential mitigation for this. Could you try out that branch and see if it works for you? Also, as I said, I’d recommend migrating to the new implementation. It shouldn’t be too much work; shout if you run into any trouble with that.
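
The thread does not say what #617 changes, so the following is not that patch. As a generic illustration of tolerating a transient connection failure like the one in the logs, here is a minimal retry-with-backoff sketch using only the standard library. `retry_on_connection_error` and `flaky_delete` are hypothetical names; `flaky_delete` stands in for a call such as `delete_namespaced_pod`. The sketch catches `OSError` because `ConnectionResetError` is a subclass of it; to my knowledge aiohttp's `ClientConnectorError` also derives from `OSError`, but treat that as an assumption to verify against your aiohttp version.

```python
import asyncio


async def retry_on_connection_error(func, *args, attempts=3, base_delay=0.01):
    """Await func(*args), retrying with exponential backoff on OSError."""
    for attempt in range(1, attempts + 1):
        try:
            return await func(*args)
        except OSError:
            # ConnectionResetError is an OSError subclass.
            if attempt == attempts:
                raise  # Out of attempts: let the caller see the error.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


calls = {"count": 0}


async def flaky_delete(name):
    """Hypothetical pod deletion that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionResetError("connection reset")
    return f"deleted {name}"


result = asyncio.run(retry_on_connection_error(flaky_delete, "worker-11"))
print(result, calls["count"])
```

The delays here are kept tiny so the example runs quickly. Against a real API server, longer delays and a cap on total wait time would be more appropriate.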
