Cannot connect to kube-api service
Describe the issue: Getting a timeout error when connecting to the kube-api service while Dask is trying to remove workers.
Error logs (newest entries first):
2022-11-21 00:34:34 | 2022-11-20 22:34:34,074 - distributed.deploy.adaptive - INFO - Retiring workers [11, 34, 35, 36, 37, 38, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67]
2022-11-21 00:35:54 | aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host 10.0.0.1:443 ssl:default [None]
| | 2022-11-21 00:35:54 | raise client_error(req.connection_key, exc) from exc
| | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 988, in _wrap_create_connection
| | 2022-11-21 00:35:54 | transp, proto = await self._wrap_create_connection(
| | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 1175, in _create_direct_connection
| | 2022-11-21 00:35:54 | raise last_exc
| | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 1206, in _create_direct_connection
| | 2022-11-21 00:35:54 | _, proto = await self._create_direct_connection(req, traces, timeout)
| | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 901, in _create_connection
| | 2022-11-21 00:35:54 | proto = await self._create_connection(req, traces, timeout)
| | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 540, in connect
| | 2022-11-21 00:35:54 | conn = await self._connector.connect(
| | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/aiohttp/client.py", line 536, in _request
| | 2022-11-21 00:35:54 | r = await self.pool_manager.request(**args)
| | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/rest.py", line 177, in request
| | 2022-11-21 00:35:54 | return (await self.request("DELETE", url,
| | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/rest.py", line 220, in DELETE
| | 2022-11-21 00:35:54 | response_data = await self.request(
| | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
| | 2022-11-21 00:35:54 | await self.core_api.delete_namespaced_pod(name, namespace)
| | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/dask_kubernetes/classic/kubecluster.py", line 96, in close
| | 2022-11-21 00:35:54 | await asyncio.gather(*tasks)
| | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/distributed/deploy/spec.py", line 343, in _correct_state_internal
| | 2022-11-21 00:35:54 | await self._correct_state()
| | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/distributed/deploy/spec.py", line 400, in _
| | 2022-11-21 00:35:54 | await self
| | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/distributed/deploy/spec.py", line 559, in scale_down
| | 2022-11-21 00:35:54 | await f
| | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/distributed/deploy/adaptive.py", line 204, in scale_down
| | 2022-11-21 00:35:54 | return await func(*args, **kwargs)
| | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/distributed/utils.py", line 742, in wrapper
| | 2022-11-21 00:35:54 | await self.scale_down(**recommendations)
| | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/distributed/deploy/adaptive_core.py", line 240, in adapt
| | 2022-11-21 00:35:54 | Traceback (most recent call last):
| | 2022-11-21 00:35:54 |
| | 2022-11-21 00:35:54 | The above exception was the direct cause of the following exception:
| | 2022-11-21 00:35:54 |
| | 2022-11-21 00:35:54 | ConnectionResetError
| | 2022-11-21 00:35:54 | await waiter
| | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/asyncio/base_events.py", line 1080, in _create_connection_transport
| | 2022-11-21 00:35:54 | transport, protocol = await self._create_connection_transport(
| | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/asyncio/base_events.py", line 1050, in create_connection
| | 2022-11-21 00:35:54 | return await self._loop.create_connection(*args, **kwargs) # type: ignore[return-value] # noqa
| | 2022-11-21 00:35:54 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 980, in _wrap_create_connection
| | 2022-11-21 00:35:54 | Traceback (most recent call last):
| | 2022-11-21 00:35:54 | 2022-11-20 22:35:53,142 - distributed.deploy.adaptive_core - ERROR - Error during adaptive downscaling. Ignoring.
| | 2022-11-21 00:35:52 | aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host 10.0.0.1:443 ssl:default [None]
| | 2022-11-21 00:35:52 | raise client_error(req.connection_key, exc) from exc
| | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 988, in _wrap_create_connection
| | 2022-11-21 00:35:52 | transp, proto = await self._wrap_create_connection(
| | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 1175, in _create_direct_connection
| | 2022-11-21 00:35:52 | raise last_exc
| | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 1206, in _create_direct_connection
| | 2022-11-21 00:35:52 | _, proto = await self._create_direct_connection(req, traces, timeout)
| | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 901, in _create_connection
| | 2022-11-21 00:35:52 | proto = await self._create_connection(req, traces, timeout)
| | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 540, in connect
| | 2022-11-21 00:35:52 | conn = await self._connector.connect(
| | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/aiohttp/client.py", line 536, in _request
| | 2022-11-21 00:35:52 | r = await self.pool_manager.request(**args)
| | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/rest.py", line 177, in request
| | 2022-11-21 00:35:52 | return (await self.request("DELETE", url,
| | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/rest.py", line 220, in DELETE
| | 2022-11-21 00:35:52 | response_data = await self.request(
| | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/api_client.py", line 185, in __call_api
| | 2022-11-21 00:35:52 | await self.core_api.delete_namespaced_pod(name, namespace)
| | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/dask_kubernetes/classic/kubecluster.py", line 96, in close
| | 2022-11-21 00:35:52 | await asyncio.gather(*tasks)
| | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/distributed/deploy/spec.py", line 343, in _correct_state_internal
| | 2022-11-21 00:35:52 | await self._correct_state()
| | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/distributed/deploy/spec.py", line 400, in _
| | 2022-11-21 00:35:52 | await self
| | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/distributed/deploy/spec.py", line 559, in scale_down
| | 2022-11-21 00:35:52 | await f
| | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/distributed/deploy/adaptive.py", line 204, in scale_down
| | 2022-11-21 00:35:52 | return await func(*args, **kwargs)
| | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/distributed/utils.py", line 742, in wrapper
| | 2022-11-21 00:35:52 | Traceback (most recent call last):
| | 2022-11-21 00:35:52 |
| | 2022-11-21 00:35:52 | The above exception was the direct cause of the following exception:
| | 2022-11-21 00:35:52 |
| | 2022-11-21 00:35:52 | ConnectionResetError
| | 2022-11-21 00:35:52 | await waiter
| | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/asyncio/base_events.py", line 1080, in _create_connection_transport
| | 2022-11-21 00:35:52 | transport, protocol = await self._create_connection_transport(
| | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/asyncio/base_events.py", line 1050, in create_connection
| | 2022-11-21 00:35:52 | return await self._loop.create_connection(*args, **kwargs) # type: ignore[return-value] # noqa
| | 2022-11-21 00:35:52 | File "/usr/local/lib/python3.8/site-packages/aiohttp/connector.py", line 980, in _wrap_create_connection
| | 2022-11-21 00:35:52 | Traceback (most recent call last):
| | 2022-11-21 00:35:52 | Cannot connect to host 10.0.0.1:443 ssl:default [None]
Environment:
- AKS version: 1.23.8
- Dask version: 2022.9.1
- Python version: 3.8.15
Issue Analytics
- State:
- Created 10 months ago
- Comments: 10 (5 by maintainers)
Top GitHub Comments
Thanks @jacobtomlinson, we will try this. We have indeed kicked off work to move to the new implementation as well.
I've opened #617 with a potential mitigation for this. Could you try out that branch and see if it works for you? Also, as I said, I'd recommend migrating to the new implementation; it shouldn't be too much work. Shout if you run into any trouble with that.