Deployment on Kubernetes backed by AWS EKS on Fargate fails
What happened:
Can’t reach the API when deploying on AWS EKS with a Fargate backend. It seems fine when using a managed EKS cluster (i.e. one using plain EC2 nodes).
What you expected to happen:
Dask Gateway deployed and reachable
Minimal Complete Verifiable Example:
Creating the cluster (all default settings):
eksctl create cluster --name ds-eks-fargate --region us-west-2 --fargate
Deploy the dask-gateway (all default settings):
helm upgrade --install --namespace default dask-gateway-fargate daskgateway/dask-gateway
Result:
❯ helm list
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
dask-gateway-fargate default 1 2021-03-22 12:52:59.264708 +1300 NZDT deployed dask-gateway-0.9.0 0.9.0
❯ kubectl get pods
NAME READY STATUS RESTARTS AGE
api-dask-gateway-fargate-5bbdfb7799-jqh26 1/1 Running 0 5m56s
controller-dask-gateway-fargate-594b8dcc65-knjrr 1/1 Running 0 5m56s
traefik-dask-gateway-fargate-6b6cd85445-6l4jx 1/1 Running 0 5m56s
❯ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
api-dask-gateway-fargate ClusterIP 10.100.13.39 <none> 8000/TCP 6m30s
kubernetes ClusterIP 10.100.0.1 <none> 443/TCP 3h22m
traefik-dask-gateway-fargate LoadBalancer 10.100.180.97 a910dfb4e398d4ee18003ce5bdcedddd-948617200.us-west-2.elb.amazonaws.com 80:32495/TCP 6m30s
Then in bash/zsh:
❯ telnet a910dfb4e398d4ee18003ce5bdcedddd-948617200.us-west-2.elb.amazonaws.com 80
Trying 52.38.243.146...
Connected to a910dfb4e398d4ee18003ce5bdcedddd-948617200.us-west-2.elb.amazonaws.com.
Escape character is '^]'.
Connection closed by foreign host.
❯ curl -i http://a910dfb4e398d4ee18003ce5bdcedddd-948617200.us-west-2.elb.amazonaws.com
curl: (52) Empty reply from server
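The telnet and curl results above show the TCP connection being accepted and then closed before any HTTP bytes come back. A minimal stdlib sketch of the same check in Python, useful for comparing the two clusters side by side; `probe_http` is a hypothetical helper name and is not part of dask-gateway.

```python
import socket


def probe_http(host: str, port: int = 80, timeout: float = 5.0) -> str:
    """Send a bare HTTP GET and classify how the endpoint answers.

    Hypothetical helper for illustration only (not a dask-gateway API).
    """
    request = (
        f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    ).encode("ascii")
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            first_bytes = sock.recv(4096)
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"

    if not first_bytes:
        # The peer closed the connection without sending anything,
        # which is the situation curl reports as error 52.
        return "empty reply"

    # Otherwise report the HTTP status line, e.g. "HTTP/1.1 404 Not Found".
    status_line = first_bytes.split(b"\r\n", 1)[0].decode("latin-1")
    return f"http: {status_line}"
```

Calling `probe_http("a910dfb4e398d4ee18003ce5bdcedddd-948617200.us-west-2.elb.amazonaws.com")` against the Fargate load balancer would be expected to return `empty reply`, while a working endpoint should return an `http: ...` status line.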
Then with Python:
from dask_gateway import Gateway
gateway = Gateway(
"http://a910dfb4e398d4ee18003ce5bdcedddd-948617200.us-west-2.elb.amazonaws.com",
)
print(f"Clusters: {gateway.list_clusters()}")
Should be:
Clusters: []
But is:
Traceback (most recent call last):
File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/.env/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-66bf2c44db55>", line 1, in <module>
runfile('/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/scripts/main.py', wdir='/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/scripts')
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/scripts/main.py", line 7, in <module>
print(f"Clusters: {gateway.list_clusters()}")
File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/.env/lib/python3.8/site-packages/dask_gateway/client.py", line 455, in list_clusters
return self.sync(self._clusters, status=status, **kwargs)
File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/.env/lib/python3.8/site-packages/dask_gateway/client.py", line 343, in sync
return future.result()
File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/.env/lib/python3.8/concurrent/futures/_base.py", line 439, in result
return self.__get_result()
File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/.env/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
raise self._exception
File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/.env/lib/python3.8/site-packages/dask_gateway/client.py", line 434, in _clusters
resp = await self._request("GET", url)
File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/.env/lib/python3.8/site-packages/dask_gateway/client.py", line 396, in _request
resp = await session.request(method, url, json=json, **self._request_kwargs)
File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/.env/lib/python3.8/site-packages/aiohttp/client.py", line 544, in _request
await resp.start(conn)
File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/.env/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 890, in start
message, payload = await self._protocol.read() # type: ignore
File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/.env/lib/python3.8/site-packages/aiohttp/streams.py", line 604, in read
await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
Traefik pod logs:
❯ kubectl logs -f traefik-dask-gateway-fargate-6b6cd85445-6l4jx
time="2021-03-21T23:54:34Z" level=info msg="Configuration loaded from flags."
time="2021-03-21T23:54:35Z" level=error msg="subset not found for default/api-dask-gateway-fargate" ingress=api-dask-gateway-fargate namespace=default providerName=kubernetescrd
time="2021-03-21T23:54:37Z" level=error msg="subset not found for default/api-dask-gateway-fargate" ingress=api-dask-gateway-fargate namespace=default providerName=kubernetescrd
time="2021-03-21T23:54:39Z" level=error msg="subset not found for default/api-dask-gateway-fargate" providerName=kubernetescrd ingress=api-dask-gateway-fargate namespace=default
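The `subset not found` errors are all timestamped before the API pod reports that it has started (23:54:41), so they may only reflect the API pod not being ready yet. As far as I can tell, Traefik logs this message when the Service's Endpoints object lists no subsets. A rough sketch for checking whether the Service has ready endpoints now, assuming `kubectl` is on the PATH and pointed at the cluster; `ready_addresses` and `fetch_endpoints` are hypothetical helper names.

```python
import json
import subprocess


def ready_addresses(endpoints: dict) -> list:
    """Return the ready pod IPs listed in a Kubernetes Endpoints object.

    Addresses under "notReadyAddresses" are deliberately ignored.
    """
    ips = []
    for subset in endpoints.get("subsets") or []:
        for address in subset.get("addresses") or []:
            ips.append(address["ip"])
    return ips


def fetch_endpoints(name: str, namespace: str = "default") -> dict:
    """Fetch an Endpoints object as a dict by shelling out to kubectl.

    Assumes kubectl is installed and configured for the target cluster.
    """
    result = subprocess.run(
        ["kubectl", "get", "endpoints", name, "--namespace", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

For example, `ready_addresses(fetch_endpoints("api-dask-gateway-fargate"))` returning an empty list would suggest Traefik still has no backend to route to; a non-empty list would point the investigation back at the load balancer.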
API pod logs:
❯ kubectl logs -f api-dask-gateway-fargate-5bbdfb7799-jqh26
[I 2021-03-21 23:54:40.908 DaskGateway] Starting dask-gateway-server - version 0.9.0
[I 2021-03-21 23:54:41.215 DaskGateway] Authenticator: 'dask_gateway_server.auth.SimpleAuthenticator'
[I 2021-03-21 23:54:41.216 DaskGateway] Backend: 'dask_gateway_server.backends.kubernetes.backend.KubeBackend'
[I 2021-03-21 23:54:41.239 DaskGateway] Dask-Gateway server started
[I 2021-03-21 23:54:41.239 DaskGateway] - Private API server listening at http://:8000
Controller pod logs:
❯ kubectl logs -f controller-dask-gateway-fargate-594b8dcc65-knjrr
[I 2021-03-21 23:54:40.723 KubeController] Starting dask-gateway-kube-controller - version 0.9.0
[I 2021-03-21 23:54:40.762 KubeController] dask-gateway-kube-controller started!
[I 2021-03-21 23:54:40.762 KubeController] API listening at http://:8000
Anything else we need to know?:
Seems fine when using an EC2-backed AWS EKS cluster created as follows (all default settings):
eksctl create cluster --name ds-eks --region us-west-2 --with-oidc --ssh-access --ssh-public-key ds-eks-2-keypair --managed
All else the same (including the pod logs) except endpoint address:
❯ telnet a1c43d20fdace4891a6e1af97c2b4830-1751627295.us-west-2.elb.amazonaws.com 80
Trying 34.210.180.41...
Connected to a1c43d20fdace4891a6e1af97c2b4830-1751627295.us-west-2.elb.amazonaws.com.
Escape character is '^]'.
❯ curl -i http://a1c43d20fdace4891a6e1af97c2b4830-1751627295.us-west-2.elb.amazonaws.com
HTTP/1.1 404 Not Found
Content-Length: 14
Content-Type: text/plain; charset=utf-8
Date: Mon, 22 Mar 2021 00:44:12 GMT
Server: Python/3.8 aiohttp/3.7.2
from dask_gateway import Gateway
gateway = Gateway(
"http://a1c43d20fdace4891a6e1af97c2b4830-1751627295.us-west-2.elb.amazonaws.com",
)
print(f"Clusters: {gateway.list_clusters()}")
>>> Clusters: []
Environment:
- Dask version: 0.9.0
- Python version: 3.8.7
- Operating System: OSX 11.2.3
- Install method (conda, pip, source): Conda for client (above for gateway services)
Issue Analytics
- State:
- Created 2 years ago
- Comments:6 (2 by maintainers)
Given the similarities in all the logs, it seems like the load balancer may be closing the connection, or perhaps it doesn’t support something about the way the connection works.
Ok by me.