Deployment on Kubernetes backed by AWS EKS on Fargate fails

What happened:

Can’t reach the API when deploying on AWS EKS with a Fargate backend. It seems fine on a managed EKS cluster (i.e., one using plain EC2 nodes).

What you expected to happen:

Dask Gateway deployed and reachable

Minimal Complete Verifiable Example:

Creating the cluster (all default settings):

eksctl create cluster --name ds-eks-fargate --region us-west-2 --fargate

Deploy the dask-gateway (all default settings):

helm upgrade --install --namespace default dask-gateway-fargate daskgateway/dask-gateway

Result:

❯ helm list
NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
dask-gateway-fargate    default         1               2021-03-22 12:52:59.264708 +1300 NZDT   deployed        dask-gateway-0.9.0      0.9.0      

❯ kubectl get pods
NAME                                               READY   STATUS    RESTARTS   AGE
api-dask-gateway-fargate-5bbdfb7799-jqh26          1/1     Running   0          5m56s
controller-dask-gateway-fargate-594b8dcc65-knjrr   1/1     Running   0          5m56s
traefik-dask-gateway-fargate-6b6cd85445-6l4jx      1/1     Running   0          5m56s

❯ kubectl get services
NAME                           TYPE           CLUSTER-IP      EXTERNAL-IP                                                              PORT(S)        AGE
api-dask-gateway-fargate       ClusterIP      10.100.13.39    <none>                                                                   8000/TCP       6m30s
kubernetes                     ClusterIP      10.100.0.1      <none>                                                                   443/TCP        3h22m
traefik-dask-gateway-fargate   LoadBalancer   10.100.180.97   a910dfb4e398d4ee18003ce5bdcedddd-948617200.us-west-2.elb.amazonaws.com   80:32495/TCP   6m30s

Then in bash/zsh:

❯ telnet a910dfb4e398d4ee18003ce5bdcedddd-948617200.us-west-2.elb.amazonaws.com 80
Trying 52.38.243.146...
Connected to a910dfb4e398d4ee18003ce5bdcedddd-948617200.us-west-2.elb.amazonaws.com.
Escape character is '^]'.
Connection closed by foreign host.

❯ curl -i http://a910dfb4e398d4ee18003ce5bdcedddd-948617200.us-west-2.elb.amazonaws.com
curl: (52) Empty reply from server
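The telnet and curl output above shows the TCP connection being accepted and then closed without any bytes coming back. A minimal sketch of a probe that separates that case from a real HTTP response is given here, in Python to match the client code in this report. `probe_http` and its return labels are hypothetical names chosen for illustration; they are not part of dask-gateway or aiohttp.

```python
import socket


def probe_http(host: str, port: int = 80, timeout: float = 5.0) -> str:
    """Send a bare HTTP/1.1 GET and classify what comes back.

    Returns "unreachable" if the TCP connection cannot be opened,
    "empty-reply" if the peer accepts and then closes or resets the
    connection without sending any bytes, "timeout" if nothing arrives
    in time, and otherwise the HTTP status line of the response.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"
    with sock:
        request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        try:
            sock.sendall(request.encode("ascii"))
            data = sock.recv(4096)
        except (ConnectionResetError, BrokenPipeError):
            # The peer dropped the connection before answering.
            return "empty-reply"
        except socket.timeout:
            return "timeout"
        if not data:
            # Orderly close with zero bytes: what curl reports as error 52.
            return "empty-reply"
        # The first line of the response is the status line.
        return data.split(b"\r\n", 1)[0].decode("latin-1")
```

Judging by the curl results in this report, calling `probe_http` with the Fargate load balancer hostname would presumably return "empty-reply", while the EC2-backed one would return a 404 status line.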

Then with Python:

from dask_gateway import Gateway

gateway = Gateway(
    "http://a910dfb4e398d4ee18003ce5bdcedddd-948617200.us-west-2.elb.amazonaws.com",
)

print(f"Clusters: {gateway.list_clusters()}")

Should be:

Clusters: []

But is:

Traceback (most recent call last):
  File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/.env/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-66bf2c44db55>", line 1, in <module>
    runfile('/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/scripts/main.py', wdir='/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/scripts')
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/scripts/main.py", line 7, in <module>
    print(f"Clusters: {gateway.list_clusters()}")
  File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/.env/lib/python3.8/site-packages/dask_gateway/client.py", line 455, in list_clusters
    return self.sync(self._clusters, status=status, **kwargs)
  File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/.env/lib/python3.8/site-packages/dask_gateway/client.py", line 343, in sync
    return future.result()
  File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/.env/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/.env/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/.env/lib/python3.8/site-packages/dask_gateway/client.py", line 434, in _clusters
    resp = await self._request("GET", url)
  File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/.env/lib/python3.8/site-packages/dask_gateway/client.py", line 396, in _request
    resp = await session.request(method, url, json=json, **self._request_kwargs)
  File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/.env/lib/python3.8/site-packages/aiohttp/client.py", line 544, in _request
    await resp.start(conn)
  File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/.env/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 890, in start
    message, payload = await self._protocol.read()  # type: ignore
  File "/Users/joncourt/Source/data-science/datascience-forecasting-experimentation/.env/lib/python3.8/site-packages/aiohttp/streams.py", line 604, in read
    await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected

Traefik pod logs:

❯ kubectl logs -f traefik-dask-gateway-fargate-6b6cd85445-6l4jx
time="2021-03-21T23:54:34Z" level=info msg="Configuration loaded from flags."
time="2021-03-21T23:54:35Z" level=error msg="subset not found for default/api-dask-gateway-fargate" ingress=api-dask-gateway-fargate namespace=default providerName=kubernetescrd
time="2021-03-21T23:54:37Z" level=error msg="subset not found for default/api-dask-gateway-fargate" ingress=api-dask-gateway-fargate namespace=default providerName=kubernetescrd
time="2021-03-21T23:54:39Z" level=error msg="subset not found for default/api-dask-gateway-fargate" providerName=kubernetescrd ingress=api-dask-gateway-fargate namespace=default
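The Traefik lines above are key=value pairs with quoted values, so repeated errors can be tallied with a few lines of Python. This is only a sketch: `parse_logfmt` and `count_errors` are hypothetical helpers, and it assumes every line follows the simple quoting seen in the log excerpt.

```python
import shlex
from collections import Counter


def parse_logfmt(line: str) -> dict:
    """Parse one key=value log line; quoted values may contain spaces."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


def count_errors(lines):
    """Count error-level entries, keyed by (namespace, ingress, msg)."""
    counts = Counter()
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("level") == "error":
            key = (fields.get("namespace"), fields.get("ingress"), fields.get("msg"))
            counts[key] += 1
    return counts
```

Because the fields are looked up by name, the differing field order in the third log line does not affect the count.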

API pod logs:

❯ kubectl logs -f api-dask-gateway-fargate-5bbdfb7799-jqh26
[I 2021-03-21 23:54:40.908 DaskGateway] Starting dask-gateway-server - version 0.9.0
[I 2021-03-21 23:54:41.215 DaskGateway] Authenticator: 'dask_gateway_server.auth.SimpleAuthenticator'
[I 2021-03-21 23:54:41.216 DaskGateway] Backend: 'dask_gateway_server.backends.kubernetes.backend.KubeBackend'
[I 2021-03-21 23:54:41.239 DaskGateway] Dask-Gateway server started
[I 2021-03-21 23:54:41.239 DaskGateway] - Private API server listening at http://:8000

Controller pod logs:

❯ kubectl logs -f controller-dask-gateway-fargate-594b8dcc65-knjrr
[I 2021-03-21 23:54:40.723 KubeController] Starting dask-gateway-kube-controller - version 0.9.0
[I 2021-03-21 23:54:40.762 KubeController] dask-gateway-kube-controller started!
[I 2021-03-21 23:54:40.762 KubeController] API listening at http://:8000

Anything else we need to know?:

Seems fine when using an EC2-backed AWS EKS cluster created as follows (all default settings):

eksctl create cluster --name ds-eks --region us-west-2 --with-oidc --ssh-access --ssh-public-key ds-eks-2-keypair --managed

Everything else is the same (including the pod logs) except the endpoint address:

❯ telnet a1c43d20fdace4891a6e1af97c2b4830-1751627295.us-west-2.elb.amazonaws.com 80
Trying 34.210.180.41...
Connected to a1c43d20fdace4891a6e1af97c2b4830-1751627295.us-west-2.elb.amazonaws.com.
Escape character is '^]'.

❯ curl -i http://a1c43d20fdace4891a6e1af97c2b4830-1751627295.us-west-2.elb.amazonaws.com
HTTP/1.1 404 Not Found
Content-Length: 14
Content-Type: text/plain; charset=utf-8
Date: Mon, 22 Mar 2021 00:44:12 GMT
Server: Python/3.8 aiohttp/3.7.2

Then with Python:

from dask_gateway import Gateway

gateway = Gateway(
    "http://a1c43d20fdace4891a6e1af97c2b4830-1751627295.us-west-2.elb.amazonaws.com",
)

print(f"Clusters: {gateway.list_clusters()}")

Result:

Clusters: []

Environment:

  • Dask Gateway version: 0.9.0
  • Python version: 3.8.7
  • Operating System: OSX 11.2.3
  • Install method (conda, pip, source): Conda for client (above for gateway services)

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

jacobtomlinson commented, Mar 22, 2021 (1 reaction)

Given the similarities in all the logs it seems like the load balancer may be closing the connection, or perhaps doesn’t support something about the way the connection works.

joncourt commented, Aug 27, 2021 (0 reactions)

Ok by me.

On 27/08/2021, at 12:24 PM, Erik Sundell wrote:

I think this is an issue that can’t be translated to a concrete action point for something to be done to the codebase in this repo. Due to that, I’d like to close this issue. Is that alright?
