FargateCluster timeout on exit

When I close a Fargate cluster (similar to #220) using

client.close()
cluster.close()

I receive the following error:

Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/distributed/deploy/adaptive_core.py", line 190, in adapt
    target = await self.safe_target()
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/distributed/deploy/adaptive_core.py", line 128, in safe_target
    n = await self.target()
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/distributed/deploy/adaptive.py", line 146, in target
    return await self.scheduler.adaptive_target(
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/distributed/core.py", line 789, in send_recv_from_rpc
    comm = await self.live_comm()
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/distributed/core.py", line 747, in live_comm
    comm = await connect(
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/distributed/comm/core.py", line 307, in connect
    raise IOError(
OSError: Timed out trying to connect to tcp://3.87.54.191:8786 after 10 s

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/distributed/deploy/adaptive_core.py", line 204, in adapt
    if status != "down":
UnboundLocalError: local variable 'status' referenced before assignment
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <zmq.eventloop.ioloop.ZMQIOLoop object at 0x7fbf8cc7f640>>, <Task finished name='Task-16550' coro=<AdaptiveCore.adapt() done, defined at /home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/distributed/deploy/adaptive_core.py:178> exception=UnboundLocalError("local variable 'status' referenced before assignment")>)
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/distributed/comm/core.py", line 285, in connect
    comm = await asyncio.wait_for(
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/asyncio/tasks.py", line 490, in wait_for
    raise exceptions.TimeoutError()
asyncio.exceptions.TimeoutError
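
The second traceback (`UnboundLocalError` raised while the `OSError` was being handled) is typical of a name that is only bound inside a `try` block and then read in the exception handler. The following is a minimal sketch of that general pattern, not the actual code in `distributed`; `failing_target`, `adapt_buggy`, `adapt_guarded`, and `run_both` are hypothetical names used only for illustration.

```python
import asyncio


async def failing_target():
    # Hypothetical stand-in for the scheduler RPC that times out on shutdown.
    raise OSError("Timed out trying to connect")


async def adapt_buggy():
    try:
        target = await failing_target()
        status = "up" if target > 0 else "down"
    except OSError:
        # `status` was never bound because the await raised first,
        # so this comparison raises UnboundLocalError.
        if status != "down":
            return "stopped"
    return status


async def adapt_guarded():
    status = None  # bind the name before entering the try block
    try:
        target = await failing_target()
        status = "up" if target > 0 else "down"
    except OSError:
        if status != "down":
            return "stopped"
    return status


def run_both():
    try:
        asyncio.run(adapt_buggy())
        buggy = "no error"
    except UnboundLocalError as exc:
        # The original OSError is chained as the implicit context.
        buggy = type(exc.__context__).__name__
    guarded = asyncio.run(adapt_guarded())
    return buggy, guarded
```

Calling `run_both()` returns `("OSError", "stopped")`: the unguarded version fails with an `UnboundLocalError` chained to the original `OSError`, while binding the name before the `try` lets the handler run cleanly. Whether this matches the real cause in `adaptive_core.py` is an assumption based only on the traceback above.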

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

1 reaction
mkarbo commented, Jan 19, 2022

https://github.com/PrefectHQ/prefect/issues/5330 describes a related issue that also ends in an IOLoop error, using Prefect with a Fargate cluster.

Sorry to tag, but is this potential issue in distributed getting any attention, and are there any updates? @jacobtomlinson

1 reaction
mkarbo commented, Jan 18, 2022

I am also having this issue. I have tested with 2021.12.0 and 2022.01.1 on Python 3.7, 3.8, and 3.9.

It is worth noting, however, that I hit the same exceptions through a wrapper (Prefect) using dask-cloudprovider (FargateCluster).

My code runs and the results complete, but when the process exits, an "IOLoop is closed" RuntimeError is raised in the scheduler from utils.py.

Original issue: FargateCluster timeout on exit · Issue #5447 · dask/distributed