FargateCluster timeout on exit

When I close a Fargate cluster (similar to #220) using

client.close()
cluster.close()

I receive the following error:

Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/distributed/deploy/adaptive_core.py", line 190, in adapt
    target = await self.safe_target()
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/distributed/deploy/adaptive_core.py", line 128, in safe_target
    n = await self.target()
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/distributed/deploy/adaptive.py", line 146, in target
    return await self.scheduler.adaptive_target(
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/distributed/core.py", line 789, in send_recv_from_rpc
    comm = await self.live_comm()
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/distributed/core.py", line 747, in live_comm
    comm = await connect(
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/distributed/comm/core.py", line 307, in connect
    raise IOError(
OSError: Timed out trying to connect to tcp://3.87.54.191:8786 after 10 s

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/distributed/deploy/adaptive_core.py", line 204, in adapt
    if status != "down":
UnboundLocalError: local variable 'status' referenced before assignment
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <zmq.eventloop.ioloop.ZMQIOLoop object at 0x7fbf8cc7f640>>, <Task finished name='Task-16550' coro=<AdaptiveCore.adapt() done, defined at /home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/distributed/deploy/adaptive_core.py:178> exception=UnboundLocalError("local variable 'status' referenced before assignment")>)
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/site-packages/distributed/comm/core.py", line 285, in connect
    comm = await asyncio.wait_for(
  File "/home/ec2-user/anaconda3/envs/features_r/lib/python3.8/asyncio/tasks.py", line 490, in wait_for
    raise exceptions.TimeoutError()
asyncio.exceptions.TimeoutError
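Read together, the two tracebacks suggest a chain of events. After the cluster is closed, the periodic adapt() callback can no longer reach the scheduler. The asyncio.TimeoutError from asyncio.wait_for is re-raised as an OSError ("Timed out trying to connect ... after 10 s"). While that OSError is being handled, the check `if status != "down":` fails because `status` was never assigned. The sketch below reproduces that control-flow pattern using only the standard library. Every name in it (hypothetical_connect, adapt, run, the address, the 0.01 s timeout) is a made-up stand-in and not distributed's real code. Binding `status` before the `try` is shown as one plausible guard, not as the upstream fix.

```python
import asyncio


async def hypothetical_connect(address: str, timeout: float) -> None:
    """Hypothetical stand-in for a connect() call whose peer has gone away."""
    # A future that nobody ever resolves: the "handshake" never completes.
    never_done = asyncio.get_running_loop().create_future()
    try:
        await asyncio.wait_for(never_done, timeout)
    except asyncio.TimeoutError:
        # Re-raise as OSError, mirroring the message shape in the traceback.
        raise OSError(f"Timed out trying to connect to {address} after {timeout} s")


async def adapt(guard_status: bool) -> str:
    """Hypothetical adapt() step; a sketch of the pattern, not distributed's code."""
    if guard_status:
        status = None  # bind the name before anything in the try block can fail
    try:
        await hypothetical_connect("tcp://scheduler:8786", timeout=0.01)
        status = "up"  # never reached here, because the connect always times out
    except OSError:
        # With guard_status=False, `status` is unbound at this point, so the
        # comparison raises UnboundLocalError and masks the original OSError.
        if status != "down":
            return "stopped after OSError"
    return status


def run(guard_status: bool) -> str:
    try:
        return asyncio.run(adapt(guard_status))
    except UnboundLocalError:
        return "UnboundLocalError"


print(run(guard_status=False))
print(run(guard_status=True))
```

The first call surfaces the same secondary UnboundLocalError as the report. The second call handles the connection failure and simply stops.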

Top GitHub Comments

mkarbo commented, Jan 19, 2022

https://github.com/PrefectHQ/prefect/issues/5330 has more detail on the issue: using Prefect with a Fargate cluster also leads to an IOLoop error.

Sorry to tag you, @jacobtomlinson, but I'm wondering whether this potential issue in distributed is getting any attention. Are there any updates?

mkarbo commented, Jan 18, 2022

I am also having this issue. I have tested with 2021.12.0 and 2022.01.1, on Python 3.7, 3.8 and 3.9.

Worth noting, however: I got the same errors through a wrapper (Prefect) using dask-cloudprovider's FargateCluster.

My code runs and the results complete, but when the process exits, an "IOLoop is closed" RuntimeError is raised in the scheduler from utils.py.

Original issue: "FargateCluster timeout on exit", dask/distributed issue #5447