FargateCluster container: scheduler exited unexpectedly!
This works fine (takes about 2 minutes):
from dask_cloudprovider import FargateCluster
cluster = FargateCluster(n_workers=1, image='rsignell/pangeo-worker:2020-01-23c')
but then when I added numba, holoviews, and datashader to the container environment and tried again:
from dask_cloudprovider import FargateCluster
cluster = FargateCluster(n_workers=1, image='rsignell/pangeo-worker:2020-01-28')
I get:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<timed exec> in <module>
~/SageMaker/myenvs/pangeo/lib/python3.6/site-packages/dask_cloudprovider/providers/aws/ecs.py in __init__(self, **kwargs)
1099
1100 def __init__(self, **kwargs):
-> 1101 super().__init__(fargate_scheduler=True, fargate_workers=True, **kwargs)
1102
1103
~/SageMaker/myenvs/pangeo/lib/python3.6/site-packages/dask_cloudprovider/providers/aws/ecs.py in __init__(self, fargate_scheduler, fargate_workers, image, scheduler_cpu, scheduler_mem, scheduler_timeout, worker_cpu, worker_mem, worker_gpu, n_workers, cluster_arn, cluster_name_template, execution_role_arn, task_role_arn, task_role_policies, cloudwatch_logs_group, cloudwatch_logs_stream_prefix, cloudwatch_logs_default_retention, vpc, subnets, security_groups, environment, tags, find_address_timeout, skip_cleanup, aws_access_key_id, aws_secret_access_key, region_name, **kwargs)
593 self._region_name = region_name
594 self._lock = asyncio.Lock()
--> 595 super().__init__(**kwargs)
596
597 async def _start(self,):
~/SageMaker/myenvs/pangeo/lib/python3.6/site-packages/distributed/deploy/spec.py in __init__(self, workers, scheduler, worker, asynchronous, loop, security, silence_logs, name)
254 if not self.asynchronous:
255 self._loop_runner.start()
--> 256 self.sync(self._start)
257 self.sync(self._correct_state)
258
~/SageMaker/myenvs/pangeo/lib/python3.6/site-packages/distributed/deploy/cluster.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
160 return future
161 else:
--> 162 return sync(self.loop, func, *args, **kwargs)
163
164 async def _logs(self, scheduler=True, workers=True):
~/SageMaker/myenvs/pangeo/lib/python3.6/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
343 if error[0]:
344 typ, exc, tb = error[0]
--> 345 raise exc.with_traceback(tb)
346 else:
347 return result[0]
~/SageMaker/myenvs/pangeo/lib/python3.6/site-packages/distributed/utils.py in f()
327 if callback_timeout is not None:
328 future = gen.with_timeout(timedelta(seconds=callback_timeout), future)
--> 329 result[0] = yield future
330 except Exception as exc:
331 error[0] = sys.exc_info()
~/SageMaker/myenvs/pangeo/lib/python3.6/site-packages/tornado/gen.py in run(self)
733
734 try:
--> 735 value = future.result()
736 except Exception:
737 exc_info = sys.exc_info()
~/SageMaker/myenvs/pangeo/lib/python3.6/site-packages/dask_cloudprovider/providers/aws/ecs.py in _start(self)
765 "Hang tight! ",
766 ):
--> 767 await super()._start()
768
769 @property
~/SageMaker/myenvs/pangeo/lib/python3.6/site-packages/distributed/deploy/spec.py in _start(self)
282
283 self.status = "starting"
--> 284 self.scheduler = await self.scheduler
285 self.scheduler_comm = rpc(
286 getattr(self.scheduler, "external_address", None) or self.scheduler.address,
~/SageMaker/myenvs/pangeo/lib/python3.6/site-packages/dask_cloudprovider/providers/aws/ecs.py in _()
128 async with self.lock:
129 if not self.task:
--> 130 await self.start()
131 assert self.task
132 return self
~/SageMaker/myenvs/pangeo/lib/python3.6/site-packages/dask_cloudprovider/providers/aws/ecs.py in start(self)
258 self.public_ip = interface["Association"]["PublicIp"]
259 self.private_ip = interface["PrivateIpAddresses"][0]["PrivateIpAddress"]
--> 260 await self._set_address_from_logs()
261 self.status = "running"
262
~/SageMaker/myenvs/pangeo/lib/python3.6/site-packages/dask_cloudprovider/providers/aws/ecs.py in _set_address_from_logs(self)
181 else:
182 if not await self._task_is_running():
--> 183 raise RuntimeError("%s exited unexpectedly!" % type(self).__name__)
184 continue
185 break
RuntimeError: Scheduler exited unexpectedly!
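
Editor's note: the traceback shows the failure comes from _set_address_from_logs. dask-cloudprovider watches the scheduler task's logs for the scheduler address and raises this RuntimeError when the ECS task stops before the address ever appears, so a first diagnostic step is to ask ECS why the task stopped. A minimal sketch with boto3 (the ARNs below are placeholders, not values from this issue):

import boto3

ecs = boto3.client("ecs")

# Placeholder ARNs; substitute the cluster and task ARNs from the ECS console.
cluster_arn = "arn:aws:ecs:region:account:cluster/your-dask-cluster"
task_arn = "arn:aws:ecs:region:account:task/your-stopped-task"

resp = ecs.describe_tasks(cluster=cluster_arn, tasks=[task_arn])
for task in resp["tasks"]:
    # stoppedReason usually names image pull failures or container errors
    print("task:", task.get("stoppedReason"))
    for container in task.get("containers", []):
        print(container.get("name"), container.get("exitCode"), container.get("reason"))
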
Top GitHub Comments
@jacobtomlinson, yep, the logs showed me the problem, which was a Docker container error. In my description of the problem I said I had just added a few packages to a Dockerfile that worked, and then it didn't work. But when I added the packages, I added them to the wrong Dockerfile. So just user error, nothing to do with dask-cloudprovider.
Sure that sounds reasonable!
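
Editor's note: the logs mentioned in the comment above are the CloudWatch streams that FargateCluster writes for the scheduler and workers (the group is configurable via the cloudwatch_logs_group argument visible in the traceback). A sketch of pulling the most recent events with boto3, assuming a placeholder group name:

import boto3

logs = boto3.client("logs")

group = "dask-ecs"  # placeholder; use the log group FargateCluster actually created

# Print the latest events from the most recently active streams in the group
streams = logs.describe_log_streams(
    logGroupName=group, orderBy="LastEventTime", descending=True
)
for stream in streams["logStreams"][:3]:
    events = logs.get_log_events(
        logGroupName=group,
        logStreamName=stream["logStreamName"],
        limit=50,
    )
    for event in events["events"]:
        print(event["message"])
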