New Behavior: FargateCluster dies with "Scheduler exited unexpectedly"
Pinning aiobotocore=0.12.0 got me going again (thanks to https://github.com/dask/dask-cloudprovider/issues/78#issuecomment-610018083), and I’m running the same notebook that ran before, but now when I try to start a FargateCluster:
cluster = FargateCluster(n_workers=1, image='rsignell/fargate-worker:2020-04-07')
a few minutes go by and then I get “Scheduler exited unexpectedly”:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-3-e2ac6335fc11> in <module>
----> 1 cluster = FargateCluster(n_workers=1, image='rsignell/fargate-worker:2020-04-07')
~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/dask_cloudprovider/providers/aws/ecs.py in __init__(self, **kwargs)
1099
1100 def __init__(self, **kwargs):
-> 1101 super().__init__(fargate_scheduler=True, fargate_workers=True, **kwargs)
1102
1103
~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/dask_cloudprovider/providers/aws/ecs.py in __init__(self, fargate_scheduler, fargate_workers, image, scheduler_cpu, scheduler_mem, scheduler_timeout, worker_cpu, worker_mem, worker_gpu, n_workers, cluster_arn, cluster_name_template, execution_role_arn, task_role_arn, task_role_policies, cloudwatch_logs_group, cloudwatch_logs_stream_prefix, cloudwatch_logs_default_retention, vpc, subnets, security_groups, environment, tags, find_address_timeout, skip_cleanup, aws_access_key_id, aws_secret_access_key, region_name, **kwargs)
593 self._region_name = region_name
594 self._lock = asyncio.Lock()
--> 595 super().__init__(**kwargs)
596
597 async def _start(self,):
~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/distributed/deploy/spec.py in __init__(self, workers, scheduler, worker, asynchronous, loop, security, silence_logs, name)
254 if not self.asynchronous:
255 self._loop_runner.start()
--> 256 self.sync(self._start)
257 self.sync(self._correct_state)
258
~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/distributed/deploy/cluster.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
159 return future
160 else:
--> 161 return sync(self.loop, func, *args, **kwargs)
162
163 async def _get_logs(self, scheduler=True, workers=True):
~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
346 if error[0]:
347 typ, exc, tb = error[0]
--> 348 raise exc.with_traceback(tb)
349 else:
350 return result[0]
~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/distributed/utils.py in f()
330 if callback_timeout is not None:
331 future = asyncio.wait_for(future, callback_timeout)
--> 332 result[0] = yield future
333 except Exception as exc:
334 error[0] = sys.exc_info()
~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/tornado/gen.py in run(self)
733
734 try:
--> 735 value = future.result()
736 except Exception:
737 exc_info = sys.exc_info()
~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/dask_cloudprovider/providers/aws/ecs.py in _start(self)
765 "Hang tight! ",
766 ):
--> 767 await super()._start()
768
769 @property
~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/distributed/deploy/spec.py in _start(self)
282
283 self.status = "starting"
--> 284 self.scheduler = await self.scheduler
285 self.scheduler_comm = rpc(
286 getattr(self.scheduler, "external_address", None) or self.scheduler.address,
~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/dask_cloudprovider/providers/aws/ecs.py in _()
128 async with self.lock:
129 if not self.task:
--> 130 await self.start()
131 assert self.task
132 return self
~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/dask_cloudprovider/providers/aws/ecs.py in start(self)
258 self.public_ip = interface["Association"]["PublicIp"]
259 self.private_ip = interface["PrivateIpAddresses"][0]["PrivateIpAddress"]
--> 260 await self._set_address_from_logs()
261 self.status = "running"
262
~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/dask_cloudprovider/providers/aws/ecs.py in _set_address_from_logs(self)
181 else:
182 if not await self._task_is_running():
--> 183 raise RuntimeError("%s exited unexpectedly!" % type(self).__name__)
184 continue
185 break
RuntimeError: Scheduler exited unexpectedly!
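Reading the last frame of the traceback, the error appears to be raised when the scheduler task stops running before its address shows up in the logs. A simplified sketch of that polling pattern follows. It is not the dask-cloudprovider source: the function name, its inputs, and the assumption that the address is read from a "Scheduler at:" log line are all illustrative.

```python
import re

# Assumed log format: the Dask scheduler normally prints a line such as
# "Scheduler at:   tcp://10.0.0.1:8786" when it starts.
ADDRESS_RE = re.compile(r"Scheduler at:\s*(tcp://\S+)")


def find_scheduler_address(log_batches, task_is_running):
    """Return the first scheduler address found in successive batches of log lines.

    log_batches: iterable of lists of log lines, one list per poll (hypothetical input).
    task_is_running: zero-argument callable reporting whether the task is still alive.
    """
    for lines in log_batches:
        for line in lines:
            match = ADDRESS_RE.search(line)
            if match:
                return match.group(1)
        # No address in this batch: a task that has stopped will never print one.
        if not task_is_running():
            raise RuntimeError("Scheduler exited unexpectedly!")
    raise TimeoutError("No scheduler address found in the logs")
```

Seen this way, the message says only that the container stopped before the scheduler reported an address, so the cause is more likely to be found in the image or its startup script than in the client-side call.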
Here’s my environment
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
_libgcc_mutex=0.1=conda_forge
_openmp_mutex=4.5=1_llvm
aiobotocore=0.12.0=py_0
aiohttp=3.6.2=py36h516909a_0
aioitertools=0.5.1=py_0
appdirs=1.4.3=py_1
asciitree=0.3.3=py_2
async-timeout=3.0.1=py_1000
attrs=19.3.0=py_0
backcall=0.1.0=py_0
bleach=3.1.4=pyh9f0ad1d_0
blinker=1.4=py_1
blosc=1.17.1=he1b5a44_0
bokeh=2.0.1=py36h9f0ad1d_0
boost-cpp=1.72.0=h8e57a91_0
boto3=1.12.15=py_0
botocore=1.15.15=py_0
bzip2=1.0.8=h516909a_2
ca-certificates=2020.4.5.1=hecc5488_0
cachetools=3.1.1=py_0
cairo=1.16.0=hcf35c78_1003
cartopy=0.17.0=py36hb2212bf_1013
certifi=2020.4.5.1=py36h9f0ad1d_0
cffi=1.14.0=py36hd463f26_0
cfitsio=3.470=hb60a0a2_2
cftime=1.1.1.2=py36h785e9b2_0
chardet=3.0.4=py36h9f0ad1d_1006
click=7.1.1=pyh8c360ce_0
click-plugins=1.1.1=py_0
cligj=0.5.0=py_0
cloudpickle=1.3.0=py_0
colorcet=2.0.1=py_0
cryptography=2.8=py36h45558ae_2
curl=7.68.0=hf8cf82a_0
cycler=0.10.0=py_2
cython=0.29.16=py36h831f99a_0
cytoolz=0.10.1=py36h516909a_0
dask=2.14.0=py_0
dask-cloudprovider=0.1.1=py_0
dask-core=2.14.0=py_0
datashader=0.10.0=py_0
datashape=0.5.4=py_1
decorator=4.4.2=py_0
defusedxml=0.6.0=py_0
distributed=2.14.0=py36h9f0ad1d_0
docopt=0.6.2=py_1
docutils=0.15.2=py36_0
earthsim=1.0.1=py_0
entrypoints=0.3=py36h9f0ad1d_1001
erddapy=0.5.3=py_0
expat=2.2.9=he1b5a44_2
fasteners=0.14.1=py_3
fiona=1.8.13=py36h900e953_0
fontconfig=2.13.1=h86ecdb6_1001
freetype=2.10.1=he06d7ca_0
freexl=1.0.5=h14c3975_1002
fsspec=0.7.1=py_0
gcsfs=0.6.1=py_0
gdal=3.0.4=py36hd60729c_3
geopandas=0.7.0=py_1
geos=3.8.1=he1b5a44_0
geotiff=1.5.1=hcbe54f9_9
geoviews=1.8.1=py_0
geoviews-core=1.8.1=py_0
gettext=0.19.8.1=hc5be6a0_1002
giflib=5.2.1=h516909a_2
glib=2.58.3=py36hd3ed26a_1003
google-auth=1.12.0=pyh9f0ad1d_0
google-auth-oauthlib=0.4.1=py_2
h5netcdf=0.8.0=py_0
h5py=2.10.0=nompi_py36h513d04c_102
hdf4=4.2.13=hf30be14_1003
hdf5=1.10.5=nompi_h3c11f04_1104
heapdict=1.0.1=py_0
holoviews=1.13.2=pyh9f0ad1d_0
hvplot=0.5.2=py_0
icu=64.2=he1b5a44_1
idna=2.9=py_1
idna_ssl=1.1.0=py36_1000
imageio=2.8.0=py_0
importlib-metadata=1.6.0=py36h9f0ad1d_0
importlib_metadata=1.6.0=0
ipykernel=5.2.0=py36h95af2a2_1
ipython=7.13.0=py36h9f0ad1d_2
ipython_genutils=0.2.0=py_1
ipywidgets=7.5.1=py_0
jedi=0.16.0=py36h9f0ad1d_1
jinja2=2.11.1=py_0
jmespath=0.9.5=py_0
jpeg=9c=h14c3975_1001
json-c=0.13.1=h14c3975_1001
jsonschema=3.2.0=py36h9f0ad1d_1
jupyter_client=6.1.2=py_0
jupyter_core=4.6.3=py36h9f0ad1d_1
kealib=1.4.13=hec59c27_0
kiwisolver=1.2.0=py36hdb11119_0
krb5=1.16.4=h2fd8d38_0
ld_impl_linux-64=2.34=h53a641e_0
libblas=3.8.0=16_openblas
libcblas=3.8.0=16_openblas
libcurl=7.68.0=hda55be3_0
libdap4=3.20.4=hd3bb157_0
libedit=3.1.20170329=hf8c457e_1001
libffi=3.2.1=he1b5a44_1007
libgcc-ng=9.2.0=h24d8f2e_2
libgdal=3.0.4=h94bbfbd_3
libgfortran-ng=7.3.0=hdf63c60_5
libiconv=1.15=h516909a_1006
libkml=1.3.0=hb574062_1011
liblapack=3.8.0=16_openblas
libllvm8=8.0.1=hc9558a2_0
libnetcdf=4.7.4=nompi_h9f9fd6a_101
libopenblas=0.3.9=h5ec1e0e_0
libpng=1.6.37=hed695b0_1
libpq=12.2=hae5116b_0
libsodium=1.0.17=h516909a_0
libspatialindex=1.9.3=he1b5a44_3
libspatialite=4.3.0a=heb269f5_1037
libssh2=1.8.2=h22169c7_2
libstdcxx-ng=9.2.0=hdf63c60_2
libtiff=4.1.0=hc3755c2_3
libuuid=2.32.1=h14c3975_1000
libwebp=1.0.2=h56121f0_5
libxcb=1.13=h14c3975_1002
libxml2=2.9.10=hee79883_0
llvm-openmp=9.0.1=hc9558a2_2
llvmlite=0.31.0=py36hfa65bc7_1
locket=0.2.0=py_2
lz4=3.0.2=py36h964dcb7_1
lz4-c=1.8.3=he1b5a44_1001
markdown=3.2.1=py_0
markupsafe=1.1.1=py36h8c4c3a4_1
matplotlib-base=3.2.1=py36hb8e4980_0
metpy=0.12.0=py_0
mistune=0.8.4=py36h516909a_1000
monotonic=1.5=py_0
msgpack-python=1.0.0=py36hdb11119_1
multidict=4.7.5=py36h516909a_0
multipledispatch=0.6.0=py_0
munch=2.5.0=py_0
nbconvert=5.6.1=py36_0
nbformat=5.0.4=py_0
ncurses=6.1=hf484d3e_1002
netcdf4=1.5.3=nompi_py36h90ce072_103
networkx=2.4=py_1
notebook=6.0.3=py36_0
numba=0.48.0=py36hb3f55d8_0
numcodecs=0.6.4=py36he1b5a44_0
numpy=1.18.1=py36h7314795_1
oauthlib=3.0.1=py_0
olefile=0.46=py_0
openjpeg=2.3.1=h981e76c_3
openssl=1.1.1f=h516909a_0
owslib=0.19.2=py_1
packaging=20.1=py_0
pandas=1.0.3=py36h830a2c2_0
pandoc=2.9.2=0
pandocfilters=1.4.2=py_1
panel=0.9.5=py_0
param=1.9.3=py_0
parso=0.6.2=py_0
partd=1.1.0=py_0
pcre=8.44=he1b5a44_0
pexpect=4.8.0=py36h9f0ad1d_1
pickleshare=0.7.5=py36h9f0ad1d_1001
pillow=7.1.1=py36h8328e55_0
pint=0.11=py_1
pip=20.0.2=py_2
pixman=0.38.0=h516909a_1003
pooch=1.0.0=py_0
poppler=0.67.0=h14e79db_8
poppler-data=0.4.9=1
postgresql=12.2=hf1211e9_0
proj=6.3.1=hc80f0dc_1
prometheus_client=0.7.1=py_0
prompt-toolkit=3.0.5=py_0
psutil=5.7.0=py36h8c4c3a4_1
pthread-stubs=0.4=h14c3975_1001
ptyprocess=0.6.0=py_1001
pyasn1=0.4.8=py_0
pyasn1-modules=0.2.7=py_0
pycparser=2.20=py_0
pyct=0.4.6=py_0
pyct-core=0.4.6=py_0
pyepsg=0.4.0=py_0
pygments=2.6.1=py_0
pyjwt=1.7.1=py_0
pykdtree=1.3.1=py36h785e9b2_1003
pyopenssl=19.1.0=py_1
pyparsing=2.4.7=pyh9f0ad1d_0
pyproj=2.6.0=py36h47ab0c1_0
pyrsistent=0.16.0=py36h8c4c3a4_0
pyshp=2.1.0=py_0
pysocks=1.7.1=py36h9f0ad1d_1
python=3.6.10=h8356626_1010_cpython
python-blosc=1.9.0=py36h830a2c2_0
python-dateutil=2.8.1=py_0
python-gist=0.9.2=pyh9f0ad1d_1
python-gnupg=0.4.5=py_0
python_abi=3.6=1_cp36m
pytz=2019.3=py_0
pyviz_comms=0.7.4=pyh8c360ce_0
pywavelets=1.1.1=py36hc1659b7_0
pyyaml=5.3.1=py36h8c4c3a4_0
pyzmq=19.0.0=py36h9947dbf_1
readline=8.0=hf8c457e_0
requests=2.23.0=pyh8c360ce_2
requests-oauthlib=1.2.0=py_0
rsa=4.0=py_0
rtree=0.9.4=py36he053a7a_1
s3fs=0.4.2=py_0
s3transfer=0.3.3=py36h9f0ad1d_1
scikit-image=0.16.2=py36hb3f55d8_0
scipy=1.4.1=py36h2d22cac_2
send2trash=1.5.0=py_0
setuptools=46.1.3=py36h9f0ad1d_0
shapely=1.7.0=py36h3d6ee9d_3
simplejson=3.17.0=py36h516909a_0
six=1.14.0=py_1
sortedcontainers=2.1.0=py_0
sqlite=3.30.1=hcee41ef_0
stglib=0.2.0=py_0
tbb=2018.0.5=h2d50403_0
tblib=1.6.0=py_0
terminado=0.8.3=py36h9f0ad1d_1
testpath=0.4.4=py_0
tiledb=1.7.0=hcde45ca_2
tk=8.6.10=hed695b0_0
toolz=0.10.0=py_0
tornado=6.0.4=py36h8c4c3a4_1
tqdm=4.45.0=pyh9f0ad1d_0
traitlets=4.3.3=py36h9f0ad1d_1
typing_extensions=3.7.4.1=py36h9f0ad1d_3
tzcode=2019a=h516909a_1002
urllib3=1.25.7=py36h9f0ad1d_1
utide=0.2.5=py_0
wcwidth=0.1.9=pyh9f0ad1d_0
webencodings=0.5.1=py_1
wheel=0.34.2=py_1
widgetsnbextension=3.5.1=py36_0
wrapt=1.12.1=py36h8c4c3a4_1
xarray=0.15.1=py_0
xerces-c=3.2.2=h8412b87_1004
xmltodict=0.12.0=py_0
xorg-kbproto=1.0.7=h14c3975_1002
xorg-libice=1.0.10=h516909a_0
xorg-libsm=1.2.3=h84519dc_1000
xorg-libx11=1.6.9=h516909a_0
xorg-libxau=1.0.9=h14c3975_0
xorg-libxdmcp=1.1.3=h516909a_0
xorg-libxext=1.3.4=h516909a_0
xorg-libxrender=0.9.10=h516909a_1002
xorg-renderproto=0.11.1=h14c3975_1002
xorg-xextproto=7.3.0=h14c3975_1002
xorg-xproto=7.0.31=h14c3975_1007
xz=5.2.4=h516909a_1002
yaml=0.2.2=h516909a_1
yarl=1.3.0=py36h516909a_1000
zarr=2.4.0=py_0
zeromq=4.3.2=he1b5a44_2
zict=2.0.0=py_0
zipp=3.1.0=py_0
zlib=1.2.11=h516909a_1006
zstd=1.4.4=h3b9ef0a_2
Issue Analytics
- State:
- Created 3 years ago
- Comments:5 (5 by maintainers)
Top GitHub Comments
Grrr… It turns out the reason my Docker image wasn’t working was that I hadn’t made the prepare.sh script referenced in the Dockerfile executable. I found this out when I tried (and failed) to log in to the container to explore the conda environment. 😞

It now works nicely: https://nbviewer.jupyter.org/gist/rsignell-usgs/b23a082ab85f84339ad08a1a146c056d
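For anyone hitting the same thing, here is a minimal sketch of a check that can run before building the image. It assumes the script sits next to the Dockerfile in the build context; the helper name ensure_executable is made up for illustration.

```python
import os
import stat


def ensure_executable(path):
    """Add the execute bits to `path` if the owner cannot execute it.

    Returns True if the mode was changed, False if it was already executable.
    """
    mode = os.stat(path).st_mode
    if mode & stat.S_IXUSR:
        return False
    # Same effect as `chmod +x` with a permissive umask: owner, group and others.
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return True
```

Calling ensure_executable("prepare.sh") before docker build should carry the permission into the image, since Docker normally preserves file modes when copying files from a Linux build context.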
Woot! Sorry that you had a frustrating experience, but I’m glad to hear that this was not Dask’s fault 😃
On Wed, Apr 15, 2020 at 10:34 AM Rich Signell notifications@github.com wrote: