
New Behavior: FargateCluster dies with "Scheduler exited unexpectedly"


Pinning aiobotocore=0.12.0 got me going again (thanks to https://github.com/dask/dask-cloudprovider/issues/78#issuecomment-610018083), and I’m rerunning a notebook that worked before, but now when I try to start a FargateCluster:

from dask_cloudprovider import FargateCluster

cluster = FargateCluster(n_workers=1, image='rsignell/fargate-worker:2020-04-07')

A few minutes go by, and then I get “Scheduler exited unexpectedly!”:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-e2ac6335fc11> in <module>
----> 1 cluster = FargateCluster(n_workers=1, image='rsignell/fargate-worker:2020-04-07')

~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/dask_cloudprovider/providers/aws/ecs.py in __init__(self, **kwargs)
   1099 
   1100     def __init__(self, **kwargs):
-> 1101         super().__init__(fargate_scheduler=True, fargate_workers=True, **kwargs)
   1102 
   1103 

~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/dask_cloudprovider/providers/aws/ecs.py in __init__(self, fargate_scheduler, fargate_workers, image, scheduler_cpu, scheduler_mem, scheduler_timeout, worker_cpu, worker_mem, worker_gpu, n_workers, cluster_arn, cluster_name_template, execution_role_arn, task_role_arn, task_role_policies, cloudwatch_logs_group, cloudwatch_logs_stream_prefix, cloudwatch_logs_default_retention, vpc, subnets, security_groups, environment, tags, find_address_timeout, skip_cleanup, aws_access_key_id, aws_secret_access_key, region_name, **kwargs)
    593         self._region_name = region_name
    594         self._lock = asyncio.Lock()
--> 595         super().__init__(**kwargs)
    596 
    597     async def _start(self,):

~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/distributed/deploy/spec.py in __init__(self, workers, scheduler, worker, asynchronous, loop, security, silence_logs, name)
    254         if not self.asynchronous:
    255             self._loop_runner.start()
--> 256             self.sync(self._start)
    257             self.sync(self._correct_state)
    258 

~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/distributed/deploy/cluster.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    159             return future
    160         else:
--> 161             return sync(self.loop, func, *args, **kwargs)
    162 
    163     async def _get_logs(self, scheduler=True, workers=True):

~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    346     if error[0]:
    347         typ, exc, tb = error[0]
--> 348         raise exc.with_traceback(tb)
    349     else:
    350         return result[0]

~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/distributed/utils.py in f()
    330             if callback_timeout is not None:
    331                 future = asyncio.wait_for(future, callback_timeout)
--> 332             result[0] = yield future
    333         except Exception as exc:
    334             error[0] = sys.exc_info()

~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/tornado/gen.py in run(self)
    733 
    734                     try:
--> 735                         value = future.result()
    736                     except Exception:
    737                         exc_info = sys.exc_info()

~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/dask_cloudprovider/providers/aws/ecs.py in _start(self)
    765             "Hang tight! ",
    766         ):
--> 767             await super()._start()
    768 
    769     @property

~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/distributed/deploy/spec.py in _start(self)
    282 
    283         self.status = "starting"
--> 284         self.scheduler = await self.scheduler
    285         self.scheduler_comm = rpc(
    286             getattr(self.scheduler, "external_address", None) or self.scheduler.address,

~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/dask_cloudprovider/providers/aws/ecs.py in _()
    128             async with self.lock:
    129                 if not self.task:
--> 130                     await self.start()
    131                     assert self.task
    132             return self

~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/dask_cloudprovider/providers/aws/ecs.py in start(self)
    258             self.public_ip = interface["Association"]["PublicIp"]
    259         self.private_ip = interface["PrivateIpAddresses"][0]["PrivateIpAddress"]
--> 260         await self._set_address_from_logs()
    261         self.status = "running"
    262 

~/SageMaker/my_envs/pangeo/lib/python3.6/site-packages/dask_cloudprovider/providers/aws/ecs.py in _set_address_from_logs(self)
    181             else:
    182                 if not await self._task_is_running():
--> 183                     raise RuntimeError("%s exited unexpectedly!" % type(self).__name__)
    184                 continue
    185             break

RuntimeError: Scheduler exited unexpectedly!
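The last frames show where the error comes from: `_set_address_from_logs` keeps checking the scheduler task for its address and raises once `_task_is_running()` reports that the task has stopped. The sketch below is a rough, hypothetical model of that loop, not dask-cloudprovider's actual code; the helper names, the `Scheduler at:` marker and the timeout are assumptions. It shows why any scheduler container that dies before announcing its address ends in this same error, whatever the underlying cause.

```python
import time


class SchedulerExitedError(RuntimeError):
    """Hypothetical stand-in for the RuntimeError raised in the traceback above."""


def wait_for_scheduler_address(read_log_lines, task_is_running,
                               poll_interval=0.0, timeout=60.0):
    """Simplified model of waiting for a container to announce its address.

    read_log_lines: hypothetical callable returning the log lines seen so far.
    task_is_running: hypothetical callable that returns False once the task stops.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        for line in read_log_lines():
            # dask-scheduler logs a line like
            # "distributed.scheduler - INFO -   Scheduler at:     tcp://10.0.0.5:8786"
            if "Scheduler at:" in line:
                return line.split("Scheduler at:", 1)[1].strip()
        # No address yet: if the task has already stopped, it will never appear.
        if not task_is_running():
            raise SchedulerExitedError("Scheduler exited unexpectedly!")
        time.sleep(poll_interval)
    raise TimeoutError("Scheduler did not report an address in time")
```

In this model, a container whose entrypoint fails immediately never logs an address, so the only possible outcome is the "exited unexpectedly" branch; the scheduler's own container logs (for example in CloudWatch) are where the real failure message would show up.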

Here’s my environment:

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
_libgcc_mutex=0.1=conda_forge
_openmp_mutex=4.5=1_llvm
aiobotocore=0.12.0=py_0
aiohttp=3.6.2=py36h516909a_0
aioitertools=0.5.1=py_0
appdirs=1.4.3=py_1
asciitree=0.3.3=py_2
async-timeout=3.0.1=py_1000
attrs=19.3.0=py_0
backcall=0.1.0=py_0
bleach=3.1.4=pyh9f0ad1d_0
blinker=1.4=py_1
blosc=1.17.1=he1b5a44_0
bokeh=2.0.1=py36h9f0ad1d_0
boost-cpp=1.72.0=h8e57a91_0
boto3=1.12.15=py_0
botocore=1.15.15=py_0
bzip2=1.0.8=h516909a_2
ca-certificates=2020.4.5.1=hecc5488_0
cachetools=3.1.1=py_0
cairo=1.16.0=hcf35c78_1003
cartopy=0.17.0=py36hb2212bf_1013
certifi=2020.4.5.1=py36h9f0ad1d_0
cffi=1.14.0=py36hd463f26_0
cfitsio=3.470=hb60a0a2_2
cftime=1.1.1.2=py36h785e9b2_0
chardet=3.0.4=py36h9f0ad1d_1006
click=7.1.1=pyh8c360ce_0
click-plugins=1.1.1=py_0
cligj=0.5.0=py_0
cloudpickle=1.3.0=py_0
colorcet=2.0.1=py_0
cryptography=2.8=py36h45558ae_2
curl=7.68.0=hf8cf82a_0
cycler=0.10.0=py_2
cython=0.29.16=py36h831f99a_0
cytoolz=0.10.1=py36h516909a_0
dask=2.14.0=py_0
dask-cloudprovider=0.1.1=py_0
dask-core=2.14.0=py_0
datashader=0.10.0=py_0
datashape=0.5.4=py_1
decorator=4.4.2=py_0
defusedxml=0.6.0=py_0
distributed=2.14.0=py36h9f0ad1d_0
docopt=0.6.2=py_1
docutils=0.15.2=py36_0
earthsim=1.0.1=py_0
entrypoints=0.3=py36h9f0ad1d_1001
erddapy=0.5.3=py_0
expat=2.2.9=he1b5a44_2
fasteners=0.14.1=py_3
fiona=1.8.13=py36h900e953_0
fontconfig=2.13.1=h86ecdb6_1001
freetype=2.10.1=he06d7ca_0
freexl=1.0.5=h14c3975_1002
fsspec=0.7.1=py_0
gcsfs=0.6.1=py_0
gdal=3.0.4=py36hd60729c_3
geopandas=0.7.0=py_1
geos=3.8.1=he1b5a44_0
geotiff=1.5.1=hcbe54f9_9
geoviews=1.8.1=py_0
geoviews-core=1.8.1=py_0
gettext=0.19.8.1=hc5be6a0_1002
giflib=5.2.1=h516909a_2
glib=2.58.3=py36hd3ed26a_1003
google-auth=1.12.0=pyh9f0ad1d_0
google-auth-oauthlib=0.4.1=py_2
h5netcdf=0.8.0=py_0
h5py=2.10.0=nompi_py36h513d04c_102
hdf4=4.2.13=hf30be14_1003
hdf5=1.10.5=nompi_h3c11f04_1104
heapdict=1.0.1=py_0
holoviews=1.13.2=pyh9f0ad1d_0
hvplot=0.5.2=py_0
icu=64.2=he1b5a44_1
idna=2.9=py_1
idna_ssl=1.1.0=py36_1000
imageio=2.8.0=py_0
importlib-metadata=1.6.0=py36h9f0ad1d_0
importlib_metadata=1.6.0=0
ipykernel=5.2.0=py36h95af2a2_1
ipython=7.13.0=py36h9f0ad1d_2
ipython_genutils=0.2.0=py_1
ipywidgets=7.5.1=py_0
jedi=0.16.0=py36h9f0ad1d_1
jinja2=2.11.1=py_0
jmespath=0.9.5=py_0
jpeg=9c=h14c3975_1001
json-c=0.13.1=h14c3975_1001
jsonschema=3.2.0=py36h9f0ad1d_1
jupyter_client=6.1.2=py_0
jupyter_core=4.6.3=py36h9f0ad1d_1
kealib=1.4.13=hec59c27_0
kiwisolver=1.2.0=py36hdb11119_0
krb5=1.16.4=h2fd8d38_0
ld_impl_linux-64=2.34=h53a641e_0
libblas=3.8.0=16_openblas
libcblas=3.8.0=16_openblas
libcurl=7.68.0=hda55be3_0
libdap4=3.20.4=hd3bb157_0
libedit=3.1.20170329=hf8c457e_1001
libffi=3.2.1=he1b5a44_1007
libgcc-ng=9.2.0=h24d8f2e_2
libgdal=3.0.4=h94bbfbd_3
libgfortran-ng=7.3.0=hdf63c60_5
libiconv=1.15=h516909a_1006
libkml=1.3.0=hb574062_1011
liblapack=3.8.0=16_openblas
libllvm8=8.0.1=hc9558a2_0
libnetcdf=4.7.4=nompi_h9f9fd6a_101
libopenblas=0.3.9=h5ec1e0e_0
libpng=1.6.37=hed695b0_1
libpq=12.2=hae5116b_0
libsodium=1.0.17=h516909a_0
libspatialindex=1.9.3=he1b5a44_3
libspatialite=4.3.0a=heb269f5_1037
libssh2=1.8.2=h22169c7_2
libstdcxx-ng=9.2.0=hdf63c60_2
libtiff=4.1.0=hc3755c2_3
libuuid=2.32.1=h14c3975_1000
libwebp=1.0.2=h56121f0_5
libxcb=1.13=h14c3975_1002
libxml2=2.9.10=hee79883_0
llvm-openmp=9.0.1=hc9558a2_2
llvmlite=0.31.0=py36hfa65bc7_1
locket=0.2.0=py_2
lz4=3.0.2=py36h964dcb7_1
lz4-c=1.8.3=he1b5a44_1001
markdown=3.2.1=py_0
markupsafe=1.1.1=py36h8c4c3a4_1
matplotlib-base=3.2.1=py36hb8e4980_0
metpy=0.12.0=py_0
mistune=0.8.4=py36h516909a_1000
monotonic=1.5=py_0
msgpack-python=1.0.0=py36hdb11119_1
multidict=4.7.5=py36h516909a_0
multipledispatch=0.6.0=py_0
munch=2.5.0=py_0
nbconvert=5.6.1=py36_0
nbformat=5.0.4=py_0
ncurses=6.1=hf484d3e_1002
netcdf4=1.5.3=nompi_py36h90ce072_103
networkx=2.4=py_1
notebook=6.0.3=py36_0
numba=0.48.0=py36hb3f55d8_0
numcodecs=0.6.4=py36he1b5a44_0
numpy=1.18.1=py36h7314795_1
oauthlib=3.0.1=py_0
olefile=0.46=py_0
openjpeg=2.3.1=h981e76c_3
openssl=1.1.1f=h516909a_0
owslib=0.19.2=py_1
packaging=20.1=py_0
pandas=1.0.3=py36h830a2c2_0
pandoc=2.9.2=0
pandocfilters=1.4.2=py_1
panel=0.9.5=py_0
param=1.9.3=py_0
parso=0.6.2=py_0
partd=1.1.0=py_0
pcre=8.44=he1b5a44_0
pexpect=4.8.0=py36h9f0ad1d_1
pickleshare=0.7.5=py36h9f0ad1d_1001
pillow=7.1.1=py36h8328e55_0
pint=0.11=py_1
pip=20.0.2=py_2
pixman=0.38.0=h516909a_1003
pooch=1.0.0=py_0
poppler=0.67.0=h14e79db_8
poppler-data=0.4.9=1
postgresql=12.2=hf1211e9_0
proj=6.3.1=hc80f0dc_1
prometheus_client=0.7.1=py_0
prompt-toolkit=3.0.5=py_0
psutil=5.7.0=py36h8c4c3a4_1
pthread-stubs=0.4=h14c3975_1001
ptyprocess=0.6.0=py_1001
pyasn1=0.4.8=py_0
pyasn1-modules=0.2.7=py_0
pycparser=2.20=py_0
pyct=0.4.6=py_0
pyct-core=0.4.6=py_0
pyepsg=0.4.0=py_0
pygments=2.6.1=py_0
pyjwt=1.7.1=py_0
pykdtree=1.3.1=py36h785e9b2_1003
pyopenssl=19.1.0=py_1
pyparsing=2.4.7=pyh9f0ad1d_0
pyproj=2.6.0=py36h47ab0c1_0
pyrsistent=0.16.0=py36h8c4c3a4_0
pyshp=2.1.0=py_0
pysocks=1.7.1=py36h9f0ad1d_1
python=3.6.10=h8356626_1010_cpython
python-blosc=1.9.0=py36h830a2c2_0
python-dateutil=2.8.1=py_0
python-gist=0.9.2=pyh9f0ad1d_1
python-gnupg=0.4.5=py_0
python_abi=3.6=1_cp36m
pytz=2019.3=py_0
pyviz_comms=0.7.4=pyh8c360ce_0
pywavelets=1.1.1=py36hc1659b7_0
pyyaml=5.3.1=py36h8c4c3a4_0
pyzmq=19.0.0=py36h9947dbf_1
readline=8.0=hf8c457e_0
requests=2.23.0=pyh8c360ce_2
requests-oauthlib=1.2.0=py_0
rsa=4.0=py_0
rtree=0.9.4=py36he053a7a_1
s3fs=0.4.2=py_0
s3transfer=0.3.3=py36h9f0ad1d_1
scikit-image=0.16.2=py36hb3f55d8_0
scipy=1.4.1=py36h2d22cac_2
send2trash=1.5.0=py_0
setuptools=46.1.3=py36h9f0ad1d_0
shapely=1.7.0=py36h3d6ee9d_3
simplejson=3.17.0=py36h516909a_0
six=1.14.0=py_1
sortedcontainers=2.1.0=py_0
sqlite=3.30.1=hcee41ef_0
stglib=0.2.0=py_0
tbb=2018.0.5=h2d50403_0
tblib=1.6.0=py_0
terminado=0.8.3=py36h9f0ad1d_1
testpath=0.4.4=py_0
tiledb=1.7.0=hcde45ca_2
tk=8.6.10=hed695b0_0
toolz=0.10.0=py_0
tornado=6.0.4=py36h8c4c3a4_1
tqdm=4.45.0=pyh9f0ad1d_0
traitlets=4.3.3=py36h9f0ad1d_1
typing_extensions=3.7.4.1=py36h9f0ad1d_3
tzcode=2019a=h516909a_1002
urllib3=1.25.7=py36h9f0ad1d_1
utide=0.2.5=py_0
wcwidth=0.1.9=pyh9f0ad1d_0
webencodings=0.5.1=py_1
wheel=0.34.2=py_1
widgetsnbextension=3.5.1=py36_0
wrapt=1.12.1=py36h8c4c3a4_1
xarray=0.15.1=py_0
xerces-c=3.2.2=h8412b87_1004
xmltodict=0.12.0=py_0
xorg-kbproto=1.0.7=h14c3975_1002
xorg-libice=1.0.10=h516909a_0
xorg-libsm=1.2.3=h84519dc_1000
xorg-libx11=1.6.9=h516909a_0
xorg-libxau=1.0.9=h14c3975_0
xorg-libxdmcp=1.1.3=h516909a_0
xorg-libxext=1.3.4=h516909a_0
xorg-libxrender=0.9.10=h516909a_1002
xorg-renderproto=0.11.1=h14c3975_1002
xorg-xextproto=7.3.0=h14c3975_1002
xorg-xproto=7.0.31=h14c3975_1007
xz=5.2.4=h516909a_1002
yaml=0.2.2=h516909a_1
yarl=1.3.0=py36h516909a_1000
zarr=2.4.0=py_0
zeromq=4.3.2=he1b5a44_2
zict=2.0.0=py_0
zipp=3.1.0=py_0
zlib=1.2.11=h516909a_1006
zstd=1.4.4=h3b9ef0a_2
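The listing shows aiobotocore at 0.12.0. To confirm from inside the notebook kernel which versions are actually installed, a check along these lines could be used. It is a sketch: it uses `importlib.metadata` on Python 3.8+ and falls back to the `importlib_metadata` backport (present in the environment above) on Python 3.6; `installed_version` and `check_pins` are hypothetical helper names.

```python
try:
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python < 3.8: use the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def check_pins(pins):
    """Map each distribution name to (expected, installed, matches)."""
    report = {}
    for name, expected in pins.items():
        found = installed_version(name)
        report[name] = (expected, found, found == expected)
    return report
```

For example, `check_pins({"aiobotocore": "0.12.0", "distributed": "2.14.0"})` would report whether both pins from the listing are what the kernel's environment actually has installed.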

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

rsignell-usgs commented, Apr 15, 2020 (2 reactions)

Grrr… It turns out my Docker image wasn’t working because I hadn’t made the prepare.sh script referenced in the Dockerfile executable. I found this out when I tried (and failed) to log in to the container to explore the conda environment. 😞

It now works nicely: https://nbviewer.jupyter.org/gist/rsignell-usgs/b23a082ab85f84339ad08a1a146c056d
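Since the root cause was a missing execute bit on prepare.sh, one way to catch it before building the image is to check and set the bit from Python. This is a minimal sketch, assuming the script sits in the Docker build context (Docker's COPY keeps the file's permission bits from the build context); `ensure_executable` is a hypothetical helper, and `chmod +x prepare.sh` or a `chmod +x` step in the Dockerfile would do the same job.

```python
import os
import stat


def ensure_executable(path):
    """Add execute permission for user, group and others if any is missing.

    Returns True if the permissions were changed, False if already executable.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)  # permission bits only
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if (mode & exec_bits) == exec_bits:
        return False
    os.chmod(path, mode | exec_bits)
    return True
```

Calling `ensure_executable("prepare.sh")` from the directory containing the Dockerfile, before running `docker build`, turns a 0o644 script into 0o755 and leaves an already-executable one untouched.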

mrocklin commented, Apr 15, 2020 (1 reaction)

Woot! Sorry that you had a frustrating experience, but I’m glad to hear that this was not Dask’s fault 😃

On Wed, Apr 15, 2020 at 10:34 AM Rich Signell notifications@github.com wrote:

Closed #79 https://github.com/dask/dask-cloudprovider/issues/79.

