Worker failed to start
See original GitHub issue

```python
import distributed
print(distributed.__version__)
# 1.21.2

import tornado
print(tornado.version)
# 4.5.3

from dask.distributed import Client, LocalCluster
client = Client()
```
```
tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
  File "C:\Users\brahm\Anaconda3\lib\site-packages\tornado\gen.py", line 1069, in run
    yielded = self.gen.send(value)
  File "C:\Users\brahm\Anaconda3\lib\site-packages\distributed\deploy\local.py", line 196, in _start_worker
    raise gen.TimeoutError("Worker failed to start")
tornado.gen.TimeoutError: Worker failed to start

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\brahm\Anaconda3\lib\site-packages\tornado\gen.py", line 828, in callback
    result_list.append(f.result())
  File "C:\Users\brahm\Anaconda3\lib\site-packages\tornado\concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "C:\Users\brahm\Anaconda3\lib\site-packages\tornado\gen.py", line 1069, in run
    yielded = self.gen.send(value)
  File "C:\Users\brahm\Anaconda3\lib\site-packages\distributed\deploy\local.py", line 196, in _start_worker
    raise gen.TimeoutError("Worker failed to start")
tornado.gen.TimeoutError: Worker failed to start

tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
  File "C:\Users\brahm\Anaconda3\lib\site-packages\tornado\gen.py", line 1069, in run
    yielded = self.gen.send(value)
  File "C:\Users\brahm\Anaconda3\lib\site-packages\distributed\deploy\local.py", line 196, in _start_worker
    raise gen.TimeoutError("Worker failed to start")
tornado.gen.TimeoutError: Worker failed to start

During handling of the above exception, another exception occurred: …
```
Issue Analytics
- State:
- Created 6 years ago
- Comments: 54 (36 by maintainers)
Top GitHub Comments
If I pass `processes=False`, then it works just fine. If I run `dask-scheduler` and `dask-worker` on the command line and connect to them via `Client`, it also works fine in both Python 2 and Python 3.

I believe this has to do with how processes work on macOS. If a process uses the libdispatch library for asynchronous work, the OS marks it as a multi-threaded process, complete with an Objective-C runtime. A process with an Objective-C runtime under the hood can NOT be forked (i.e. it crashes).

So my theory is that if a Python process uses any threading (implemented under the hood with libdispatch) prior to forking, it will crash.
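If that theory holds, the standard-library workaround is the `spawn` start method, which launches a fresh interpreter instead of forking the (possibly already multi-threaded) parent. A minimal sketch with plain `multiprocessing`, independent of Dask:

```python
import multiprocessing as mp

def work(x):
    return x * x

if __name__ == "__main__":
    # "spawn" starts a clean child interpreter, so nothing inherited from
    # the parent (threads, libdispatch state) can poison the fork.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(work, [1, 2, 3]))  # [1, 4, 9]
```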
Starting in Python 3, you can choose the start method: `spawn`, which starts a fresh new Python process and circumvents the issue, or `forkserver` (the default case). So to reiterate:

- python3 + "spawn" + `LocalCluster()` => success
- python3 + "forkserver" + `LocalCluster()` => fail
- python2 + `LocalCluster()` => fail
- python3 + "forkserver" + `LocalCluster(processes=False)` => success
- python3 + "spawn" + `LocalCluster(processes=False)` => success
- python2 + `LocalCluster(processes=False)` => success
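For reference, the start method can be inspected and pinned once per process via the standard `multiprocessing` API (the set of available methods varies by platform):

```python
import multiprocessing as mp

if __name__ == "__main__":
    # Methods supported on this platform, e.g. ['fork', 'spawn', 'forkserver'] on Linux
    print(mp.get_all_start_methods())

    # May be called at most once, and before any pools or processes are created
    mp.set_start_method("spawn")
    print(mp.get_start_method())  # spawn
```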
Given that my workload is CPU-bound and running on Python 2, using a thread pool instead of a process pool won't give me the speedup I'm looking for.
On a related note: this issue can be pretty subtle. I first ran into it while using the `requests` library. One of the comments on https://stackoverflow.com/questions/28521535/requests-how-to-disable-bypass-proxy explains that `requests` checks whether the system has any proxies configured, which requires the Python process to talk to `cfprefsd`, which in turn marks it as a multi-threaded process. If you then try to fork the Python process, it crashes.

Can I ask you to try the following?
```python
client = Client(processes=False)
```
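With `processes=False` the workers run as threads inside the calling process, so no fork ever happens. The standard-library analogy (a sketch, not Dask itself) is swapping a process pool for a thread pool:

```python
from concurrent.futures import ThreadPoolExecutor

def work(x):
    return x + 1

if __name__ == "__main__":
    # Threads share the parent process, so nothing is forked --
    # the same reason Client(processes=False) sidesteps the crash,
    # at the cost of the GIL for CPU-bound work.
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(list(pool.map(work, range(3))))  # [1, 2, 3]
```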