
Worker failed to start

See original GitHub issue

    import distributed
    print(distributed.__version__)
    # 1.21.2

    import tornado
    print(tornado.version)
    # 4.5.3

    from dask.distributed import Client, LocalCluster
    client = Client()

    tornado.application - ERROR - Multiple exceptions in yield list
    Traceback (most recent call last):
      File "C:\Users\brahm\Anaconda3\lib\site-packages\tornado\gen.py", line 1069, in run
        yielded = self.gen.send(value)
      File "C:\Users\brahm\Anaconda3\lib\site-packages\distributed\deploy\local.py", line 196, in _start_worker
        raise gen.TimeoutError("Worker failed to start")
    tornado.gen.TimeoutError: Worker failed to start

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "C:\Users\brahm\Anaconda3\lib\site-packages\tornado\gen.py", line 828, in callback
        result_list.append(f.result())
      File "C:\Users\brahm\Anaconda3\lib\site-packages\tornado\concurrent.py", line 238, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 4, in raise_exc_info
      File "C:\Users\brahm\Anaconda3\lib\site-packages\tornado\gen.py", line 1069, in run
        yielded = self.gen.send(value)
      File "C:\Users\brahm\Anaconda3\lib\site-packages\distributed\deploy\local.py", line 196, in _start_worker
        raise gen.TimeoutError("Worker failed to start")
    tornado.gen.TimeoutError: Worker failed to start
    tornado.application - ERROR - Multiple exceptions in yield list
    Traceback (most recent call last):
      File "C:\Users\brahm\Anaconda3\lib\site-packages\tornado\gen.py", line 1069, in run
        yielded = self.gen.send(value)
      File "C:\Users\brahm\Anaconda3\lib\site-packages\distributed\deploy\local.py", line 196, in _start_worker
        raise gen.TimeoutError("Worker failed to start")
    tornado.gen.TimeoutError: Worker failed to start

    During handling of the above exception, another exception occurred: …

Issue Analytics

  • State: open
  • Created: 6 years ago
  • Comments: 54 (36 by maintainers)

Top GitHub Comments

allentsouhuang commented, Nov 15, 2018 (3 reactions)

If I pass processes=False, it works just fine. If I run dask-scheduler and dask-worker on the command line and connect to them via Client, it works just fine in both Python 2 and Python 3.

I believe this has to do with how processes work on macOS. If a process uses the libdispatch library for asynchronous work, the OS marks it as a multi-threaded process, complete with an Objective-C runtime. A process with an Objective-C runtime under the hood can NOT be forked safely (i.e. it crashes).

So my theory is that if a Python process uses any threading (implemented under the hood with libdispatch) prior to forking, it will crash.

Starting in Python 3, you can specify the multiprocessing start method: "spawn" creates a fresh new Python process, which circumvents the issue, as opposed to "forkserver" (the default case).

So to reiterate:

  • python3 + "spawn" + LocalCluster() => success
  • python3 + "forkserver" + LocalCluster() => fail
  • python2 + LocalCluster() => fail

  • python3 + "forkserver" + LocalCluster(processes=False) => success
  • python3 + "spawn" + LocalCluster(processes=False) => success
  • python2 + LocalCluster(processes=False) => success
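The start-method distinction the comment relies on can be sketched with the standard library alone (plain multiprocessing, not dask; the function name below is just illustrative). A "spawn" context launches each worker from a fresh interpreter, so no inherited libdispatch/Objective-C state can poison a fork:

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    # "spawn" starts a brand-new interpreter per worker instead of forking,
    # sidestepping the macOS fork-after-threads crash described above.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # → [1, 4, 9]
```

The same pattern fails under a plain fork on macOS only once the parent has touched threading; with "spawn" the children never inherit that state in the first place.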

Given that my workload is CPU-bound and I'm running Python 2, using a thread pool instead of a process pool won't give me the speedup that I'm looking for.

On a related note: this issue can be pretty subtle. I first ran into it while using the requests library. One of the comments on https://stackoverflow.com/questions/28521535/requests-how-to-disable-bypass-proxy explains that requests checks whether the system has configured any proxies, which requires the Python process to communicate with cfprefsd, which in turn marks it as a multi-threaded environment. If you then try to fork the Python process, it will crash.
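For reference, the stdlib call that requests ultimately uses for proxy discovery can be invoked directly; on macOS it goes through the SystemConfiguration framework (the `_scproxy` module), which is the lookup that wakes up cfprefsd:

```python
import urllib.request

# requests delegates system proxy discovery to this stdlib helper; on macOS
# the lookup consults the SystemConfiguration framework, which is what marks
# the process as multi-threaded before any fork happens.
proxies = urllib.request.getproxies()
print(proxies)  # e.g. {} when no system proxies are configured
```

The Stack Overflow thread linked above discusses skipping this lookup in requests (e.g. via `session.trust_env = False`), which avoids the cfprefsd round-trip entirely.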

mrocklin commented, Mar 9, 2018 (3 reactions)

Can I ask you to try the following?

  1. Avoid processes with client = Client(processes=False)
  2. See if you can start the scheduler and workers locally from the command line: http://dask.pydata.org/en/latest/setup/cli.html

Top Results From Across the Web

python 3.x - TimeoutError: Worker failed to start - Stack Overflow
The problem was solved by updating packages dask,distributed,tornado to version respectively 2.4.0 , 2.4.0, 6.0.3.
Why does my Dask application fail to start a worker process?
A common cause of this error is that the worker requires more memory. To increase the worker memory for your application, complete one...
Creo View Adapters: Creo Worker fails to start - PTC
Worker agent start-up fails at the following message in the Worker log: Starting ProE : Startup of application "proe2pv" failed. Creo Parametric …
unable to start or restart AtoM worker - Google Groups
You may want to check the worker log (notes just above the tip) to know what is going/went wrong. Best regards,. Radda.
Source code for distributed.worker - Dask documentation
__name__}-fail-hard" if iscoroutinefunction(method): @functools.wraps(method) ... You can start a worker with the ``dask worker`` command line application:: ...
