Scheduler hangs randomly

Hey,

I’m having problems with parallel_backend. For some reason, the scheduler hangs at random after a few calls. I’m calling parallel_backend as follows:

with parallel_backend('dask.distributed', scheduler_host='127.0.0.1:8786'):
    ret = Parallel()(fd(X, s, **kwds) for s in _gen_even_slices(len(X), n_jobs))
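
For readers unfamiliar with the slicing helper: the call above hands each task one slice of X. The sketch below shows how such a split could work, assuming `_gen_even_slices` behaves like scikit-learn's `gen_even_slices`. The function `even_slices` is a hypothetical stand-in written for illustration, not the code used in the issue.

```python
def even_slices(n, n_packs):
    """Yield up to n_packs contiguous slices covering range(n) as evenly as possible."""
    start = 0
    for pack in range(n_packs):
        # The first (n % n_packs) slices take one extra element.
        size = n // n_packs + (1 if pack < n % n_packs else 0)
        if size > 0:
            yield slice(start, start + size)
            start += size


X = list(range(10))
chunks = [X[s] for s in even_slices(len(X), 3)]
print(chunks)
```

With this scheme, each of the `n_jobs` tasks submitted to the scheduler works on one contiguous chunk, so a single stuck task is enough to block the whole `Parallel` call.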

I have two 8-core servers on which I start dask-worker as follows:

    dask-worker 10.1.0.4:8786 --nprocs 7 --nthreads 1 --no-bokeh

They are idle at the moment the scheduler starts hanging. When I cancel the Python call after it hangs, the following stack trace is printed:

Traceback (most recent call last):
  File "/home/christian/.conda/envs/nedtrain35/lib/python3.5/site-packages/joblib/parallel.py", line 684, in retrieve
^C  File "/home/christian/.conda/envs/nedtrain35/lib/python3.5/site-packages/distributed/client.py", line 110, in result
    result = sync(self.client.loop, self._result, raiseit=False)
  File "/home/christian/.conda/envs/nedtrain35/lib/python3.5/site-packages/distributed/utils.py", line 161, in sync
    e.wait(1000000)
  File "/home/christian/.conda/envs/nedtrain35/lib/python3.5/threading.py", line 549, in wait
    signaled = self._cond.wait(timeout)
  File "/home/christian/.conda/envs/nedtrain35/lib/python3.5/threading.py", line 297, in wait
    gotit = waiter.acquire(True, timeout)
KeyboardInterrupt
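
The trace ends in `e.wait(1000000)`, which suggests the client is blocked on a `threading.Event` with a very large timeout, waiting for a result that never arrives. The following is a simplified, stdlib-only sketch of that waiting pattern with a bounded timeout, so a missing result surfaces as an error instead of a hang. `wait_for_result` is a hypothetical helper for illustration and is not part of distributed or joblib.

```python
import threading


def wait_for_result(compute, timeout):
    """Run compute() in a helper thread and wait at most `timeout` seconds."""
    done = threading.Event()
    box = {}

    def runner():
        try:
            box["value"] = compute()
        except Exception as exc:  # hand the error back to the caller
            box["error"] = exc
        finally:
            done.set()

    threading.Thread(target=runner, daemon=True).start()
    # Event.wait returns False when the timeout expires before the event is set.
    if not done.wait(timeout):
        raise TimeoutError("no result after %.1f s" % timeout)
    if "error" in box:
        raise box["error"]
    return box["value"]


never = threading.Event()  # never set, so waiting on it cannot finish
print(wait_for_result(lambda: 21 * 2, timeout=1.0))
try:
    wait_for_result(never.wait, timeout=0.2)
except TimeoutError as exc:
    print("timed out:", exc)
```

A bounded wait does not fix the underlying cause, but it turns a silent hang into a failure that can be logged and retried.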

The following versions are installed:

  • dask 0.13.0 py35_0 conda-forge
  • distributed 1.15.0 py35_0 conda-forge
  • tornado 4.4.1 py35_0

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

csdenboer commented, Jan 10, 2017

It’s pretty hard to come up with a small reproducible example. Is there a log somewhere that shows why a worker failed?
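
One way to get more detail while reproducing the hang is to raise the log level with Python's standard `logging` module. This is a minimal sketch, assuming the library emits its messages under the `distributed` logger namespace (an assumption that has not been checked against version 1.15.0). It only affects the process in which it runs, for example the client script.

```python
import logging
import sys

# Assumption: the library logs under the "distributed" logger namespace.
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)

logger = logging.getLogger("distributed")
logger.setLevel(logging.DEBUG)  # child loggers such as "distributed.x" inherit this
logger.addHandler(handler)
```

For workers started with dask-worker, the terminal output of each worker process is the first place to look for the failure reason.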

gouthambs commented, May 30, 2017

@mrocklin Good to know that you are able to replicate on your end. This is one example of the error I have encountered. I have come across similar issues where the worker/scheduler hangs because of some error in the code. Hopefully the fix will be generic enough to catch other such issues.
