Issue with distributed when bokeh is installed

See original GitHub issue

I seem to be hitting significant bugs in dask.distributed after installing bokeh.

I just installed a fresh miniconda3. After running pip install dask distributed, I can run:

$ python
Python 3.7.0 (default, Jun 28 2018, 13:15:42)
[GCC 7.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from dask.distributed import Client
>>> client = Client()
>>> client
<Client: scheduler='tcp://127.0.0.1:60321' processes=32 cores=32>
>>> def inc(x):
...     return x + 1
...
>>> def add(x, y):
...     return x + y
...
>>> a = client.submit(inc, 10)  # calls inc(10) in background thread or process
>>> b = client.submit(inc, 20)  # calls inc(20) in background thread or process
>>> a
<Future: status: finished, type: int, key: inc-329d1c0d6d5adf260702a5cc673b66c5>
>>> a.result()
11
>>> exit()
tornado.application - ERROR - Exception in Future <Future cancelled> after timeout
Traceback (most recent call last):
  File "/home/ipetrik/miniconda3/lib/python3.7/site-packages/tornado/gen.py", line 970, in error_callback
    future.result()
concurrent.futures._base.CancelledError

(Not sure what that last error is about.)

After this I do pip install bokeh and try again:

Python 3.7.0 (default, Jun 28 2018, 13:15:42)
[GCC 7.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from dask.distributed import Client
>>> client = Client()
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2065958 max
... (the two lines above repeat many times) ...
distributed.nanny - WARNING - Worker process 15692 exited with status 1
distributed.nanny - WARNING - Worker process 15670 exited with status 1
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Restarting worker
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2065958 max
... (repeats again many times) ...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/ipetrik/miniconda3/lib/python3.7/multiprocessing/forkserver.py", line 250, in main
    pid = os.fork()
BlockingIOError: [Errno 11] Resource temporarily unavailable
distributed.nanny - WARNING - Worker process 15662 was killed by unknown signal
distributed.nanny - WARNING - Worker process 15648 was killed by unknown signal
... (the same warning repeats for each of the remaining worker processes) ...
distributed.nanny - ERROR - Failed to restart worker after its process exited
Traceback (most recent call last):
  File "/home/ipetrik/miniconda3/lib/python3.7/site-packages/distributed/nanny.py", line 291, in _on_exit
    yield self.instantiate()
  File "/home/ipetrik/miniconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/home/ipetrik/miniconda3/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/ipetrik/miniconda3/lib/python3.7/site-packages/distributed/nanny.py", line 226, in instantiate
    self.process.start()
  File "/home/ipetrik/miniconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/home/ipetrik/miniconda3/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/ipetrik/miniconda3/lib/python3.7/site-packages/distributed/nanny.py", line 370, in start
    yield self.process.start()
  File "/home/ipetrik/miniconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/home/ipetrik/miniconda3/lib/python3.7/site-packages/distributed/process.py", line 35, in _call_and_set_future
    res = func(*args, **kwargs)
  File "/home/ipetrik/miniconda3/lib/python3.7/site-packages/distributed/process.py", line 184, in _start
    process.start()
  File "/home/ipetrik/miniconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/home/ipetrik/miniconda3/lib/python3.7/multiprocessing/context.py", line 291, in _Popen
    return Popen(process_obj)
  File "/home/ipetrik/miniconda3/lib/python3.7/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/home/ipetrik/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/ipetrik/miniconda3/lib/python3.7/multiprocessing/popen_forkserver.py", line 55, in _launch
    self.pid = forkserver.read_signed(self.sentinel)
  File "/home/ipetrik/miniconda3/lib/python3.7/multiprocessing/forkserver.py", line 312, in read_signed
    raise EOFError('unexpected EOF')
EOFError: unexpected EOF
Traceback (most recent call last):
  File "/home/ipetrik/miniconda3/lib/python3.7/multiprocessing/forkserver.py", line 261, in main
    old_handlers)
  File "/home/ipetrik/miniconda3/lib/python3.7/multiprocessing/forkserver.py", line 297, in _serve_one
    code = spawn._main(child_r)
  File "/home/ipetrik/miniconda3/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
  File "/home/ipetrik/miniconda3/lib/python3.7/multiprocessing/synchronize.py", line 111, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
... (the same traceback repeats, interleaved, once per forkserver child) ...
tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
  File "/home/ipetrik/miniconda3/lib/python3.7/site-packages/tornado/gen.py", line 883, in callback
    result_list.append(f.result())
  File "/home/ipetrik/miniconda3/lib/python3.7/site-packages/tornado/gen.py", line 1147, in run
    yielded = self.gen.send(value)
  File "/home/ipetrik/miniconda3/lib/python3.7/site-packages/distributed/deploy/local.py", line 217, in _start_worker
    raise gen.TimeoutError("Worker failed to start")
tornado.util.TimeoutError: Worker failed to start

If I create a dask-scheduler manually and connect to it, everything works just fine with bokeh installed.

Running with processes=False also works, but obviously I need processes in production.

$ python
Python 3.7.0 (default, Jun 28 2018, 13:15:42)
[GCC 7.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from dask.distributed import Client
>>> client = Client(processes=False)
>>> def inc(x):
...     return x + 1
...
>>> def add(x, y):
...     return x + y
...
>>> a = client.submit(inc, 10)  # calls inc(10) in background thread or process
>>> a
<Future: status: finished, type: int, key: inc-329d1c0d6d5adf260702a5cc673b66c5>
>>> a.result()
11
>>> exit()

The environment is the head node of an HPC cluster running CentOS Linux, kernel 2.6.32-358.11.1.el6.x86_64 (mockbuild@c6b7.bsys.dev.centos.org). I do not have root access.
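A back-of-the-envelope check of the numbers in the log above suggests why the limit is hit. This sketch assumes OpenBLAS defaults to one thread per core, which is its documented default but was not measured here:

```python
# Why the 1024-process soft limit gets exhausted:
# the Client repr above reports "processes=32 cores=32", and each
# worker's OpenBLAS is assumed to spawn one thread per core.
# Every thread counts against RLIMIT_NPROC.
n_workers = 32        # worker processes reported by the Client
blas_threads = 32     # assumed OpenBLAS threads per worker process
total = n_workers * blas_threads
print(total)          # 1024 — exactly the reported RLIMIT_NPROC soft limit
```

So even before the dashboard or bokeh does anything, the worker pool alone can saturate the per-user process limit.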

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
jurajmichalak1 commented, Dec 28, 2018

@IPetrik if you are going to do parallel processing with workers, you usually want to limit all such BLAS and OMP threads. I’m doing the following in my worker processes:

    import os
    import cv2

    cv2.setNumThreads(0)
    os.environ["OMP_NUM_THREADS"] = "1"
    os.environ["OPENBLAS_NUM_THREADS"] = "1"

This also helps use CPU caches more efficiently. When you already keep all CPU cores busy by feeding your workers tasks, additional per-task parallelism (via OpenMP or similar) can poison the CPU caches and create extra work for the OS scheduler. Task throughput will definitely be better this way. If instead you want to minimize the latency of a single task, combine OpenBLAS/OMP threads with worker processes so that workers_proc_n * threads_n = CPU_cores_n. Measure to find the thread count up to which a single task scales almost linearly; once it tops out (due to Amdahl’s law), start using parallel workers for multiple tasks (Gustafson’s law). Don’t forget to disable Intel’s Turbo Boost (which can speed up cores when not all CPU cores are at 100%), because it will skew your scaling graph.
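One way to apply these caps cluster-wide is to set the environment variables before any numerical library loads. A minimal sketch follows; the variable names are the standard OpenMP/OpenBLAS/MKL knobs, and the ordering constraint is the key point:

```python
import os

# Cap BLAS/OpenMP thread pools *before* importing numpy or starting
# workers: OpenBLAS sizes its thread pool once, when the library loads,
# so setting these afterwards has no effect.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

# Only after this, import the numerical stack and start the cluster:
# from dask.distributed import Client
# client = Client()   # worker processes inherit the capped environment
```

With the caps in place, each worker costs a handful of threads instead of one per core, keeping the total well under RLIMIT_NPROC.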

Usually this kind of problem is caused by the max open files ulimit (which is usually much lower than the “max user processes” limit). Check the soft limit on open files:

$ ulimit -Sn
1024

Check hard limit of open files allowed:

$ ulimit -Hn
4096
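The same limits can also be read from inside the interpreter with the standard-library resource module, which avoids shell round-trips. A sketch (note that RLIMIT_NPROC is only available on Unix-like systems):

```python
import resource

# ulimit -Sn / -Hn correspond to RLIMIT_NOFILE; the OpenBLAS messages
# in the log above complain about RLIMIT_NPROC, which counts both
# processes *and* threads for the user.
soft_files, hard_files = resource.getrlimit(resource.RLIMIT_NOFILE)
soft_procs, hard_procs = resource.getrlimit(resource.RLIMIT_NPROC)
print("open files soft/hard:", soft_files, hard_files)
print("processes  soft/hard:", soft_procs, hard_procs)
```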

Alternatively, you may need to lower the default thread stack size.

Lastly, note that systemd does not honor ulimit or sysctl.conf settings when your process runs as a systemd service.
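For a process run as a systemd service, the limits are set in the unit file instead. A hypothetical fragment (LimitNOFILE and LimitNPROC are the real systemd directive names; the values here are placeholders to adjust for your workload):

```ini
[Service]
LimitNOFILE=65536
LimitNPROC=16384
```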

0 reactions
jurajmichalak1 commented, Jan 11, 2019

@IPetrik Unfortunately there is no such API method at the moment, but you can handle this yourself. Please see this response on how to initialize logging on workers. If you need more details on scattering data, check this. I hope this helps; if not, I can give you a code sample.
