A training loop got stuck in a certain condition with multi-processing updater and opencv.


A training loop gets stuck under certain conditions when I use the multi-processing updater together with OpenCV. The issue does not appear when I use Pillow or the serial updater.

  • Conditions
    • Chainer version: 2.0
    • CuPy version: 1.0.0.1
    • OS/Platform: Ubuntu 14.04.5 (for PFN ppl sakura server 1)
    • CUDA/cuDNN version: CUDA V8.0.44; I don’t know how to check the cuDNN version.
  • Code to reproduce: https://github.com/apple2373/chainer-train-stuck (note that this requires other libraries such as chainercv and OpenCV), then run python train.py --gpu 0 --mode 0
  • Error messages, stack traces, or logs: there is no message while it is stuck, but I get the following output when I abort it.

stsutsui@sakura1:/mnt/sakura201/stsutsui/chainer-train-stuck$ python train.py --gpu 0 --mode 0
epoch       iteration   elapsed_time  main/loss   main/accuracy
0           1           7.3559        1.07867     0.498882
^CProcess Process-8:…] 1.00%
Process Process-7:###########…] 35.29%
Traceback (most recent call last):rations
Traceback (most recent call last):me to finish: 0:00:00.
  File "/mnt/sakura201/stsutsui/anadonda2/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
  File "/mnt/sakura201/stsutsui/anadonda2/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/mnt/sakura201/stsutsui/anadonda2/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/sakura201/stsutsui/anadonda2/lib/python2.7/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    self.run()
  File "/mnt/sakura201/stsutsui/anadonda2/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/sakura201/stsutsui/anadonda2/lib/python2.7/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
    cnt, mem_index, index = in_queue.get()
  File "/mnt/sakura201/stsutsui/anadonda2/lib/python2.7/multiprocessing/queues.py", line 115, in get
  File "/mnt/sakura201/stsutsui/anadonda2/lib/python2.7/multiprocessing/queues.py", line 117, in get
    res = self._recv()
    self._rlock.acquire()
KeyboardInterrupt
KeyboardInterrupt

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

mitmul commented, Jan 29, 2018

@apple2373 I reproduced it with this environment setting: https://gist.github.com/mitmul/2cd98788c07ebbae1815232a32f95728 You can trace what I saw in the same environment built with the Dockerfile.

I also found a workaround. Setting OMP_NUM_THREADS=1 before launching Python solves the problem:

OMP_NUM_THREADS=1 python train.py --gpu 0 --mode 0

With this, training progresses without getting stuck. The hang is actually related to OpenCV’s imread method: if I replace the get_example method of SegDataset with an alternative that just returns a NumPy array created inside the method, training proceeds without getting stuck.

Another workaround is to call cv.setNumThreads(0) right after import cv2 as cv in the source code.
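For reference, the two workarounds could be applied at the top of a training script roughly as follows. This is only a sketch, not the code from the linked repository: the helper name disable_opencv_threads is hypothetical, and setting OMP_NUM_THREADS through os.environ before importing OpenCV is assumed to behave like prefixing the shell command with it.

```python
import os

# Assumption: setting the variable here, before OpenCV is imported, has the
# same effect as prefixing the shell command with OMP_NUM_THREADS=1.
os.environ["OMP_NUM_THREADS"] = "1"

try:
    import cv2 as cv
except ImportError:  # OpenCV is treated as optional in this sketch
    cv = None


def disable_opencv_threads():
    """Hypothetical helper: turn off OpenCV's internal threading.

    Returns True if OpenCV was importable and configured, False otherwise.
    """
    if cv is None:
        return False
    # Call this before any worker processes are started.
    cv.setNumThreads(0)
    return True


if __name__ == "__main__":
    print("OpenCV threading disabled:", disable_opencv_threads())
```

Either measure alone was reported to be enough in the comment above; applying both is harmless but probably redundant.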

stale[bot] commented, Jan 30, 2019

This issue is closed as announced. Feel free to re-open it if needed.
