A training loop got stuck in a certain condition with multi-processing updater and opencv.

A training loop gets stuck under certain conditions when the multi-processing updater is used together with OpenCV. The issue does not appear when I use Pillow or the serial updater.

Conditions
- Chainer version: 2.0
- CuPy version: 1.0.0.1
- OS/Platform: Ubuntu 14.04.5 (for PFN ppl sakura server 1)
- CUDA/cuDNN version: V8.0.44, I don't know how to check the cuDNN version…

Code to reproduce
https://github.com/apple2373/chainer-train-stuck
Note that this requires other libraries such as chainercv, OpenCV, etc. Then run:
python train.py --gpu 0 --mode 0

Error messages, stack traces, or logs
No message is printed while it is stuck, but I get the following output when I abort it.

stsutsui@sakura1:/mnt/sakura201/stsutsui/chainer-train-stuck$ python train.py --gpu 0 --mode 0
epoch iteration elapsed_time main/loss main/accuracy
0 1 7.3559 1.07867 0.498882
^CProcess Process-8:…] 1.00%
Process Process-7:###########…] 35.29%
Traceback (most recent call last):rations
Traceback (most recent call last):me to finish: 0:00:00.
  File "/mnt/sakura201/stsutsui/anadonda2/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
  File "/mnt/sakura201/stsutsui/anadonda2/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/mnt/sakura201/stsutsui/anadonda2/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/sakura201/stsutsui/anadonda2/lib/python2.7/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    self.run()
  File "/mnt/sakura201/stsutsui/anadonda2/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/sakura201/stsutsui/anadonda2/lib/python2.7/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
    cnt, mem_index, index = in_queue.get()
  File "/mnt/sakura201/stsutsui/anadonda2/lib/python2.7/multiprocessing/queues.py", line 115, in get
  File "/mnt/sakura201/stsutsui/anadonda2/lib/python2.7/multiprocessing/queues.py", line 117, in get
    res = self._recv()
    self._rlock.acquire()
KeyboardInterrupt
KeyboardInterrupt

Top GitHub Comments

mitmul commented, Jan 29, 2018

@apple2373 I reproduced it with this environment: https://gist.github.com/mitmul/2cd98788c07ebbae1815232a32f95728 You can see what I saw by building the same environment from the Dockerfile.

I also found a workaround. Setting OMP_NUM_THREADS=1 before running Python solves the problem:

OMP_NUM_THREADS=1 python train.py --gpu 0 --mode 0

With this setting, training progresses without getting stuck. The hang is actually related to OpenCV's imread function: if I replace the get_example method of SegDataset with an alternative that just returns a numpy array created inside the method, training also runs without getting stuck.

Another workaround is to call cv.setNumThreads(0) right after the import cv2 as cv line in the source code.
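For reference, both workarounds could be applied from inside the script, roughly as sketched below. This is not taken from the linked repository: the helper name limit_opencv_threads is hypothetical, and the sketch assumes the environment variable should be set before OpenCV is imported for it to take effect.

```python
import os

# Assumption: OMP_NUM_THREADS is read when the numerical libraries are
# loaded, so it is set here, before cv2 is imported anywhere.
os.environ["OMP_NUM_THREADS"] = "1"


def limit_opencv_threads():
    """Hypothetical helper: turn off OpenCV's internal threading.

    Returns the thread count OpenCV reports afterwards, or None when
    OpenCV is not installed.
    """
    try:
        import cv2 as cv
    except ImportError:
        return None
    # Same call as the second workaround above.
    cv.setNumThreads(0)
    return cv.getNumThreads()


if __name__ == "__main__":
    print("OMP_NUM_THREADS =", os.environ["OMP_NUM_THREADS"])
    print("OpenCV threads  =", limit_opencv_threads())
```

Calling the helper at the top of train.py, before any iterator or worker process is created, should have the same effect as the one-line change described above. Whether one workaround alone is enough may depend on how OpenCV was built.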

stale[bot] commented, Jan 30, 2019

This issue is closed as announced. Feel free to re-open it if needed.
