multiprocessing issues on Windows
Describe the bug
DataLoader calls fail on Windows when num_workers > 0. Windows only supports the spawn start method for multiprocessing, which pickles the dataset (including its transforms) to send it to each worker process, so any workflow involving unpicklable objects fails.
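A minimal sketch of the failure mode (the toy dataset and lambda below are illustrative, not from the tutorial): under spawn, the whole dataset object is pickled and sent to each worker, so a single unpicklable attribute breaks the loader, and in a notebook this can surface as the BrokenPipeError shown below.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    def __init__(self):
        # A lambda is not picklable, so this dataset cannot be
        # serialized and sent to spawned worker processes.
        self.transform = lambda x: x * 2

    def __len__(self):
        return 4

    def __getitem__(self, i):
        return self.transform(torch.tensor(float(i)))

if __name__ == "__main__":
    loader = DataLoader(ToyDataset(), num_workers=2)
    for batch in loader:  # fails under spawn (Windows); fine where fork is the default (e.g., Linux)
        print(batch)
```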
To Reproduce
Steps to reproduce the behavior:
- Run https://github.com/Project-MONAI/tutorials/blob/master/2d_classification/mednist_tutorial.ipynb on Windows
- Get the following error:
---------------------------------------------------------------------------
BrokenPipeError Traceback (most recent call last)
<ipython-input-11-390aa9a04062> in <module>
10 epoch_loss = 0
11 step = 0
---> 12 for batch_data in train_loader:
13 step += 1
14 inputs, labels = batch_data[0].to(device), batch_data[1].to(device)
c:\src\venv-3.7.9-prebuilt\lib\site-packages\torch\utils\data\dataloader.py in __iter__(self)
350 return self._iterator
351 else:
--> 352 return self._get_iterator()
353
354 @property
c:\src\venv-3.7.9-prebuilt\lib\site-packages\torch\utils\data\dataloader.py in _get_iterator(self)
292 return _SingleProcessDataLoaderIter(self)
293 else:
--> 294 return _MultiProcessingDataLoaderIter(self)
295
296 @property
c:\src\venv-3.7.9-prebuilt\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
799 # before it starts, and __del__ tries to join but will get:
800 # AssertionError: can only join a started process.
--> 801 w.start()
802 self._index_queues.append(index_queue)
803 self._workers.append(w)
~\.pyenv\pyenv-win\versions\3.7.9\lib\multiprocessing\process.py in start(self)
110 'daemonic processes are not allowed to have children'
111 _cleanup()
--> 112 self._popen = self._Popen(self)
113 self._sentinel = self._popen.sentinel
114 # Avoid a refcycle if the target function holds an indirect
~\.pyenv\pyenv-win\versions\3.7.9\lib\multiprocessing\context.py in _Popen(process_obj)
221 @staticmethod
222 def _Popen(process_obj):
--> 223 return _default_context.get_context().Process._Popen(process_obj)
224
225 class DefaultContext(BaseContext):
~\.pyenv\pyenv-win\versions\3.7.9\lib\multiprocessing\context.py in _Popen(process_obj)
320 def _Popen(process_obj):
321 from .popen_spawn_win32 import Popen
--> 322 return Popen(process_obj)
323
324 class SpawnContext(BaseContext):
~\.pyenv\pyenv-win\versions\3.7.9\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
87 try:
88 reduction.dump(prep_data, to_child)
---> 89 reduction.dump(process_obj, to_child)
90 finally:
91 set_spawning_popen(None)
~\.pyenv\pyenv-win\versions\3.7.9\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
58 def dump(obj, file, protocol=None):
59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)
61
62 #
BrokenPipeError: [Errno 32] Broken pipe
Expected behavior
num_workers > 0 should work on Windows.
Environment (e.g. using sh runtests.sh -v):
MONAI version: 0.3.0
Python version: 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:58:18) [MSC v.1900 64 bit (AMD64)]
OS version: Windows (10)
Numpy version: 1.19.3
Pytorch version: 1.7.0
MONAI flags: HAS_EXT = False, USE_COMPILED = False
Optional dependencies:
- Pytorch Ignite version: NOT INSTALLED or UNKNOWN VERSION
- Nibabel version: 3.2.0
- scikit-image version: 0.17.2
- Pillow version: 8.0.1
- Tensorboard version: 2.4.0
- gdown version: 3.12.2
- TorchVision version: 0.8.1
- ITK version: 5.1.1
- tqdm version: 4.51.0
Additional context
I’m not an expert on Python multiprocessing, but it looks like possible workarounds would be using the pathos.multiprocessing package, using a process Manager from multiprocessing, or somehow ensuring that all arguments passed to workers are picklable (e.g., no lambda functions); a sketch of that last idea follows.
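A minimal sketch of that suggestion, with illustrative names (scale, Scale, and find_unpicklable are not MONAI or PyTorch APIs): replace lambdas with module-level functions or callable classes, and use pickle.dumps to find which objects would break the workers.

```python
import pickle

def scale(x):
    # Module-level function: picklable by reference.
    return x * 2

class Scale:
    # Callable class defined at module level: instances pickle fine.
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        return x * self.factor

def find_unpicklable(objs):
    """Report which of the given objects cannot be pickled."""
    for obj in objs:
        try:
            pickle.dumps(obj)
        except Exception as exc:
            print(f"{type(obj).__name__}: not picklable ({exc})")

find_unpicklable([scale, Scale(2), lambda x: x * 2])  # only the lambda fails
```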
Top GitHub Comments
This alternative (Fast)DataLoader completely fixes the slowness issue for me on Windows: pytorch/pytorch#15849 (comment).

PyTorch itself doesn’t use transforms; that pattern comes from torchvision, so the issue wouldn’t necessarily be discussed on the PyTorch tracker. The problem is that the transform objects are not picklable, so they can’t be sent to the subprocesses. If you use PyTorch’s dataloaders natively, without augmentations, multiple workers can work just fine, since PyTorch can ensure the objects involved are picklable. One issue MONAI has is that its random state objects are not picklable, but I’m sure there are other issues that would need to be resolved if possible.
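For reference, the pattern from that linked comment, as best I recall it (check the original comment for the authoritative version), keeps the worker processes alive across epochs instead of respawning them each epoch, which is the expensive part on Windows:

```python
import torch.utils.data

class _RepeatSampler:
    """Wraps a sampler and repeats it forever, so the worker
    processes are never torn down between epochs."""
    def __init__(self, sampler):
        self.sampler = sampler

    def __iter__(self):
        while True:
            yield from iter(self.sampler)

class FastDataLoader(torch.utils.data.DataLoader):
    """DataLoader that creates its worker pool once and reuses it."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # DataLoader forbids reassigning batch_sampler after init,
        # so bypass its __setattr__ guard.
        object.__setattr__(self, "batch_sampler",
                           _RepeatSampler(self.batch_sampler))
        self.iterator = super().__iter__()

    def __len__(self):
        # Number of batches in one pass over the underlying sampler.
        return len(self.batch_sampler.sampler)

    def __iter__(self):
        for _ in range(len(self)):
            yield next(self.iterator)
```

Note that this only addresses the per-epoch spawn overhead; on Windows the dataset and its transforms still have to be picklable for the workers to start at all.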