VOCDetectionDataModule fails to load

🐛 Bug

I tried to use VOCDetectionDataModule, and it fails to load the dataset properly because of a multiprocessing problem. See the minimal repro below.

To Reproduce

Steps to reproduce the behavior:

>>> from pl_bolts.datamodules import VOCDetectionDataModule
>>> datamodule = VOCDetectionDataModule(data_dir=data_dir, num_workers=2)
>>> datamodule.prepare_data()
>>> train_loader = datamodule.train_dataloader(batch_size=1)
>>> next(iter(train_loader))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-22-0fd2476fd2ff> in <module>
----> 1 next(iter(train_loader))

~/anaconda3/envs/lightning-bolts/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __iter__(self)
    350             return self._iterator
    351         else:
--> 352             return self._get_iterator()
    353
    354     @property

~/anaconda3/envs/lightning-bolts/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _get_iterator(self)
    292             return _SingleProcessDataLoaderIter(self)
    293         else:
--> 294             return _MultiProcessingDataLoaderIter(self)
    295
    296     @property

~/anaconda3/envs/lightning-bolts/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __init__(self, loader)
    799             #     before it starts, and __del__ tries to join but will get:
    800             #     AssertionError: can only join a started process.
--> 801             w.start()
    802             self._index_queues.append(index_queue)
    803             self._workers.append(w)

~/anaconda3/envs/lightning-bolts/lib/python3.8/multiprocessing/process.py in start(self)
    119                'daemonic processes are not allowed to have children'
    120         _cleanup()
--> 121         self._popen = self._Popen(self)
    122         self._sentinel = self._popen.sentinel
    123         # Avoid a refcycle if the target function holds an indirect

~/anaconda3/envs/lightning-bolts/lib/python3.8/multiprocessing/context.py in _Popen(process_obj)
    222     @staticmethod
    223     def _Popen(process_obj):
--> 224         return _default_context.get_context().Process._Popen(process_obj)
    225
    226 class DefaultContext(BaseContext):

~/anaconda3/envs/lightning-bolts/lib/python3.8/multiprocessing/context.py in _Popen(process_obj)
    282         def _Popen(process_obj):
    283             from .popen_spawn_posix import Popen
--> 284             return Popen(process_obj)
    285
    286     class ForkServerProcess(process.BaseProcess):

~/anaconda3/envs/lightning-bolts/lib/python3.8/multiprocessing/popen_spawn_posix.py in __init__(self, process_obj)
     30     def __init__(self, process_obj):
     31         self._fds = []
---> 32         super().__init__(process_obj)
     33
     34     def duplicate_for_child(self, fd):

~/anaconda3/envs/lightning-bolts/lib/python3.8/multiprocessing/popen_fork.py in __init__(self, process_obj)
     17         self.returncode = None
     18         self.finalizer = None
---> 19         self._launch(process_obj)
     20
     21     def duplicate_for_child(self, fd):

~/anaconda3/envs/lightning-bolts/lib/python3.8/multiprocessing/popen_spawn_posix.py in _launch(self, process_obj)
     45         try:
     46             reduction.dump(prep_data, fp)
---> 47             reduction.dump(process_obj, fp)
     48         finally:
     49             set_spawning_popen(None)

~/anaconda3/envs/lightning-bolts/lib/python3.8/multiprocessing/reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61
     62 #

AttributeError: Can't pickle local object 'VOCDetectionDataModule._default_transforms.<locals>.<lambda>'

If num_workers=0, then the above repro works perfectly fine. It’s when num_workers>0 that the repro fails.
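The last frames of the traceback show the worker process object being pickled, and the error names a lambda defined inside `_default_transforms`. The following is a minimal sketch of that failure mode using only the standard library. The function names are hypothetical and are not taken from pl_bolts; the lambda body is a placeholder, not the library's actual transform.

```python
import pickle


def make_local_transform():
    # Hypothetical stand-in for a method that builds its transform as a
    # lambda inside a function body, as the error message describes.
    return lambda image, target: (image, target)


def identity_transform(image, target):
    # Module-level function: pickle stores it by qualified name.
    return image, target


def can_pickle(obj) -> bool:
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # Which exception type is raised for local objects varies
        # between Python versions, so both are caught here.
        return False


local_ok = can_pickle(make_local_transform())
module_ok = can_pickle(identity_transform)
print("local lambda picklable:", local_ok)
print("module-level function picklable:", module_ok)
```

With `num_workers=0` no worker process is started, so nothing has to be pickled, which is consistent with the repro working in that case.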

Environment

  • PyTorch Version (e.g., 1.0): 1.7
  • OS (e.g., Linux): macOS
  • How you installed PyTorch (conda, pip, source): pip
  • Build command you used (if compiling from source):
  • Python version: 3.8.6
  • CUDA/cuDNN version: N/A
  • GPU models and configuration: N/A
  • Any other relevant information:
  • PyTorch Lightning: 1.1.2
  • PyTorch Lightning Bolts: 0.2.5

Additional context

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

2 reactions
Borda commented, Jan 6, 2021

I saw a similar issue yesterday in a PL example on Windows. Strangely, the code had been that way for a very long time and the error only appeared yesterday… see: https://stackoverflow.com/a/59680818/4521646

Let’s see, but I think we can avoid the lambda from line

yes, let’s replace the lambda function :] @briankosw want to take it over?
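One common way to replace such a lambda is a small callable class defined at module level, which pickle can serialize by reference. This is only a sketch under assumptions: `ApplyToImage` and `scale_pixels` are hypothetical names, the real transform in pl_bolts is not reproduced here, and this is not necessarily the change that was merged.

```python
import pickle


class ApplyToImage:
    """Hypothetical wrapper: transform the image, pass the target through."""

    def __init__(self, image_transform):
        self.image_transform = image_transform

    def __call__(self, image, target):
        return self.image_transform(image), target


def scale_pixels(image):
    # Stand-in for a real image transform; scales 0-255 values to 0-1.
    return [pixel / 255.0 for pixel in image]


transform = ApplyToImage(scale_pixels)

# Round-trip through pickle, as a spawned worker process would require.
restored = pickle.loads(pickle.dumps(transform))
image, target = restored([0, 255], {"label": 1})
print(image, target)
```

Because both the class and the wrapped function live at module level, the instance survives the pickle round trip that the local lambda could not.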

1 reaction
zhiqwang commented, Dec 26, 2020

If we run from a terminal, e.g. python -m train ... with num_workers>0, this also works fine. But when working in Jupyter or IPython, it seems that we must set num_workers=0. Maybe this is not a bug?
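Another factor that may explain why the failure depends on the environment is the multiprocessing start method. The traceback in the report passes through `popen_spawn_posix.py`, which is the "spawn" path, and that path pickles the process object. A quick way to check the platform default, as a sketch using only the standard library (`default_start_method` is a hypothetical helper name):

```python
import multiprocessing as mp


def default_start_method() -> str:
    # get_all_start_methods() lists the supported start methods,
    # with the platform default first.
    return mp.get_all_start_methods()[0]


method = default_start_method()
print("default start method:", method)
```

If this prints `spawn`, objects handed to worker processes must be picklable. With `fork`, the child inherits the parent's memory and this pickling step is not used.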
